Good morning, everybody. I hope you're all enjoying your first day here in Paris, and welcome to Multitenancy Con. I'll be presenting Kubernetes Multitenancy Security, and specifically navigating DevSecOps visibility and control.
Before we start, here's the agenda. We're going to go through security responsibilities in multi-tenant environments and how they differ from single-cluster environments. We're going to highlight the key risk elements, build a framework across different risk pillars, and figure out how to tackle those in an enterprise. Then we'll go through guidance and a framework for protecting against these kinds of risks, with some examples. And finally, we'll figure out how to implement this guidance from code to runtime. That's the overarching goal.
This was a recent stat from Red Hat's 2023 Kubernetes security report. Look at the top stat over there: 67% reported delaying deployments to Kubernetes due to security concerns. Show of hands: when you go from single clusters to shared clusters, for how many of you has security been a major concern that is blocking multi-tenant deployments? A lot, right?
The challenge is that there are tools out there. There are checklists and the NSA hardening guides, but it's very hard to put those principles into practice. A, because of a lack of skill set sometimes, but B, because these things are very hard to do at scale. Those are some of the things we'll tackle today.
The other challenge is that the threat actors in cloud and Kubernetes environments are now Kubernetes security experts too. Think about a threat actor as a robber coming into your house. I'm a security guy, so I have to think about these things.
The robber enters and asks: Which room am I in? What doors can I open? Where are the security cameras? Where are the valuable items? How many floors are in the house? They're thinking about all the different paths they can exploit. And that happens in Kubernetes environments as well. They start with the specific pod or namespace that may have public exposure. They look for the security cameras: those are the Kubernetes audit logs or the eBPF events. They go for the valuable items: the data in S3 buckets, PII, or secrets. They figure out which other rooms they can get into with lateral movement. And finally, what can they access? That's the RBAC, the access control.
So if you're operating a multi-tenant cluster, you have to think about all these things. And the responsibilities are larger. If it's your own house, your family has access controls for who can enter or exit the house, but if I enter my brother's room, it's not a big deal. If I'm in a hotel or some large office complex, the consequences are severe. If it's a government building and I go into some secret room that I don't have access to, that's a major compliance violation; that's not allowed. So the need for isolation boundaries, fair usage of shared resources, and access controls for every single room, for example in a hotel or a shared building, gets larger compared to your own house.
It was interesting that in the last talk, the Adobe talk, we heard about security from an implementation point of view. I would say that's in some ways very similar, but the responsibilities are what get larger in multi-tenant environments. And that's the challenge.
So when we talk about security risk in the cloud, I like to break it down into four key buckets. The first is identity: who has access to what? These risks can come from things like default service accounts, and when you have namespace isolation in a multi-tenant environment, you also want access granted on a per-namespace basis. Then there's your supply chain. Who's pushing container images? Where are these container images coming from? If there's a vulnerability in my runtime, how do I map it back to which developer, or which developer PR, caused that vulnerability? We'll talk about real-life examples where enterprises suddenly had to figure out what to do when, for example, certain packages were compromised. Then there's network risk: lateral movement inside a cluster, as well as public exposure of pods or namespaces. And finally, runtime and storage risk: vulnerable processes, shared storage, secrets, et cetera.
There are great checklists and great sources out there that will tell you, hey, implement network policies, or hey, implement OPA Gatekeeper policies. But at least for me, the way you get more appreciation for this is to think about an attack path, where ultimately the attack path leads to stealing data, data exfiltration, whether that's S3 buckets or secrets. Start from the top: there's initial access to a pod, for example in Team Coke, and if there are no network policies, you have lateral movement into other namespaces. Then, because of misconfigurations, the attacker can get access to the cloud and S3 buckets, or to secrets that belong to another team, in this case Team Fanta. So ultimately it comes down to understanding the full attack path.
From there, you want to be able to prescriptively apply the right sets of policies, the right guardrails, and continuous monitoring and visibility to remediate these kinds of attack paths. And that's what we're going to do. We're going to look at these four types of risk and map them to the three phases of the attack path, and this is something that hopefully all of you can implement in your enterprise today. By the way, on initial access, lateral movement, and data exfiltration: if you look up something like the MITRE ATT&CK framework for containers, this is based on most of the common frameworks. It always starts with that initial access. It's then about how the attacker can move laterally within your container, cluster, or cloud environment. And then finally, what do they steal? The data exfiltration, the credential access.
So the first one we'll talk about is identity risk. Here's an example of an attack path with identity risk. By default, if you don't set automountServiceAccountToken to false, it is treated as true in Kubernetes, so an attacker who gets into an exposed pod automatically has a mounted service account token. Next, that service account token may be overly permissive, which it often is in the cloud, because many managed cloud environments come with default service accounts. These default service accounts are very good from a DevOps and dev/test point of view. They're not good from a production point of view because they're overly permissive. So now the attacker can, for example, get into other namespaces or create privileged pods. One thing to watch out for: even if you're limiting namespace access from a role-binding point of view, you also have to look at the permissions themselves.
Many times that is missed, and wildcard permissions are simply given. One of those wildcard permissions is actually the escalate permission, with which an attacker can create a pod or escalate one to be privileged. When that pod is privileged, the attacker can move vertically onto the host in the cloud, and that host has access to all the other namespaces, all the other secrets, as well as the S3 buckets that live in the cloud, which may hold your sensitive crown-jewel assets. These are some examples of how to solve for these across the different buckets of initial access, lateral movement, and persistence. For example, always block certain privileges like escalate. You don't need to grant that in any permission, whether it's bound with a cluster role binding or a normal role binding.
Here was a nice example from one security company's research. This was a role that looks seemingly benign. It's called system:controller:kube-controller. It turns out this was an attacker faking the name of a service that we would assume is doing some job in the Kubernetes cluster, because we have a lot of those running inside a cluster. The attacker was basically spoofing this name. So you have monitoring and visibility, but how do you make sure you catch these types of attacks? Similarly, if you implement policies without knowing how to detect and respond to these kinds of things, you'll keep implementing more and more policies without understanding how to catch these attacks. That's where the three-step framework comes in: first, do detections to catch these types of things; second, monitor that in the control plane; and third, implement policies that leverage this data, along with shift-left controls.
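As a rough illustration of the checks described here, the following Python sketch flags RBAC rules that use wildcards or grant the escalate verb, and pods that do not explicitly disable service account token mounting. It is not taken from the talk: the function names and the `RISKY_VERBS` set are hypothetical, the input is assumed to be manifests already parsed into dictionaries, and the token check only looks at the pod spec (a fuller check would also consider the ServiceAccount's own setting).

```python
from typing import Any

# Assumption: only the verb called out in the talk; extend as needed.
RISKY_VERBS = {"escalate"}


def risky_rbac_rules(role: dict[str, Any]) -> list[str]:
    """Return findings for one Role or ClusterRole manifest (parsed to a dict)."""
    findings = []
    name = role.get("metadata", {}).get("name", "<unnamed>")
    for rule in role.get("rules", []):
        verbs = set(rule.get("verbs", []))
        resources = set(rule.get("resources", []))
        if "*" in verbs:
            findings.append(f"{name}: wildcard verbs on {sorted(resources)}")
        if "*" in resources:
            findings.append(f"{name}: wildcard resources")
        for verb in sorted(verbs & RISKY_VERBS):
            findings.append(f"{name}: grants '{verb}'")
    return findings


def automounts_token(pod: dict[str, Any]) -> bool:
    """True unless the pod spec explicitly sets automountServiceAccountToken to false."""
    return pod.get("spec", {}).get("automountServiceAccountToken", True) is not False
```

A check like this could run both against live cluster objects for continuous monitoring and against manifests in a pipeline as a shift-left control.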
I would say you start with detection, because if you try to go from right to left, you're going to be continuously implementing policy after policy after policy, and you'll still miss these kinds of things. So how do you act on the real-time data or real-time threat research that's taking place, and then create policies from a shift-left point of view? Let's see if this works.
Here, for example, you can see that you're catching a detection. This is looking at audit logs, and it's flagging an over-privileged service account. You can see it's mapped to MITRE as privilege escalation. Then you want to look at which service account this is and who created it, because from there you want to figure out the risk of the service account and what else it can lead to. For example, if I click here, we can see that it has all these other permissions, such as patching and creating pods. It can create or update pods in ways that can lead to host-level privileges, and it has access to pods that have vulnerabilities. As another example, you may want to see which service accounts in your cluster have wildcard permissions. From a continuous monitoring point of view, this is something you want in your Kubernetes security environment inside the cluster and, of course, leading out to the cloud as well.
And then finally, you want to be able to create policies. You can use tools like Gatekeeper or Kyverno to do this. In that last example, we were creating a policy and then blocking the creation of an RBAC role that had added access. There are a lot of open-source frameworks for this, and plenty of examples in the Kyverno or Gatekeeper libraries that you can leverage. But you want to start by detecting first, then monitor, then act. So next we'll go into supply chain risk.
Supply chain security is getting a lot of momentum. We see companies like Endor and Chainguard doing wonderful work around the concept of secure by default. What that means is: how do I use container images that are already hardened, have SBOMs, and have zero vulnerabilities from day one, or even day zero? The challenge is, what if teams are not using those Chainguard images? You still need to be able to detect when that's the case, or when commonly used packages are compromised or manipulated. This was a recent attack where an npm package was purposely manipulated by a developer, and that suddenly caused thousands of applications to have vulnerabilities, and even to execute malicious code.
So think about a container image like an Amazon delivery or some box that comes to your house. As a security person, you have to analyze it very deeply. First, was the content scanned or not? Was it scanned from a shift-left point of view, when it was stored in a registry or when the image was built in a pipeline? And you want to scan not just for vulnerabilities, but for secrets and malware as well. Those can be introduced as part of a container image build. The second is traceability. This goes back to analyzing what's happening in my runtime environment, because that's what we want to protect. If tomorrow some vulnerability like Log4Shell suddenly appears on a container, you want to understand where it got introduced and how. Which developer PR caused that specific vulnerability to be introduced into my container runtime environment? The third part is provenance. This is a newer concept that's now starting to get more attention.
There was an attack a couple of years ago on CircleCI, where CircleCI itself had vulnerabilities, and that's another entry point. It's not always about public exposure. When an image is getting built, what is the risk of my supply chain itself? Does my Jenkins, for example, or my CircleCI have vulnerabilities? Can that make its way into the image and then into the runtime? And finally, integrity. Was the image signed? Who was it signed by? Can I trust the signature? That's where things like cosign, Notary, and Sigstore are gaining a lot of momentum. You also see, for example, Chainguard now being on Docker, in the Docker library, which was a new announcement, and Docker trusted images having a valid signature or a valid source of truth, just like certificates.
Again, it really starts with looking at the runtime. When I have some vulnerability on a pod in a namespace, what I want to understand next is how that vulnerability got introduced and how the image was built. That's where you want to look end to end, going again from detection to continuous posture. For example, who was it signed by? Was it scanned for vulnerabilities? But you also want code-to-cloud traceability. I'm sure all of you have invested in GitHub security or some version of code security, but you have to be able to map it back to your runtime to get the full picture. For example, if this image is deployed and it has vulnerabilities, how do I understand the blast radius? How do I understand which developer introduced a given vulnerability in this image? And which code does it actually trace back to?
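One way to picture the code-to-cloud traceability questions above is as a lookup from a running image digest back to its build metadata. The Python sketch below is only an illustration under assumptions: `BuildRecord`, `blast_radius`, and `trace` are hypothetical names, and the inventory of running pods and the build records are assumed to have been collected elsewhere (for example from the cluster and the CI system).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BuildRecord:
    """Hypothetical build metadata captured by the pipeline for one image."""
    image_digest: str
    commit: str
    pull_request: int
    author: str
    signed: bool
    scanned: bool


def blast_radius(digest: str, running_pods: list[tuple[str, str]]) -> list[str]:
    """Namespaces currently running the given image digest.

    running_pods is a list of (namespace, image_digest) pairs.
    """
    return sorted({namespace for namespace, d in running_pods if d == digest})


def trace(digest: str, builds: dict[str, BuildRecord],
          running_pods: list[tuple[str, str]]) -> dict:
    """Answer: where does this image run, who introduced it, can it be trusted?"""
    record: Optional[BuildRecord] = builds.get(digest)
    return {
        "namespaces": blast_radius(digest, running_pods),
        "author": record.author if record else None,
        "pull_request": record.pull_request if record else None,
        # An image with no build record, no signature, or no scan is untrusted.
        "untrusted": record is None or not (record.signed and record.scanned),
    }
```

The design choice here is to treat a missing build record the same as an unsigned or unscanned image, since an image nobody can trace is exactly the case the talk warns about.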
This kind of tracing is very, very hard, especially in multi-tenant environments where you have different namespaces and different resources, and you don't necessarily know who has access to what and who introduced what. You also want to be able to map to benchmarks like the CIS supply chain guidelines. In this case, for example, we saw only one reviewer, even though you're typically supposed to have two or more, because if your GitHub security isn't enabled, you can review your own change, and then this image makes its way through your GitHub Actions, through your pipelines, and it's suddenly in your runtime when nobody had visibility into the actual code review for that image. This includes bringing in all your data, for example code scanning. If you've invested in GitHub security or any code security, understanding how those checks make their way to runtime is critical. And finally, look at key data points like your Jenkins vulnerabilities to understand whether your Jenkins was insecure when the image was built, not now. If some vulnerability is disclosed tomorrow and your Jenkins was insecure at build time, that's a breach of compliance, that's a breach of audit, and in multi-tenant scenarios the stakes are higher. You may have 10 developer pipelines owned by 10 different teams, but if one namespace is vulnerable, even though its pipeline is owned by another team, that can still lead to lateral movement to other teams inside the cluster. So the ability to monitor these and make sure you have consistent control over those developer pipelines is critical.
So that was supply chain risk. So far we've talked about identity risk and supply chain risk. The next one is network risk.
This one, I think, we've in some ways beaten to death over the last four or five sessions, but it covers things like network policies, which you can use to prevent lateral movement. One interesting insight: when companies use something like Cilium or Calico, those come with different enforcement modes, and which one to implement is a strategic decision. I've seen a lot of companies implement the always enforcement mode, where they block everything by default and open only certain traffic flows, but some companies just implement the default enforcement mode in Cilium. So looking at that level of detail, not just at the network policies you create, is very important.
You also need to understand pod misconfigurations. This is where, in Kubernetes, you have your pod admission controller and pod security policies, which cover common things like preventing access to the host in terms of host network privileges and the host file system. A lot of times your pod spec will have those, and you want to catch them and then implement shift-left controls in your infrastructure-as-code scanning before those pods get deployed in a pipeline, in your GitOps, et cetera. And finally, from an initial access point of view, you want to create default-deny network policies and prevent default pod exposure. Even though CIS, for example, says not to enable your endpoint by default, in many managed cloud environments those Kubernetes endpoints are enabled by default. There's a recent study that found, I think it was, around 60 to 70% of Kubernetes clusters exposed on the internet, without any security whatsoever in terms of preventing public exposure.
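To make the pod and network checks above a little more concrete, here is a minimal Python sketch of the kind of scan an infrastructure-as-code step might run. It is an assumption-laden illustration, not a tool from the talk: the function names are hypothetical, manifests are assumed to be parsed into dictionaries, and only ingress default-deny is checked (a policy that selects every pod and allows no ingress traffic).

```python
from typing import Any


def host_access_findings(pod: dict[str, Any]) -> list[str]:
    """List host-level access requested by a Pod manifest."""
    spec = pod.get("spec", {})
    findings = []
    if spec.get("hostNetwork") is True:
        findings.append("hostNetwork")
    for volume in spec.get("volumes", []):
        if "hostPath" in volume:  # mounts a path from the node's file system
            findings.append(f"hostPath volume: {volume.get('name')}")
    return findings


def is_default_deny_ingress(policy: dict[str, Any]) -> bool:
    """True if a NetworkPolicy selects all pods and allows no ingress."""
    spec = policy.get("spec", {})
    selects_all_pods = spec.get("podSelector") == {}
    # When policyTypes is omitted, Kubernetes treats the policy as an Ingress policy.
    covers_ingress = "Ingress" in spec.get("policyTypes", ["Ingress"])
    return selects_all_pods and covers_ingress and not spec.get("ingress")


def namespaces_missing_default_deny(namespaces: list[str],
                                    policies: list[dict[str, Any]]) -> list[str]:
    """Tenant namespaces that have no default-deny ingress policy."""
    covered = {
        p.get("metadata", {}).get("namespace", "default")
        for p in policies
        if is_default_deny_ingress(p)
    }
    return sorted(set(namespaces) - covered)
```

Run per tenant, the last function gives a quick list of namespaces where lateral movement is not blocked by default.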
So these are the kinds of things that get missed. Again, being able to detect and understand what those misconfigurations are, and then implement shift-left policies, whether in your infrastructure as code, your image builds, or anything involving code, is critical.
And finally, you have your runtime and storage risk, and here are some examples. If you're using persistent volumes, make sure they're deleted after use. If they're not, they can end up being used by a malicious attacker. Another interesting one is webhooks. It turns out webhooks are shared, globally scoped resources in multi-tenant environments; that's just a Kubernetes thing. If you don't limit access to that webhook from a cluster role point of view, somebody can go in and exploit it. There is one example of a malicious admin who tweaked an admission controller to pull vulnerable images; that was the webhook. So an attacker has all these different doors they can get through, and you really want to think about how to protect against that: using security policies, and making sure these globally scoped resources are managed only by certain admins at the top. And then, from a storage point of view, use dedicated storage classes per tenant, and use things like a secrets manager in AWS or Vault to manage secrets in a more encrypted and safe manner.
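The webhook and persistent volume points can be sketched as two small audit checks. This Python example is a hedged illustration, not something shown in the talk: the function names and constants are hypothetical, manifests are assumed to be parsed into dictionaries, and "not deleted after use" is approximated as a volume that is in the Released phase with a Retain reclaim policy.

```python
from typing import Any

WEBHOOK_API_GROUPS = {"admissionregistration.k8s.io", "*"}
WEBHOOK_RESOURCES = {"mutatingwebhookconfigurations", "validatingwebhookconfigurations"}
WRITE_VERBS = {"create", "update", "patch", "delete", "deletecollection", "*"}


def can_modify_webhooks(cluster_role: dict[str, Any]) -> bool:
    """True if any rule in the ClusterRole allows changing admission webhooks."""
    for rule in cluster_role.get("rules", []):
        groups = set(rule.get("apiGroups", []))
        resources = set(rule.get("resources", []))
        verbs = set(rule.get("verbs", []))
        if (groups & WEBHOOK_API_GROUPS
                and (resources & WEBHOOK_RESOURCES or "*" in resources)
                and verbs & WRITE_VERBS):
            return True
    return False


def released_retained_volumes(pvs: list[dict[str, Any]]) -> list[str]:
    """Names of PersistentVolumes whose claim is gone but whose data is kept."""
    return sorted(
        pv["metadata"]["name"]
        for pv in pvs
        if pv.get("spec", {}).get("persistentVolumeReclaimPolicy") == "Retain"
        and pv.get("status", {}).get("phase") == "Released"
    )
```

The first check supports the idea that only a few top-level admins should manage globally scoped resources; any other ClusterRole it flags is worth reviewing.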
So, end to end: if you look from code, to your pipeline, which is your CI and registry, to the control plane and the runtime, these are all the things you want to look out for, whether it's scanning your images or monitoring your RBAC and network policies. From a runtime point of view, use eBPF and look at the actual vulnerable processes to understand what is taking place and what kinds of attacks are happening right now, so you can implement the right guardrails and policies to block any code or misconfigurations that led to that attack in the first place. And then finally, make sure you shift left and have the right level of code scanning, the right level of infrastructure-as-code scanning, as well as image provenance to catch that supply chain risk. It's not just the image build; the risk in your supply chain itself can also lead to key attacks.
And there you have it. There's a table here, and you can definitely use it. It takes the different risks in a multi-tenant environment and maps them to these different attack phases. As you start to think about implementing this as an organization, talk with your incident response team, look at the types of threats they're seeing in their environment, and then work out a strategy. If there's some type of attack that you commonly see taking place, why is it happening? Are we creating privileged pods because of some automation? Okay, that's a false positive, but that's okay. So this is an overall way to think about it.
Thank you for your time, and I'll answer any questions. Do I have any time for questions? Oh, okay, I guess I'll take questions offline. Sorry, I went a little bit over, but thank you again.