Hello and welcome everybody to this session. I'm Víctor Cuadrado and with me is my colleague Raúl Cabello. In this session we will talk about Kubernetes security, Pod Security Policies, and what happens after their deprecation and removal: what we have learned from them and what the alternatives are. Okay, with that said, let's get into it. We hope you enjoy this session. The first thing we need to say is that everything is a trade-off. Here we have this XKCD comic; I'm not the first one to use it. You can see it's just a loop, no? The loop starts with everything interconnected, which creates bugs and security holes; then we get things sandboxed; then everything is more difficult to use; and then we're back where we started. I think the alt text for this XKCD comic is something like: "All I want is a secure system where it's easy to do anything I want. Is that so much to ask?" Yes, it is. Everything is a trade-off; it's something we need to keep in mind. With that said, let's move into security in the cluster. So how do we secure a cluster? How do we think about it? We can start with the four C's of cloud native security. The first C is the cloud, or the data center: the compute, the networking, the hardware behind everything. Then we have the cluster, the Kubernetes cluster, which is the second C. This is where we deploy our workloads; in our case, workload resources such as Deployments, ReplicaSets, Jobs, DaemonSets, Pods, things like that. Once we have deployed our workloads, those workloads run containers, which is the third C, and that is a bit of sandboxing that wraps our logic. And this logic comes from the fourth and last C, which is the code. For this talk, we're going to focus on the cluster and container layers, which are the ones relevant for us.
So if we want to secure our workload resources in the cluster, how do we do that? Normally we have authentication and authorization for those workload resources: basically, we use RBAC to secure that part. And here is the problem: what happens once we are done with that RBAC and we allow somebody to create a pod, for example? If somebody can create or update a pod, they can change the spec inside that pod. And that can be a problem, because, if we remember, pods, and the containers inside them, have a securityContext in their spec, where we define the security settings, the privilege and access control, for the containers running inside the pod. So once somebody is able to change the spec of that pod, we are in a world of trouble. The securityContext controls things like allowPrivilegeEscalation, Linux capabilities, and so on, and we need to make sure that people, users, and tools cannot change that spec of pods and containers at will. How do we do that? Well, so far we have been doing it with Pod Security Policies, PSPs. PSPs come with Kubernetes; they are an in-tree mechanism, implemented using an admission controller inside the cluster. The admission controller checks every request that goes through it: when somebody wants to create or update a pod, the request goes to the admission controller, the logic of the PSPs runs, and we get the resulting behavior. This logic runs, as I said, on creation and update of pods, and it checks what goes into the spec.securityContext of pods and containers. Pod Security Policies apply to all the pods in the cluster and check all the things that come from the security context: the filesystem of the containers, privileged containers, read-only root filesystems, AppArmor, SELinux, things like that. What do they look like? Here we have a minimal PSP.
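As a rough sketch of what such a minimal PSP might look like (the name is made up; seLinux, runAsUser, supplementalGroups, and fsGroup are required fields, so the minimum is setting them all to RunAsAny):

```yaml
# A minimal PodSecurityPolicy sketch: the four required strategy fields
# are set to RunAsAny, so this policy restricts essentially nothing.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: minimal-psp
spec:
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```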
This one is not particularly useful. We can see that the kind of the resource is PodSecurityPolicy, and in the spec we are setting the minimum needed, which in this case does nothing, but it gives us an idea of what the minimum is. What does a useful one look like? Well, here we have one. Here we can see that we are setting containers so they cannot be privileged, we are dropping all Linux capabilities, we are only allowing some specific volume types, and so on. We can also see, for example, that it allows host ports for the containers of a pod: everything from port 80 up to port 8080 is open. Okay, we have the PSP. How does it work? How does it get applied? We just deploy it and we are done? Not really. Let's see it with this diagram. Imagine that we are Jorge, the user, at the bottom. As the user, we just want to create a pod, so we do a kubectl run; the pod gets checked against the PSP, and if it's allowed it gets scheduled, and if not it gets blocked. But what is the rest of the diagram? Ah, PSPs need to be bound for them to apply, for example to a specific pod. And that is what is happening here: Jorge is doing kubectl run, and he is doing it with a service account called WebAppSA. We can see that the admins of the system have created some service accounts, in this case WebAppSA, and have created a RoleBinding of that WebAppSA to the WebApp role, and in that WebApp role we are saying that the role can use the PSP. That's how we bind PSPs. If we want to see how that looks in general, it would be like this. First we create a ClusterRole. In this ClusterRole we define a rule with the verb "use" on the podsecuritypolicies resource, plus the list of Pod Security Policies that can be used.
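A sketch, with assumed names and values, of a PSP like the "useful" one just described, together with the ClusterRole granting use of it:

```yaml
# A restrictive PSP along the lines described in the talk; the concrete
# field values here are illustrative, not the exact slide contents.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-psp
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities: ["ALL"]
  volumes: ["configMap", "emptyDir", "projected", "secret", "downwardAPI", "persistentVolumeClaim"]
  hostPorts:
  - min: 80
    max: 8080
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
---
# ClusterRole with the "use" verb on the listed PSPs.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-restricted-user
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  verbs: ["use"]
  resourceNames: ["restricted-psp"]
```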
This would make these PSPs apply. Once we have this ClusterRole, we create a ClusterRoleBinding and bind it to whatever we want. For example, we can bind it to specific users, like Jorge; or to the specific service account Jorge was using; or to all service accounts in the namespace, which is what's recommended, because, as we will see, it allows for an easy transition out of PSPs into other things. Now we know how to set up the PSP and how to bind it. Is that enough? Not really. PSPs have a lot of settings; some of them cause the pod request to be mutated and some of them don't, and one needs to know which ones by heart, because nothing in the API tells you. So then we get some ordering rules. Let's say we have two PSPs: which one gets applied? The first rule says that if a PSP can admit the pod without mutating the request, that's the one that gets applied. Okay, perfect, that's rule number one. If a mutation is needed, we go to the second rule: one of the mutating PSPs is chosen from the list, and the request gets mutated. Which one gets chosen? It's alphabetical, in alphabetical order of the name of the PSP, which is a bit clunky if you ask me. But with these rules comes a mental model of how people, how the community, have been using PSPs. The first option is an application-specific PSP, which matches rule number one: it will not mutate the pod request. For example, say you have a Helm chart, and inside that Helm chart you ship a PSP that is targeted specifically at the pods of that chart and that application.
Since it's targeted specifically at that application, it doesn't need to mutate: it already knows how the application behaves, and the application has been written in a way that it passes without any mutation from the PSP, and everything is secure. The other approach is a cluster-wide PSP, which applies to basically every pod. Since it's cluster-wide, it's a catch-all, and it will mutate. That's the other way of approaching this problem. So either you write PSPs specific to each application, or you write a catch-all PSP that performs mutation. That's the mental model we end up with. Okay, after this explanation, which has taken 10 minutes or so: why are they going away? Well, you can see that it's not as simple as it looks. First, they are not namespaced, which means you have to set up RBAC as you see fit for each namespace, each service account, and so on. That means you need custom RBAC depending on the layout of your cluster, which means PSPs cannot be enabled by default. Either you go app-specific or cluster-wide, but even a cluster-wide PSP is specific to one cluster; not so good. You have the prioritization rules and so on, so you don't really know which PSPs apply where. You have the indirection layers of several RBAC objects: which ClusterRole and ClusterRoleBinding is making this PSP apply, and so on. They are not easily composable, because the prioritization model is alphabetical and you cannot stack them easily. And they lack modes: they only enforce. Once a PSP is running, it rejects things, but you cannot have it warn or audit, which is a pity, because then you cannot easily retrofit PSPs onto a cluster that already has workloads, learn from them, and evolve them. So yeah, it's a bit confusing. And it was already confusing three years ago.
So, for example, in 2019 people were already asking: what do we do, do we fix Pod Security Policies or not? Well, now it's 2022, and we know they decided not to fix them, but to move on and find better ways. We know that Pod Security Policies are deprecated and will be removed in Kubernetes 1.25. So what are the other options? We have two. The second option is another in-tree policy mechanism, called Pod Security Admission. It's more minimalist, it has fewer moving parts than PSPs, and it is composable with the third approach. So if Pod Security Admission is not enough for you, you can always go with the third approach, which is an out-of-tree policy mechanism. Normally the in-tree policy mechanism comes with the cluster, with the admission controller inside the Kubernetes cluster, but you can have an admission controller outside of the Kubernetes installation and just run your own. That's the third approach; it uses the webhook ecosystem, and you can use it to work towards standardizing on a policy framework using a policy engine. My colleague Raúl will explain that later on. For now, let's go with the second approach, Pod Security Admission. What is it? How does it work? Well, Pod Security Admission is just labels on the Namespace resource, so it's pretty simple. It is defined on Namespace resources and applies per namespace. That's analogous to the recommended setup for PSPs, which is great. And instead of providing all the options for configuring what we do or don't want, it bundles those options into three standards: privileged, baseline, and restricted. Those standards can be applied in three modes: enforce, analogous to what PSPs were doing, and then audit and warn, which are new, as we will see. It is done in a way that it can be enabled by default, as we will see, and it's deliberately a step back in complexity.
You can use it as the built-in admission plugin or as an external admission webhook. It's simpler, which is nice, and it has trade-offs. Okay, so it provides three standards: privileged, baseline, and restricted. What do they look like, and what do they do? Here we have them; it's pretty simple. They are cumulative policies. We have privileged, which allows everything. If you restrict more things, you get the baseline standard, which basically allows you to deploy a minimally configured pod. Imagine the minimal pod YAML you need to write for a pod: with baseline, you can deploy it. And then you have restricted, which just follows the current pod hardening best practices. That's it, which is great, because then you don't need to think. And those standards can be applied in three different modes: enforce, audit, and warn. Enforce causes violations to be rejected. With audit, violations are allowed, which is different, and they get logged in the Kubernetes audit log, so you have a record of which violations are happening. With warn, violations are also allowed, but instead of going to the Kubernetes audit log, a user-facing warning is printed: you do a kubectl run or kubectl apply and you get a warning right there. One thing to keep in mind is that enforce only applies to pods. Take the rest of the workload resources, such as Deployments, Jobs, ReplicaSets, and so on: if you try to create one of those, it will succeed, but then when the Deployment triggers the creation of the pod, that pod will not get created, which leaves the Deployment in a failed state. It's something to keep in mind. Here's an example of Pod Security Admission. Pretty simple: we have a Namespace, kind: Namespace, and we add some labels to it. Here we are enforcing baseline, for example, as a minimum of security. And we can see that we also set the enforce version.
This could be set to latest, but here we have v1.24, so it follows what the community knows up to Kubernetes version 1.24. And we can see the new thing, audit and warn, which here we are setting to restricted. So this namespace is enforcing baseline, but it will log to the audit log and warn the user about restricted. That way we can start learning about what security problems we have, which is great. And we talked about setting it as a default, no? How do we set defaults for Pod Security Admission? Well, we configure the admission controller. Here we have it: we can set the defaults, for example to privileged, which is the least restrictive and the one that comes by default. And we also have exemptions: we can exempt things such as usernames, runtime classes, namespaces, and so on. One thing to keep in mind is that this has some footguns. For example, most pods are created by a controller in response to a user creating a workload resource: a user creates a Deployment, and the pods for that Deployment are created by the Kubernetes controllers, so those pods will not be covered by a username exemption. Just keep that in mind. Similar things happen elsewhere: some fields are ignored, for example .metadata or .spec.tolerations, so you cannot exempt based on them. And don't try to exempt controller service accounts, for example in kube-system, because then everything breaks. Okay. Now we know about Pod Security Admission and PSPs; how do we migrate from one to the other? It's pretty simple. First we check whether the Pod Security standards fit our case, then we adapt our PSPs so they apply per namespace and don't mutate anything, and then we perform the migration. For disabling Pod Security Policies, one may need to reconfigure the cluster to remove the admission plugin and so on, but that depends on the cluster.
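The namespace labels and the admission controller defaults described above might be sketched like this (namespace name and version values are illustrative; on older clusters the PodSecurityConfiguration apiVersion may still be a beta version):

```yaml
# Namespace enforcing baseline while auditing and warning on restricted.
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: v1.24
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Cluster-wide defaults and exemptions, passed to the API server via
# --admission-control-config-file.
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "privileged"
      enforce-version: "latest"
      audit: "restricted"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    exemptions:
      usernames: []
      runtimeClasses: []
      namespaces: []
```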
So, for example, here we have the PSP from before, and it actually matches pretty well with the restricted standard of Pod Security Admission. The only thing that doesn't match is hostPorts, which we would then need to remove, which means that maybe our workload, the Helm chart or the application, needs to be rewritten so it doesn't depend on having host ports open. That's one thing to keep in mind. And this is what we end up with if we migrate from the previous PSP: just setting restricted everywhere, and that's it. We could also set it as the default for the whole cluster if we wanted. Okay. So, as a recap of the Pod Security Admission trade-offs: we have seen it is simple, and it is all or nothing. You cannot select specific settings, which is okay. It doesn't allow mutation. It only covers pods, and workload resources in a way, so you cannot use it for Ingresses and things like that. It is not context-aware, so it doesn't know what's happening in the rest of the cluster. And it doesn't provide compliance or audit checks, so you cannot check, for example, for signatures or annotations and other cool things you could do in a different way. The upside is that it's the same on any cluster that implements it, which means it's easy to test and easy to communicate about with suppliers, teams, and so on. The downside of Pod Security Admission is that it may be unrealistic to assume that your entire workload fits into the restricted policy. In the end, you are a consumer of applications, of Helm charts, and some of them will not fit into the restricted policy. Then you may need a different way, and for that different way, here is my colleague Raúl, who is going to talk about admission webhooks. Thank you, Víctor. So, as Víctor already mentioned, there are several fields in Pod Security Policies that are not covered by the Pod Security standards.
And if you must enforce those options, you will need to supplement Pod Security Admission with an admission webhook. So let's see what an admission webhook is and how you can implement one. An admission webhook is basically an HTTP callback that receives an admission request, which is a JSON object. It performs some evaluation based on that JSON object, and it either accepts the request or rejects it with an error message. If it rejects the request, the object will not be created, modified, or deleted. In order to create an admission webhook, you need to provide an HTTPS webhook server that exposes an endpoint for each webhook. There are two types of webhooks: the validating admission webhook, which just performs validation on the request, and the mutating admission webhook, which in addition can mutate the request. So let's see what happens when you try to create an object in your Kubernetes cluster, which will hopefully help us understand how admission control works. Imagine we are the user and we want to create a pod; let's see what happens inside our Kubernetes cluster. The first thing that happens is authentication and authorization: Kubernetes makes sure you're authorized to create the pod. If that's the case, it then calls all the mutating webhooks one by one with the pod we want to create, passing the JSON object of the pod. The mutating admission webhooks perform some validation, either accepting or rejecting the request, and they can also mutate the original JSON object. After the mutations, the API server performs schema validation, and after that it calls all the validating admission webhooks in parallel; because they cannot mutate the request, it's safe to call them in parallel. If one of the validating admission webhooks fails, an error is returned to the user, saying you cannot create the pod, for whatever reason.
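To make the callback contract concrete, here is a toy sketch in Python of the core logic a validating webhook might run: it takes the AdmissionReview JSON that the API server would POST and returns the response body to send back. The rejection rule (no privileged containers) and the function name are made up for illustration; a real server would also need TLS and an HTTP framework around this.

```python
def validate(admission_review: dict) -> dict:
    """Toy validating-webhook logic: reject pods with privileged containers.

    Receives an AdmissionReview dict (as the API server would POST it)
    and returns the AdmissionReview response the webhook must send back.
    """
    request = admission_review["request"]
    pod = request["object"]
    allowed = True
    message = ""
    # Walk the containers and look for a privileged securityContext.
    for container in pod.get("spec", {}).get("containers", []):
        if container.get("securityContext", {}).get("privileged", False):
            allowed = False
            message = f"container {container['name']} must not be privileged"
            break
    response = {
        "uid": request["uid"],  # must echo the request uid back
        "allowed": allowed,
    }
    if not allowed:
        response["status"] = {"message": message}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```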
If all the validations are successful, the pod is stored in etcd, and that's it: your pod will eventually be created. If you want to implement your own admission webhooks, as we said previously, you need to implement an HTTPS server, and it has to be HTTPS because plain HTTP is not supported. You need to create a certificate, you need to pass the certificate authority you used to the webhook configuration object, and you need to provide an endpoint for each webhook. Also, you need to create the webhook configuration object itself, which can be either validating or mutating. That seems like a lot of work, but there are open source tools that help you with this process. They are called policy engines, and today we are going to talk about Kubewarden. Kubewarden is an open source project; it belongs to the CNCF sandbox, and Víctor and I both work on Kubewarden at the moment. Kubewarden integrates with Kubernetes by providing a set of custom resources that simplify the process of enforcing policies on your cluster. The policies are implemented as WebAssembly modules, and they are distributed via OCI registries, so you can reuse the registry you are already using for your containers to distribute the policies for Kubewarden. The policies run inside the Kubewarden policy server, each on a dedicated endpoint, and they are isolated from each other. Let's see how Kubewarden works with this diagram, which will help us understand it. These are the custom resources that Kubewarden provides: the PolicyServer and the ClusterAdmissionPolicy. There is also the AdmissionPolicy, which is basically the same as a ClusterAdmissionPolicy but scoped to a namespace. You use a ClusterAdmissionPolicy for cluster-wide resources or for all namespaces.
So, what happens every time you create a PolicyServer? The Kubewarden controller is continuously watching for new PolicyServers, or for changes in existing ones, and it is in charge of creating the HTTPS webhook server. It instantiates the policies inside that policy server, each on a dedicated endpoint, and that happens every time you create an admission policy: Kubewarden places the policy inside the policy server, instantiating the Wasm module and exposing an endpoint for it. It also creates the webhook configuration for the policy, and that's it: every time you create an object, Kubernetes calls the policy server, which calls the Wasm module, which performs the evaluation based on the policy you chose. As we said, the policies are stored in OCI registries, so they are pulled by the policy server; you can also serve them from a normal HTTPS server if you want. So let's talk about policies. You can either create your own policies or reuse existing ones. If you go to Artifact Hub, an application that enables you to find, install, and publish packages for CNCF projects, there is a kind called Kubewarden policies, and you can find many Kubewarden policies there. Some policies come from the Kubewarden organization; as you can see, we have three of them here with PSP in the name. If you see PSP in the name, that means it's a PSP replacement policy. These are verified policies, which means it has been verified that they were uploaded by Kubewarden, and they are also signed with Sigstore, so you can use Cosign to verify the policies' signatures. So yeah, we said that Kubewarden uses WebAssembly for the policies. Why? Why WebAssembly?
The first reason is security out of the box: with WebAssembly we get a lot of security by design. WebAssembly modules run in a sandboxed environment, and they cannot escape the sandbox. Also, there are many programming languages that compile to WebAssembly, so you can write policies using your favorite programming language. At this moment Kubewarden supports writing policies in Rust, Go, AssemblyScript, and Swift, and we recently added support for C#, and we will be adding more languages in the future. So you can use your existing knowledge; you don't need to learn a new language. As we also mentioned before, policies are distributed via OCI registries, which means you can reuse the existing infrastructure you use for your containers. And since they are WebAssembly modules, they can run outside of Kubernetes, in a WebAssembly runtime. In fact, Kubewarden provides a tool called kwctl that allows you to run a policy outside of Kubernetes, which is very handy for testing your policies. Okay, so let's see what a policy looks like. This is an example of the replacement for the PSP capabilities policy. The first thing we see here is the module: this is the WebAssembly module, and as we can see it comes from an OCI registry, in this case the GitHub Container Registry. Then we have the rules; these configure the webhook configuration object and basically tell Kubernetes which resources we want to watch. In this case we want to watch all pods, for the CREATE and UPDATE operations. Then we mark it as mutating, which means we are going to create a mutating webhook instead of a validating one, so it can mutate the original request. And then we have the settings field, which is specific to this policy: settings are different for each policy, while all the other options are the same for every policy.
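Putting those pieces together, the policy being described might look roughly like this (the module tag and the exact settings keys are written from memory and may differ between policy versions; check the policy's documentation):

```yaml
apiVersion: policies.kubewarden.io/v1
kind: ClusterAdmissionPolicy
metadata:
  name: psp-capabilities
spec:
  policyServer: default
  # The Wasm module, pulled from an OCI registry (GitHub Container Registry here).
  module: registry://ghcr.io/kubewarden/policies/psp-capabilities:latest
  # Tells Kubernetes which resources and operations the webhook watches.
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["pods"]
    operations: ["CREATE", "UPDATE"]
  # Registers a mutating (not just validating) webhook.
  mutating: true
  # Policy-specific settings; every policy defines its own schema.
  settings:
    allowed_capabilities: ["CHOWN", "KILL"]
    default_add_capabilities: ["CHOWN"]
```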
And in this case what we are saying is: we want to allow these two capabilities, CHOWN and KILL. That means that if we add another capability that's not in this list, admission will fail. And we want to add CHOWN as a default capability, so if you create a pod without any capabilities, the policy will add the CHOWN capability. Okay, so let's see how this works in a Kubernetes cluster. I have here my cluster with Kubewarden installed. Let's first create the policy; this is the same policy we were looking at before. Let's wait for it to become active. Okay, it's now active. So first, let's explore a bit: let's look at the MutatingWebhookConfiguration that Kubewarden created behind the scenes. As we mentioned previously, the Kubewarden controller was watching for new policies; it saw the policy we just created and it created this mutating webhook. As we can see here, it's labeled as Kubewarden, which means it was created by the Kubewarden controller. Then we have the configuration. This is the caBundle, which is needed for establishing a secure HTTPS connection to the Kubewarden policy server; but don't worry, this is all created by Kubewarden behind the scenes, so you don't really have to do anything with it. Then there is the path: this is the endpoint we mentioned before. When we created this policy, the Kubewarden controller created this configuration and also added a new endpoint on the policy server with this path, where it instantiated the Wasm module. Here we also have the name, and then the rules we configured in the policy. Let's see it in action. Let's try to create a pod with the NET_ADMIN capability, which, as you remember, is not in the allowed list, so it gets rejected; as you can see, we have an error message here. Now let's create another pod.
This pod didn't request any capability, so it was accepted, that's fine. But if you remember, this was a mutating policy, which means it mutated the request and added the default capability, CHOWN. So if we now inspect the pod, there it is: we didn't specify this capability, but because we marked the policy as mutating with a default capability, it was added, and this will happen to every pod you create. This is how a normal policy works in Kubewarden. Now let's move on to PSPs. We were saying that PSPs are deprecated and you should be moving away from them, and we also mentioned that on Artifact Hub you can find PSP replacement policies. So how do you migrate? There is a great tool built by the Appvia team that helps us with this process. This tool fetches all the information from your PSPs and transforms it into policies, and not just for Kubewarden: it also supports other policy engines such as Kyverno or OPA Gatekeeper. We created a wrapper around this tool which is very easy to run: it fetches all the information from the PSPs and transforms it into Kubewarden policies. There is an excellent blog post about it; you can find the link in the description below. Let's actually see it in action, because it's very easy, as you will see. We have this PSP; it's the same PSP example that Víctor was showing before. Let's see how we can transform it into Kubewarden policies. We download the script we mentioned before, we run the script, and that's it: it fetches all the PSP information from your cluster and transforms it into Kubewarden policies. Once all these policies are created, we can safely remove the PSPs, and that's it, just one line. Here we are applying the output directly, but you could also save it into a YAML file and inspect the policies before applying them to your cluster.
You could also modify the policies, for example to enable monitor mode. Kubewarden policies are in protect mode by default, which means they will reject pods that violate any of the policies, but you can also enable monitor mode, which will not reject any object: it will only log the violations, which is useful for testing purposes. But that's it: just one line, and you can move away from PSPs. Okay, so what are the benefits of Kubewarden? The first thing is that it's simple: you don't need to write a custom webhook server, handle certificates, or create a dedicated endpoint for each policy you want to deploy; all of that happens behind the scenes, and it will save you a lot of time. Also, you can validate any resource; it's not restricted to pods, those were just examples. Also, there are many existing policies on Artifact Hub. The Kubewarden team created a lot of policies based on what users usually need, but you can create your own policies if you have a use case that's not covered by the existing ones, and you can publish them so other people can benefit from your work, and hopefully contribute back. As we have seen, it's very, very easy to migrate from PSPs: just run the script and that's it. So now that they are deprecated, you can migrate to Kubewarden quite easily. You can write your own policies using your favorite programming language, as we said: if you're using a language that compiles to WebAssembly, you can use it to create a Kubewarden policy. Also, we created a special policy called verify-image-signatures, which verifies container image signatures if they are signed using Sigstore. If you're interested in Sigstore, you can watch our talk "Enforcing a Secure Supply Chain in Kubernetes", where we go deeper into Sigstore and how you can use Sigstore and Kubewarden to enforce a secure supply chain on your Kubernetes cluster.
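As an illustration, a namespaced signature-verification policy might be sketched like this. The module URL and especially the settings schema are assumptions from memory (this variant checks a Cosign public key rather than keyless GitHub identities); consult the verify-image-signatures policy documentation for the real keys:

```yaml
apiVersion: policies.kubewarden.io/v1
kind: AdmissionPolicy
metadata:
  name: verify-image-signatures
  namespace: test
spec:
  module: registry://ghcr.io/kubewarden/policies/verify-image-signatures:latest
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["pods"]
    operations: ["CREATE", "UPDATE"]
  mutating: false
  settings:
    # Illustrative settings: require images from ghcr.io to carry a
    # Sigstore signature matching the given Cosign public key.
    signatures:
    - image: "ghcr.io/*"
      pubKeys:
      - |
        -----BEGIN PUBLIC KEY-----
        ...your cosign public key...
        -----END PUBLIC KEY-----
```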
Okay, so PSPs are going away. We talked about Pod Security Admission, which is provided by the Kubernetes team, built into Kubernetes. It is great and very, very easy to use, and you should be using it if you are not. And then we talked about Kubewarden. Can you use both at the same time? Yes, of course, that's fine. Say you label a namespace with a Pod Security standard label and you also have a Kubewarden policy: the pod will be created only if all of these policies validate successfully. If one of them fails, it doesn't matter whether it's the Pod Security standard or the Kubewarden policy, the pod will be rejected. But yes, you can absolutely have both running at the same time. Actually, let's see an example of how you can do that. Let's create a namespace called test, and now let's label it with enforce baseline. As Víctor previously mentioned, this applies the baseline Pod Security standard, and it's in enforce mode, which means it will reject violations. And now let's create an AdmissionPolicy. This is verify-image-signatures, the policy we were talking about regarding Sigstore: it will verify Sigstore signatures for all the images that come from GitHub, the GitHub Container Registry, and in this example it will verify that the signature was created with my GitHub account. So for this namespace we have the Pod Security standard label and we have an AdmissionPolicy. Now let's wait for the policy to become active. Okay, so now we have both at the same time. Let's create a pod that violates the first one, the baseline Pod Security standard, because it's a privileged container, and see what happens. Yeah, as we can see here, it violates the baseline standard, which means the pod was not created. And now let's create a pod that violates the Kubewarden verify-image-signatures policy.
And as we can see here, we are using an unsigned image, one that was not signed with Sigstore, so it was rejected by the Kubewarden admission webhook. So, as we have seen, you can safely use both at the same time in the same namespace. And if we now apply a pod that is not rejected by either of them, it is created, and that's it. So yes, you can safely use both Kubewarden and Pod Security Admission. And that's all we wanted to talk about today. If you want to get involved, please reach out to us on Slack or on GitHub; feel free to open an issue, and contributions are always welcome. That's all. Thank you for listening, and we hope you enjoyed our talk.