Hi, everyone. I'm Sam. I'm a customer engineer, and I also work as a GKPM focusing on the deprecation of PSP and on migrating away from PSP to PSA. Hopefully this doesn't come as a surprise to anyone, but PodSecurityPolicy has been deprecated since 1.21 and is gone as of 1.25; it's no longer in the Kubernetes code tree. I presume that's why you're all here. So, rest in peace, PodSecurityPolicy.

Quick show of hands: how many of you have had a chance to try out Pod Security Admission? All right, great. That's more than I was expecting. For those who haven't, we're not going to go into much depth on Pod Security Admission here, but I want to give a really quick overview. Pod Security Admission is an admission controller. It's built into Kubernetes, enabled by default on all clusters as of 1.23, and stable in 1.25. Briefly, the way it works is that you choose one of three standards, and those are applied on a per-namespace level. The standards are privileged, which means anything goes (it's kind of like not having it there at all); baseline, which allows the default pod-level fields but doesn't let you escalate permissions beyond that; and restricted, which adds some additional requirements, like running as non-root, to enforce hardening best practices. The standards are published on the Kubernetes website, so check that out if you want more detail about everything that's enforced. All the links will be on the last slide as well.
So you don't have to try to capture them as we go. For the agenda: in a moment I'll hand it over to Sam, who's going to demo the fast-path migration and what that might look like. Then we'll talk about some of the problems you can run into with the fast path, some of the challenges in that approach. Then we'll see another demo of how you can do a safer migration, going a little slower and taking some of those problems into account. We'll wrap up with extending beyond Pod Security Admission, for when you need more control than it gives you. So with that...

Yeah, all right, ready for a demo. In this demo we're going to show a quick and easy migration from PSP to PSA, ignoring the fact that there are mutating PSPs; after the demo we'll cover what a mutating PSP is and why you might have to take special care with it. For now we're going to assume there are no mutating PSPs active in this environment. First, let's verify that this cluster is using PSP. It's active, and it's preventing privileged pods, which is the main use case for this demo. We're going to ignore the other rules of this PSP; the main thing I want you to look at is privileged: false. This should prevent us from having any privileged pods in this namespace. Now, if you want to use a PSP, how do you assign it? You have to create a ClusterRole that says "I will be able to use the PodSecurityPolicy named my-psp," which is the one we just defined. Then you have to create a ClusterRoleBinding or RoleBinding that assigns that role to a specific subject; in this case, we're assigning it to the service accounts in the default namespace. That's the UX for a PSP. Do you all love that? Is it convenient to create a PSP, create a ClusterRole, and then assign it through a RoleBinding? Doesn't seem super convenient.
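For reference, the RBAC half of that setup can be sketched as Kubernetes manifests built from Python dicts. This is a minimal illustration, assuming the PSP from the demo is named my-psp; the role and binding names (use-my-psp) are hypothetical. The subject system:serviceaccounts:<namespace> is the built-in group that contains every service account in a namespace.

```python
import json


def psp_use_clusterrole(role_name: str, psp_name: str) -> dict:
    """ClusterRole that grants the 'use' verb on a single PodSecurityPolicy."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": role_name},
        "rules": [
            {
                "apiGroups": ["policy"],
                "resources": ["podsecuritypolicies"],
                "verbs": ["use"],
                "resourceNames": [psp_name],
            }
        ],
    }


def namespace_sa_rolebinding(binding_name: str, role_name: str, namespace: str) -> dict:
    """RoleBinding granting the ClusterRole to every service account in `namespace`."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": binding_name, "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": role_name,
        },
        # Built-in group containing all service accounts of one namespace.
        "subjects": [
            {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "Group",
                "name": f"system:serviceaccounts:{namespace}",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical names; "my-psp" is the PSP from the demo.
    items = [
        psp_use_clusterrole("use-my-psp", "my-psp"),
        namespace_sa_rolebinding("use-my-psp", "use-my-psp", "default"),
    ]
    # `kubectl apply -f -` accepts a JSON v1 List as well as YAML.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Three objects (policy, role, binding) are needed before a single PSP takes effect, which is the UX cost being described.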
We'll cover that in a moment. So that's how PSP works; let's make sure it's working as expected. We're going to try to deploy a privileged pod, with privileged: true in its security context. What we expect is that this should not work: PSP should prevent it from creating a pod. Wait, it did: the deployment was created successfully. Let's take a deeper look at whether it actually worked. We'll describe the deployment and see if it actually created pods. It looks like it did not create the desired replicas. That's kind of expected, but is it really PSP doing that? We still don't really know why it's not creating these replicas, or where to find the error message. Describe pod, describe deployment: it's a bit hard to troubleshoot. Let's look at the events; maybe it's there. And in the events we see an error creating pods for nginx-privs: forbidden, because PodSecurityPolicy is unable to admit the pod. It says privileged containers are not allowed. So PSP is doing its job. It was a bit hard to figure out that it was PSP, but PSP is working as expected. We'll delete this. Now we're going to deploy a normal nginx application that doesn't use a privileged security context; this is a very standard nginx deployment. In this case we hope the replica count will be one, and we'll see that in a moment. Let's see.
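Since the PSP denial only surfaces in the events, a small filter can save some digging. A minimal sketch, assuming the input is the JSON from `kubectl get events -o json` and that failed pod creations show up with reason FailedCreate (as the ReplicaSet controller reports them); the sample event below is hypothetical and abbreviated.

```python
import json


def psp_denials(events_json: str) -> list:
    """Return (kind/name, message) pairs for events that look like PSP denials.

    `events_json` is assumed to be `kubectl get events -o json` output,
    i.e. a v1 List whose "items" are Event objects.
    """
    denials = []
    for event in json.loads(events_json).get("items", []):
        message = event.get("message", "")
        # Failed pod creations are reported on the owner with reason FailedCreate.
        if event.get("reason") == "FailedCreate" and "PodSecurityPolicy" in message:
            obj = event.get("involvedObject", {})
            denials.append((f"{obj.get('kind')}/{obj.get('name')}", message))
    return denials


if __name__ == "__main__":
    # Hypothetical sample; real events carry many more fields.
    sample = json.dumps({"items": [
        {"reason": "FailedCreate",
         "involvedObject": {"kind": "ReplicaSet", "name": "nginx-privs-5d8f"},
         "message": 'Error creating: pods "nginx-privs-5d8f-x" is forbidden: '
                    "PodSecurityPolicy: unable to admit pod: "
                    "[spec.containers[0].securityContext.privileged: "
                    "Invalid value: true: Privileged containers are not allowed]"},
        {"reason": "ScalingReplicaSet",
         "involvedObject": {"kind": "Deployment", "name": "nginx"},
         "message": "Scaled up replica set nginx-7c5 to 1"},
    ]})
    for who, why in psp_denials(sample):
        print(who, "->", why)
```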
Yeah, this time the application deployed successfully, because it's not a privileged pod. Great, everything works as expected. Now, we're using PSP and there's an application running in production in the default namespace, and we want to migrate from PSP to PSA. The easy strategy is to try the different Pod Security Standards in enforce mode, as a dry run. We'll start with the most secure one and see whether the dry run shows any warnings, which tells us whether the currently running pods could be admitted under the restricted Pod Security Standard. So that's what we're doing here, starting with restricted. We see that existing pods in namespace default violate the new PodSecurity enforce level restricted. That means the currently running pods would not be admitted under restricted; restricted is too restrictive for the applications running in the default namespace. We have three standards: restricted, baseline, and privileged. Next we'll try baseline, which might be a better fit for the currently running application. We do the same thing, in dry-run mode, to see if it throws any warnings, and this time there are no warnings, which means the currently running pods would be admitted if I enforced the baseline Pod Security Standard. So next we run almost the same command; the only difference is that we drop the dry run. Once you do this, you've started enforcing the baseline Pod Security Standard. That's all we need: we just add a label. Before, we had to create a PSP, a ClusterRole, and a RoleBinding; now all we need to do is add a label. That's all PSA needs. So that's it.
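The authoritative check is the server-side dry run, for example `kubectl label --dry-run=server --overwrite ns default pod-security.kubernetes.io/enforce=restricted`. To illustrate the "start at restricted, step down until nothing is flagged" logic offline, here is a toy sketch that covers only a small subset of the Pod Security Standards (host namespaces and privileged for baseline; allowPrivilegeEscalation, runAsNonRoot, dropping ALL capabilities, and seccomp for restricted). It is an approximation, not the real evaluator, and the function names are hypothetical.

```python
# Most to least secure: the order in which we try the standards.
LEVELS = ("restricted", "baseline", "privileged")


def _all_containers(spec: dict) -> list:
    return ((spec.get("initContainers") or []) + (spec.get("containers") or [])
            + (spec.get("ephemeralContainers") or []))


def baseline_violations(spec: dict) -> list:
    """Small subset of baseline: host namespaces and privileged containers."""
    problems = [f"{f}=true" for f in ("hostNetwork", "hostPID", "hostIPC") if spec.get(f)]
    for c in _all_containers(spec):
        if (c.get("securityContext") or {}).get("privileged"):
            problems.append(f"{c['name']}: privileged=true")
    return problems


def restricted_violations(spec: dict) -> list:
    """Baseline subset plus a few restricted-only requirements."""
    problems = baseline_violations(spec)
    pod_sc = spec.get("securityContext") or {}
    for c in _all_containers(spec):
        sc = c.get("securityContext") or {}
        name = c["name"]
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        # A container-level setting overrides the pod-level one.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        if "ALL" not in ((sc.get("capabilities") or {}).get("drop") or []):
            problems.append(f"{name}: capabilities.drop must include ALL")
        profile = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if profile.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return problems


CHECKS = {
    "restricted": restricted_violations,
    "baseline": baseline_violations,
    "privileged": lambda spec: [],  # privileged allows everything
}


def most_secure_level(pod_specs: list) -> str:
    """Most secure level under which every given pod spec passes (toy subset)."""
    for level in LEVELS:
        if not any(CHECKS[level](spec) for spec in pod_specs):
            return level
    return "privileged"  # not reached: privileged never reports violations
```

A plain nginx pod spec fails restricted here (no runAsNonRoot, no seccomp profile, and so on) but passes baseline, which matches the outcome in the demo.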
I think that's an improvement in UX; it feels easier than having to create all these different resources. Next: we enabled PSA, but PSP is still active in this namespace. Actually, one more thing I wanted to add: PSA has different modes. We set enforce mode, and at the same time we can also turn on warn mode; in a moment I'll show you what warn mode does. But there's one more thing we have to do. PSP is still active in this namespace and PSA is active too, so we should disable PSP in this namespace. Our expectation is that privileged pods will still be prevented from being admitted, because the baseline profile does not allow privileged pods.

Tim actually came up with a pretty clever way to disable PSP on a per-namespace level. In most cases PSP is enabled cluster-wide, so how can you disable it for only a single namespace? You define a privileged PSP that has basically no restrictions whatsoever. By defining this privileged PSP and assigning it only to the service accounts in the namespace, we can effectively disable PSP for that single namespace. This is also great because it allows us to roll back. Let's say for some reason the PSP is still required and I need to quickly roll back this migration: you can remove the privileged PSP from this namespace, and you're back to the original PSP that was active there. Next, it's the same flow as with PSP: you create a ClusterRole to be able to use this new privileged PSP, and then a RoleBinding that assigns the privileged PSP to all the service accounts in the default namespace. After we do this, PSP should no longer be active. Let's verify it. Is it really no longer active?
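The per-namespace "turn PSP off" trick can be sketched as three objects. This is a minimal illustration, assuming a cluster on 1.24 or earlier (policy/v1beta1 PodSecurityPolicy is gone in 1.25); the object names are hypothetical. The spec mirrors the fully permissive example from the Kubernetes PSP docs, and every strategy uses RunAsAny, so the policy never needs to mutate a pod. PSP admission prefers policies that admit a pod as-is, without mutation, which is why binding this policy effectively switches PSP off for that namespace.

```python
import json


def privileged_psp(name: str = "privileged") -> dict:
    """Permissive PodSecurityPolicy; RunAsAny everywhere, so nothing is defaulted."""
    return {
        "apiVersion": "policy/v1beta1",
        "kind": "PodSecurityPolicy",
        "metadata": {
            "name": name,
            "annotations": {"seccomp.security.alpha.kubernetes.io/allowedProfileNames": "*"},
        },
        "spec": {
            "privileged": True,
            "allowPrivilegeEscalation": True,
            "allowedCapabilities": ["*"],
            "volumes": ["*"],
            "hostNetwork": True,
            "hostPorts": [{"min": 0, "max": 65535}],
            "hostIPC": True,
            "hostPID": True,
            "runAsUser": {"rule": "RunAsAny"},
            "seLinux": {"rule": "RunAsAny"},
            "supplementalGroups": {"rule": "RunAsAny"},
            "fsGroup": {"rule": "RunAsAny"},
        },
    }


def bind_psp_to_namespace(psp_name: str, namespace: str) -> list:
    """ClusterRole to `use` the PSP plus a RoleBinding for one namespace's service accounts.

    Deleting the RoleBinding is the rollback: the namespace falls back to its old PSP.
    """
    role_name = f"use-psp-{psp_name}"  # hypothetical naming scheme
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": role_name},
        "rules": [{"apiGroups": ["policy"], "resources": ["podsecuritypolicies"],
                   "verbs": ["use"], "resourceNames": [psp_name]}],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole",
                    "name": role_name},
        "subjects": [{"apiGroup": "rbac.authorization.k8s.io", "kind": "Group",
                      "name": f"system:serviceaccounts:{namespace}"}],
    }
    return [role, binding]


if __name__ == "__main__":
    items = [privileged_psp()] + bind_psp_to_namespace("privileged", "default")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```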
We're going to verify that by deploying the same privileged nginx deployment. As you can see, this one has privileged: true again, and I'm going to apply it. This time you see a warning directly. Remember, in the previous example with PSP, no warning was thrown; it just said the deployment was created, and I had no clue that something was blocking my pods from being created. With PSA, however, if you add warn mode in addition to enforce mode, you get a warning even when you create a deployment, not just when you create a pod directly. That's very helpful for user experience. Otherwise users are like, "My deployment is created, I tried my web application... what happened?" They just don't know; they have to start looking at the events. It's not a great user experience. This time the warning is thrown directly, and if you look at it, you see that it would violate PodSecurity baseline. So this is not a PSP error. And if we look at the events, where we previously saw the PSP error message, this time we should see a different error message, from PodSecurity instead of PodSecurityPolicy. Let's take a look. Yeah, it says it violates PodSecurity baseline, whereas our previous error message, with PSP, said PodSecurityPolicy. It's subtle.
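The modes are just more labels on the namespace. A minimal sketch of building the label set for enforce plus warn (and optionally audit), assuming you apply it as a merge patch with `kubectl patch namespace default --type=merge -p '<body>'`; the helper names are hypothetical, while the label keys and values are the documented Pod Security Admission ones.

```python
import json

MODES = ("enforce", "audit", "warn")
LEVELS = ("privileged", "baseline", "restricted")


def psa_labels(level: str, modes=MODES, version: str = "latest") -> dict:
    """Namespace labels that apply one Pod Security Standard in the given modes."""
    if level not in LEVELS:
        raise ValueError(f"unknown level {level!r}")
    labels = {}
    for mode in modes:
        if mode not in MODES:
            raise ValueError(f"unknown mode {mode!r}")
        labels[f"pod-security.kubernetes.io/{mode}"] = level
        # "latest" tracks the cluster's version; a value like "v1.25" pins the policy.
        labels[f"pod-security.kubernetes.io/{mode}-version"] = version
    return labels


def namespace_merge_patch(level: str, **kwargs) -> str:
    """JSON body for `kubectl patch namespace <ns> --type=merge -p '<body>'`."""
    return json.dumps({"metadata": {"labels": psa_labels(level, **kwargs)}})


if __name__ == "__main__":
    # Enforce and warn on baseline, as in the demo.
    print(namespace_merge_patch("baseline", modes=("enforce", "warn")))
```

Warn mode is what produces the immediate warning on `kubectl apply` of a Deployment; enforce alone would only block the pods later, with the reason visible in the events.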
It's a small change, but based on the error message we know that PSA is active and PSP is no longer active in this namespace. So this was the fast and easy migration (I know it didn't seem super fast), where we ignore the effect of mutating PSPs. Tim will cover some of the gotchas of this approach and why it might not be the right approach for everyone. Back to you, Tim.

All right, thanks for that demo. There are a couple of problems with this approach. The first is: what happens if you don't have any running pods that are representative of a workload that needs to run? That could be the case if you have a controller that runs pods on demand, something like a CronJob that only runs periodically, a workload that's scaled down to zero, or something that just hasn't launched yet. For this case, Pod Security Admission gives you two tools: warn mode and audit mode. We already saw a demo of warn mode, which gives a warning and feedback directly to the user. Audit mode is basically the same thing, but it adds an annotation to the audit logs. This is useful if you want to enable audit mode across all your namespaces and let it soak for a while, a week or however much time you have. Then you can go back through your audit logs and see whether any workloads ran during that time that would have violated the policy and been blocked by enforce mode. The second problem, which Sam already alluded to, is mutations. PodSecurityPolicy can mutate pods.
It has a bunch of different ways of defaulting various fields: if something isn't set on the pod directly, PodSecurityPolicy will set it for you. This can be a problem: what if one of those fields is actually critical to the way the application runs? Then disabling PodSecurityPolicy could suddenly lead to a production outage. So, a quick quiz: here's a bunch of fields from the PodSecurityPolicy spec. Which of these are mutating? All right, if you said all of them, you'd be correct. Not under all conditions, but each one of these fields can mutate the pod. I recommend checking out this resource on the Kubernetes documentation site. It has a full list of all the fields in PodSecurityPolicy, which ones are mutating and which are only validating, and a mapping of how they translate into Pod Security Admission. With that, back to Sam.

Yeah, so Tim just explained that our previous approach might not work if you have mutating PSPs, and I'll show you exactly why it might not work the way you think. Let's get started. This is actually fun: it's a real application that we're going to show, and a fun thing will happen with it. It's the same PodSecurityPolicy as in the previous demo, but this time I want you to pay attention to these rules. We just had Tim's quiz; which of these are mutating? Raise your hand if you think it's all of them. Raise your hand if you think it's only some. Is seLinux mutating? Is supplementalGroups mutating? All right, I see a few hands. Is runAsUser mutating in this case? You know your stuff, huh. That's great.
Is fsGroup mutating? Some people didn't raise their hands at all, but I don't know whether that's because you think none of them are mutating or you just didn't want to raise your hand, which is a fine reason too. In this case, seLinux and supplementalGroups are actually not mutating, because their rule is RunAsAny. So it's not as simple as "this field is mutating or not"; it depends on the rule under it. RunAsAny will never mutate. MustRunAs can be mutating; it doesn't have to be, but it can be. This is why "is my PSP mutating my pods?" is actually not a simple question to answer.

Let's go further. We have a deployment; it's an nginx app that serves a very simple index.html, and we're going to deploy it. PSP is active; I'm skipping over it, but we did the same thing as before: the PSP, the ClusterRole, and the RoleBinding are all still active. We have a simple ConfigMap with "Welcome to KubeCon 2022" that's served by this application, and I want you to pay special attention to the fact that only the owner can read and write this index.html. We have a deployment spec with the pod spec it should be running; we'll compare against it in a moment. What I'm going to show you now is the pod spec of the currently running pod. There's a lot more in there that gets populated automatically, but I want you to look at the things I didn't expect to be there. Suddenly my pod has a security context with runAsUser 2005 in the container spec, and the pod spec has a security context with fsGroup 2005. Where is this coming from? Who added this to my pod?
That's the mutating PSP. Here's the PSP we had: runAsUser with a minimum of 2005. If you didn't set the value yourself, PSP takes the first value from the PSP, the min value, and adds it to the security context of the container spec. For fsGroup it does the same thing, but adds it to the security context of the pod spec. So this is a mutating PSP: it's just adding fields. Looking at this PSP, did you expect that to happen? Some people might, if you've done this before, but I didn't expect it the first time I used PSP. So that's a mutating PSP.

Now let's look at the application, which is working fine with our PSP right now. Yes, the pod is working and the application is running: I did a simple port-forward to the service that exposes this nginx application. Now let's migrate to Pod Security Admission and enforce the baseline Pod Security Standard, just like we did before. We already know we need baseline, so I'm not going to do the whole dry run. We follow the same flow and disable PSP by applying the privileged PSP: we create a ClusterRole for the privileged PSP, then a RoleBinding that assigns it to the service accounts in the default namespace. And then we're like, "Well, we've got a new version of our application; let's deploy it." That should also work fine, right? We didn't really change anything; we just moved from PSP to PSA, which should be a small change. Let's see. It restarted, which means the pod was recreated. We do the same port-forward to check how our application is doing, and it should be fine... but no, I get a 403 Forbidden. That would cause downtime for my users, right?
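The defaulting just described can be modeled in a few lines. This is a simplified sketch, not PSP's actual admission code: it only models MustRunAs defaulting for runAsUser (written to each container's securityContext) and fsGroup (written to the pod's securityContext), taking the min of the first range when the pod hasn't set a value. Other mutating rules, such as MustRunAsNonRoot or default capabilities, are deliberately ignored here.

```python
import copy


def _first_min(strategy: dict):
    """For a MustRunAs strategy, PSP defaults to the min of the first range."""
    if strategy.get("rule") == "MustRunAs" and strategy.get("ranges"):
        return strategy["ranges"][0]["min"]
    # RunAsAny never mutates; other rules are simply not modeled in this sketch.
    return None


def simulate_psp_defaults(pod_spec: dict, psp_spec: dict) -> dict:
    """Return a copy of pod_spec with runAsUser/fsGroup defaults filled in."""
    pod = copy.deepcopy(pod_spec)
    pod_sc = pod.get("securityContext") or {}

    fs_group = _first_min(psp_spec.get("fsGroup", {}))
    if fs_group is not None and "fsGroup" not in pod_sc:
        pod_sc["fsGroup"] = fs_group  # fsGroup lands on the pod-level securityContext

    run_as_user = _first_min(psp_spec.get("runAsUser", {}))
    for c in pod.get("containers", []):
        sc = c.get("securityContext") or {}
        if run_as_user is not None and "runAsUser" not in sc and "runAsUser" not in pod_sc:
            sc["runAsUser"] = run_as_user  # runAsUser lands on each container
        if sc:
            c["securityContext"] = sc

    if pod_sc:
        pod["securityContext"] = pod_sc
    return pod
```

With a PSP whose runAsUser and fsGroup ranges start at 2005, a bare nginx pod spec comes out with runAsUser: 2005 on the container and fsGroup: 2005 on the pod, which is the unexpected state in the demo. Once the privileged (all RunAsAny) PSP is used instead, nothing is filled in, and the app loses access to its file.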
Suddenly my application isn't behaving the way I expected, and you're like, "What happened? We just moved from PSP to PSA. Why is this happening to my application?" Let's look at the pod spec of the currently running pod, after we recreated the pod and migrated from PSP to PSA. The security context with runAsUser under the container spec is gone now, and the security context under the pod spec is no longer being set. The reason is that the mutating PSP was doing this for us; we moved off of PSP, and now it's no longer happening. The way I coded this nginx app, only user ID 2005 has access to the file, so now we're getting a 403 Forbidden because the user nginx runs as can no longer read index.html. This is a somewhat realistic scenario, and there are others you might encounter where the mutating PSP was adding fields that your application expected.

So how do we fix it? We have to fix it quickly. Instead of relying on PSP to add these fields, we add them directly to the deployment spec itself: this one at the pod spec level, and this one at the container spec level. Then we apply, and hopefully that gets our application back up and running. Let's see... It worked. Yes, we're back online. The main goal of this demo is to show that mutating PSPs are something you have to take special care of: in some cases your application might depend on the mutation, so you have to account for it. And as I asked at the beginning, how do you figure out whether your PSP is mutating your pods? Are you going to go one by one?
Look at every individual pod spec, compare it with its deployment spec, and check whether there's a difference in the security context? In my view, no human should be responsible for doing that. So we wrote a tool for it: psp-migrator. It makes it a lot easier to detect whether your pods are being mutated by PSP, and it basically does that check for you. It looks at the owner reference of a pod, then compares the owner's pod spec with the actual running pod and checks for any difference in the security context. If there is one, it's highly likely the pod is being mutated by PSP. So let's try psp-migrator. It's the same deployment we used before, and I'm quickly going to demonstrate how psp-migrator makes it a lot easier to detect PSP mutations and migrate from PSP to PSA. We have the same deployment, the same HTML web application that's serving, and I just want to quickly show that it's still being mutated: you see the spec. I reverted everything, so this is a clean state where the mutating PSP is again mutating this deployment's pod. But this time we're going to use psp-migrator to detect the mutation and migrate from PSP to PSA. So we're going to install it.
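The comparison psp-migrator is described as doing can be illustrated with a small diff function. This is a hedged sketch of the idea, not the tool's code: it assumes you already fetched the owner's pod template (for a Deployment-managed pod, the owning ReplicaSet's spec.template.spec) and the running pod's spec as dicts. A difference is a strong hint of PSP mutation, but API-server defaulting or other mutating webhooks could also produce one.

```python
def security_context_diff(template_spec: dict, pod_spec: dict) -> dict:
    """Compare securityContext fields of an owner's pod template with a running pod.

    Returns {path: (template_value, running_value)} for every field that differs.
    """
    diffs = {}

    def compare(path, wanted, actual):
        wanted, actual = wanted or {}, actual or {}
        for key in sorted(set(wanted) | set(actual)):
            if wanted.get(key) != actual.get(key):
                diffs[f"{path}.{key}"] = (wanted.get(key), actual.get(key))

    # Pod-level securityContext (where PSP puts fsGroup).
    compare("spec.securityContext",
            template_spec.get("securityContext"), pod_spec.get("securityContext"))

    # Container-level securityContext (where PSP puts runAsUser), matched by name.
    for field in ("initContainers", "containers"):
        running = {c["name"]: c for c in pod_spec.get(field) or []}
        for c in template_spec.get(field) or []:
            actual = running.get(c["name"], {}).get("securityContext")
            compare(f"spec.{field}[{c['name']}].securityContext",
                    c.get("securityContext"), actual)
    return diffs
```

For the demo app, this would report fsGroup and runAsUser as (None, 2005): set on the running pod but absent from the template.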
It's a very simple CLI tool. I'd say it's still a work in progress, but it's ready for people to try, so I encourage you to try it out early and give feedback, and then we can keep improving the tool. We run a quick list of all the pods that are being mutated, and it correctly detects that this pod is mutated, and mutated by my PSP. There's another command, mutating psp, that you can run against my PSP; it lists which fields and which annotations are mutating, and it correctly detected that runAsUser and fsGroup are potentially mutating fields in the PSP object. Let me take a drink; I'm talking too much.

Next, psp-migrator has an interactive migrate command that goes over all your namespaces and suggests the most secure Pod Security Standard that is able to admit all the running pods in each namespace. In addition, it has a safeguard: it first checks whether your pods are being mutated, and if they are, it tells you to fix your pod specs first so that PSP is no longer mutating them. Let's look at the output of the command. It checks whether any pods are being mutated, says the table below shows the pods that are mutated by a PSP object, and tells you to run another command to get more details on how your pod is being mutated and by what. So let's run that. In the output, it tells you that the security context is different. The output is still something I'm working on; it wasn't super straightforward to code up. It also tells us the fsGroup is different; the int64 you see is straight from Go output. And it also shows the mutating fields.
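The fix is the same one as before: copy the values PSP used to inject into the workload's own template, so nothing depends on mutation anymore. A minimal sketch that patches a Deployment dict; the function and its parameters are hypothetical, and the values (2005 in the demo) would come from the mutation report.

```python
import copy
import json


def pin_security_context(deployment: dict, *, fs_group=None, run_as_user=None,
                         containers=None) -> dict:
    """Return a copy of the Deployment with fsGroup/runAsUser set explicitly.

    fs_group goes on the pod-level securityContext; run_as_user goes on each
    container (or only on the names listed in `containers`).
    """
    dep = copy.deepcopy(deployment)
    spec = dep["spec"]["template"]["spec"]
    if fs_group is not None:
        spec.setdefault("securityContext", {})["fsGroup"] = fs_group
    if run_as_user is not None:
        for c in spec.get("containers", []):
            if containers is None or c["name"] in containers:
                c.setdefault("securityContext", {})["runAsUser"] = run_as_user
    return dep


if __name__ == "__main__":
    dep = {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "nginx"},
           "spec": {"template": {"spec": {"containers": [{"name": "nginx", "image": "nginx"}]}}}}
    # Pipe into `kubectl apply -f -` to roll out the pinned values.
    print(json.dumps(pin_security_context(dep, fs_group=2005, run_as_user=2005), indent=2))
```

After this change, the template and the running pod agree, so a mutation check like the one described finds nothing and the migration can proceed.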
So now I know I have to fix my deployment. I'm going to fix it quickly, the same way we did before: we add those two fields to the deployment's pod spec. Then I run the migrate command again, and my expectation is that this time it checks for mutating PSPs but lets us continue with the actual migration. This time it checked whether any pods were being mutated, didn't find any, and suggests using baseline in namespace default, the same result we got through the manual process. If you choose enforce, it automatically applies the label to the namespace for you, and we can quickly verify that by getting namespace default as YAML. That's how psp-migrator gives you an easier migration process than doing all of this yourself. You can also use it as a library: there are a few calls you can import so you can script around questions like "is my pod being mutated?" So it's not purely a CLI tool. I think those were the demos. Over to you, Tim; you're going back to the slides.

All right. So far we've been talking about migrating from PodSecurityPolicy to Pod Security Admission. When we designed Pod Security Admission, our goal was really to have super simple, out-of-the-box security for Kubernetes, and we made some design compromises to chase that simplicity and ease of use. But there are some cases it doesn't handle.
It wasn't designed to cover every possible use case. So, some limitations of Pod Security Admission. The first is that it's controlled through namespace labels. That makes it easy to apply, and easy to search across your cluster for which namespaces are using which profiles. But it also means that users who have edit access on namespaces can modify those labels and escalate their permissions. It also means that when you create new namespaces, you need to make sure they get labeled as well. Probably the biggest limitation is the lack of customization. (We'll take questions at the end.) You have to choose one of three profiles, privileged, baseline, or restricted; if you want to customize them, you'll need one of the alternatives we'll talk about in a moment. We already covered the lack of mutation; that's just not something Pod Security Admission takes care of. And finally, we strongly advise against trying to subdivide policy within a namespace, but if that's something you really need to do, you'll need a tool other than Pod Security Admission.

So for cases where Pod Security Admission isn't enough, there's a whole ecosystem of third-party admission controllers. The two I call out here are, first, Open Policy Agent, a really powerful policy engine that lets you write policies in Rego, a policy DSL. You can also use Gatekeeper, a framework built on top of OPA that adds some Kubernetes-native features: it lets you template policies and apply them through CRDs. Kyverno is another policy engine, designed specifically for Kubernetes. It's a little less powerful and flexible than OPA, but because of that tradeoff, its policies can be simpler to understand and to write. I'm personally really excited about CEL admission.
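For the first limitation, namespaces that never got labeled, a periodic audit is cheap. A minimal sketch, assuming the input is `kubectl get namespaces -o json` output; the ignore list is a hypothetical choice. An unlabeled namespace falls back to the cluster-wide Pod Security defaults, which are privileged unless the API server's PodSecurity admission configuration sets something stricter.

```python
import json
from collections import Counter

ENFORCE_LABEL = "pod-security.kubernetes.io/enforce"


def audit_namespaces(namespaces_json: str, ignore=("kube-system",)):
    """Summarize enforce levels from `kubectl get namespaces -o json` output.

    Returns (Counter mapping level -> namespace count, sorted unlabeled names).
    """
    levels = Counter()
    unlabeled = []
    for ns in json.loads(namespaces_json).get("items", []):
        name = ns["metadata"]["name"]
        if name in ignore:
            continue  # hypothetical allowlist of namespaces you manage separately
        level = (ns["metadata"].get("labels") or {}).get(ENFORCE_LABEL)
        if level is None:
            unlabeled.append(name)
        else:
            levels[level] += 1
    return levels, sorted(unlabeled)
```

The same loop is a natural place to flag namespaces whose label is weaker than expected, for example privileged where you intended baseline.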
CEL is another policy DSL, and we're actually building it into the API server, starting with Kubernetes 1.26, so you'll be able to apply arbitrary policy through CEL policies without having to run a separate webhook. Unfortunately, the talk referenced here already happened earlier today, so if you didn't catch it, I definitely recommend checking out the recording when it's out. You can also read the KEP directly; hopefully it goes to alpha in 1.26. And finally, especially if you're already familiar with the Kubernetes client libraries, it might be easier than you think to write your own admission controller. If you're using client-go, I recommend Kubebuilder, which can automate a lot of the associated boilerplate. You can also check out the Pod Security Admission implementation, which actually ships with a webhook.

Now, using one of these alternative solutions doesn't mean you can't also use Pod Security Admission. It was actually designed to work really well alongside another solution, and it's super lightweight. Running both gets you some defense in depth, in case your webhook ever goes down or is accidentally deleted. It can also minimize how much custom policy you have to maintain: if you're relying on, say, the baseline policy and just adding a few constraints on top of it, you don't necessarily need to reimplement the whole baseline Pod Security Standard in your custom policy engine.
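To show how small "a few constraints on top of baseline" can be, here is the decision function of a validating webhook, sketched in Python purely to illustrate the admission.k8s.io/v1 AdmissionReview request and response shape (the talk's recommendation for real webhooks is client-go with Kubebuilder). The extra rule, an allowed image registry prefix, is a hypothetical example. The HTTPS server, TLS, and ValidatingWebhookConfiguration are not shown, and to see ephemeral containers the webhook would also need to be registered for the pods/ephemeralcontainers subresource.

```python
ALLOWED_PREFIX = "registry.example.com/"  # hypothetical extra constraint
# Checking all three container lists avoids missing a newly added container type.
CONTAINER_FIELDS = ("initContainers", "containers", "ephemeralContainers")


def review_pod(admission_review: dict) -> dict:
    """Validate the Pod in an admission.k8s.io/v1 AdmissionReview request."""
    request = admission_review["request"]
    spec = (request.get("object") or {}).get("spec") or {}
    offenders = []
    for field in CONTAINER_FIELDS:
        for c in spec.get(field) or []:
            if not c.get("image", "").startswith(ALLOWED_PREFIX):
                offenders.append(f"{field}[{c.get('name')}]={c.get('image')}")

    # The response must echo the request uid.
    response = {"uid": request["uid"], "allowed": not offenders}
    if offenders:
        response["status"] = {
            "code": 403,
            "message": f"images must come from {ALLOWED_PREFIX}: " + ", ".join(offenders),
        }
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview",
            "response": response}
```

Everything baseline already covers (privileged, host namespaces, and so on) stays with Pod Security Admission; the webhook only carries the extra rule.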
You just need those additional constraints on top of it. And finally, because the Pod Security Standards and Pod Security Admission are developed by the Kubernetes community and built in, there's a sort of guarantee that any new features added to Kubernetes will be constrained by Pod Security Admission. A recent example is ephemeral containers. If you had policies that checked fields on containers, you might have been checking the containers field and the initContainers field, but when ephemeral containers launched, suddenly there was a third type of container that needed its policy checked. So our recommendation is to apply the most secure Pod Security Standard you can with Pod Security Admission, and then apply additional constraints on top of that.

In conclusion, we recommend that you start early and take it slow: try to unblock your 1.25 upgrade before you really need to be on 1.25. We definitely recommend an incremental approach. Sam demoed how you can use a privileged PodSecurityPolicy bound to an individual namespace to migrate one namespace at a time; you don't need to do the whole cluster at once. And remember to watch out for mutations and scaled-to-zero workloads. Thank you. Here are all the links referenced in the talk and a few other resources. I'm happy to take questions if there's time.

If I understood correctly, there was a clever warning emitted when you tried to create, I think, a deployment, and it caught that the pod spec template would later cause a problem. Do you have any ideas for how custom resources that can also create pods could tie into that kind of mechanism?

Yeah, we talked about this. In case anyone didn't hear the question: if you have a custom resource that controls pod deployments, how can you get a similar warning or audit mechanism for it?
We talked about building something into Pod Security Admission for this, but ultimately decided not to go that way. However, Pod Security Admission is designed as a library, so take a look at the code there. Presumably, if you have a CRD that does this, you also have a custom controller that watches it, so it should be fairly easy to extend that code if you want to add a custom type, for example in a webhook that handles it. We probably have time for one more question.

Hi, are there plans to build custom security levels into Pod Security Admission, or is the answer just to find another solution?

There are no plans for that. Well, let me rephrase: CEL admission, I think, is really our plan for that. It will let you build custom security policies, but also a lot more: anything you would do with admission control today, aside from maybe some stateful things. We really think it will cover most, if not all, of the use cases beyond that.

Okay, thanks.

All right, that's all the time we have. Thank you for attending, and please find the speakers after the session for more questions.