All right, we're going to go ahead and get started. I'd like to thank everyone who's joining us today. Welcome to today's CNCF webinar, "Ensuring compliance without sacrificing development agility and operational independence in Kubernetes with OPA Gatekeeper." I'm Karen Chu, Community Program Manager at Microsoft and CNCF Ambassador. I'll be moderating today's webinar, and we'd like to welcome our presenters: Sertac Ozercan, Software Engineer at Microsoft, and Lachlan Evenson, Principal Program Manager at Microsoft. Just a few housekeeping items before we get started. During the webinar you will not be able to talk as an attendee. There is a Q&A box at the bottom of your screen; please feel free to drop your questions in there and we'll get through as many as we can at the end. This is an official webinar of the CNCF and as such is subject to the CNCF Code of Conduct. Please do not add anything to the chat or questions that would be in violation of the Code of Conduct. Basically, please be respectful to all of your fellow participants and presenters. Please also note that the recording and slides will be posted later today to the CNCF webinar page at cncf.io/webinars. And with that, I will hand it over to Sertac and Lachlan to kick off today's presentation. Wonderful. Thank you very much, Karen. Hello everybody and welcome. You are in for a real treat today. Today we're talking about OPA Gatekeeper, and for those of you who've never heard of OPA or Gatekeeper, we're going to walk you through everything you need to know about policy and governance for your Kubernetes cluster. So today we're going to go on a journey: telling you what OPA Gatekeeper does, sharing how you would use it, walking through a real-world scenario from end to end from both personas, admins and developers alike, sharing some demos with you, and then telling you how you can get connected into OPA Gatekeeper.
Now it's worth noting that Gatekeeper is a subproject of Open Policy Agent. So when I say OPA, I'm speaking about Open Policy Agent, which is a CNCF project, and Gatekeeper is specifically about how we implement OPA in Kubernetes using the Kubernetes API and CRDs. So we're going to go through the whole thing: how you can create policy, and how you can make sure that all your Kubernetes resources are in compliance with that policy. You can see the link there down the bottom on GitHub. If you're interested in following along, everything we're going to be sharing today is available at that GitHub link. So we're excited to get started here. Next slide, Sertac. First, we're going to share what Gatekeeper actually is under the hood. Gatekeeper is, as it states there, a customizable Kubernetes admission webhook. If you're not familiar with admission webhooks: you can create pieces of software, and for each request that comes into the Kubernetes API, you can send it over to this webhook to make a decision to admit or deny that request. So we can create policy behind that webhook, backed by OPA, and decide whether we should admit or deny each request that comes into the Kubernetes API. And why would we want to do this? It's to enforce policies and strengthen governance. We're going to dig into exactly what that means and how it affects you. Next slide. So really, we want to understand why people would be interested in using Gatekeeper and the problems it tries to solve. If you've been using and operating Kubernetes, you've probably been looking at ways for end users to have a great experience while also controlling what they can do. Whenever they're creating any type of resource, you want a way to understand and create policy and say whether those resources can be created. For example, can you label deployments a specific way? Can you create specific objects with specific specifications in those objects?
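To make the admission-webhook mechanism concrete, here is a minimal sketch of the kind of ValidatingWebhookConfiguration that routes API requests to a webhook service. This is an illustrative example, not Gatekeeper's actual manifest; the names, namespace, and path here are made up, and Gatekeeper's install manifests set all of this up for you:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-webhook   # illustrative name
webhooks:
  - name: validation.example.com
    rules:
      # Which API requests get sent to the webhook for a decision
      - apiGroups: ["*"]
        apiVersions: ["*"]
        operations: ["CREATE", "UPDATE"]
        resources: ["*"]
    clientConfig:
      service:
        namespace: example-system        # hypothetical namespace
        name: example-webhook-service    # hypothetical service
        path: /validate
    # "Ignore" = fail open (allow requests if the webhook is down);
    # "Fail" = fail closed (block requests if the webhook is down)
    failurePolicy: Ignore
    admissionReviewVersions: ["v1"]
    sideEffects: None
```

The `failurePolicy` field is the knob discussed later in this webinar when the fail-open versus fail-closed question comes up.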
So at object creation time in the Kubernetes cluster, we can control what we admit based on criteria that we define. We can create policy here to meet governance and legal requirements, and also just to enforce best practices. For example, I've seen some policies out there that say, hey, you're using a deprecated Kubernetes API, and send a message back to your user so they're enticed to move to a stable API. That's just an example; we're going to walk through some other examples of how you might actually use Gatekeeper. So let's get into a real-world example here. The way we're going to break this example down is from the perspective of two different personas in a fictitious company called Agile Bank. Of course, Agile Bank is building the greatest P2P money transfer app ever created, but more to the point, they're in a highly regulated industry. So they need to ensure that their resources on a Kubernetes cluster are compliant with the governance they've defined for those resources. Now, I like the last point here: both admins and developers are unhappy. Let's see how we can use Gatekeeper to make both of those personas happy without getting in the way of the great experience Kubernetes provides. First of all, I'm going to be wearing the admin hat, and I'll pass it to Sertac to wear the developer hat. Let's look at the admin persona specifically. I'm an admin, and I can't keep up with infrastructure changes. People need new resources; people want access to different types of things: persistent volumes, secrets, different back ends, different networks, load balancers, for example. Do I want to allow access to external load balancers? That's really hard for me to keep up with, when the idea of Kubernetes is that an admin does not necessarily get in the way of developers doing their jobs.
So how can I create a system that allows policy to be expressed without me and my team needing to get in the way? At the moment we're spending too much time on understanding whether things are compliant or not compliant. It's manual labor: I need to audit all the resources and look for anomalies that don't meet compliance. Everybody keeps making the same mistakes. We actually need everything labeled a very specific way so that it's compliant, but some development teams aren't labeling things the way we need them to. So can we take action on that? And figuring out what resources belong to what groups is hard. Does this sound familiar? Can we solve this problem with Gatekeeper and make my life and my team's life much easier? Now we're going to pass it over to the developer persona, Sertac. I'm the developer here, and I cannot make infrastructure changes. I have the app, I'm ready to go, and I know exactly what I want. I want to deploy and test my app. I want to deploy to production, but I don't have any permissions to do it, because I need to wait for the admin to give me access to the resources. I keep waiting for the admin to make these changes, and sometimes it takes a long time. Sometimes it's really long, so I just keep waiting for these changes to happen. And then when I do something, it's hard to know if that change is in conformance. Sometimes the conformance rules themselves change, so it's really hard to keep track of what changes over time. While I just want to focus on my app and my code, I also need to know about all these conformance changes that happen at Agile Bank all the time. These changes are proposed, rejected, updated, re-proposed, so it's very hard to keep up over time. And then turnaround time is at least a day, so it takes a long time to keep up with all the changes that happen. Okay, before we go into the user requirements, I see some great questions that I would love to answer live.
So I'm going to take a moment. Please keep the questions coming; we are here to answer them, and thank you for asking. First question: will the demo code samples be made available afterwards? Absolutely. If you go to the Gatekeeper repository from the first slide, which is open-policy-agent/gatekeeper, there is a directory called demo. In that directory is a script for the demo we're going to run through later in this webinar, and you can recreate it on your own afterwards. So you should be able to get access to the complete demo — don't worry about typing or taking screenshots. Also, this will be recorded, and we have recordings of the demos as well, so hopefully that answers your question. Thank you very much. So all the demos will be under the Agile Bank demo directory, and there's a script called demo. If you run it, you'll get the whole demo experience that we'll show later. Thank you, Sertac. Second question, which I'm going to take a stab at: at some point it would be good to understand the difference between OPA Gatekeeper and Kyverno, which is a similar project. Unfortunately, I don't have deep knowledge of Kyverno. I've heard of it, so I can't speak authoritatively on the differences. Sertac, do you know? It's the same for me; I only know it at a very high level. Yeah, so unfortunately we can't speak to it. Hopefully we will give you enough of the picture today about what Gatekeeper does and how it operates, and then you can take a look at Kyverno. Thank you for asking that question. The last question is: does OPA allow a specific cluster role to be allowed or not on a cluster, and by whom? Is there a GitHub repository with useful rules to apply inside a Kubernetes cluster? There are two parts to this question, and I'm going to answer the second part.
We have links later in this deck to policy libraries and example policies that you can apply to a Kubernetes cluster for the most common use cases. You're also free to write your own policies and PR them up as well. They cover many of the things we'll cover in this webinar. The first part of that question — can you answer that, Sertac? It was: does OPA allow a specific cluster role to be allowed or not on a cluster, and by whom? So using Gatekeeper, you can set up rules to allow certain things. It could be images, it could be labels — basically anything you want. We are going to show some of the libraries in this webinar, so you'll be able to try those out and see if they work for you. We also have pod security policy equivalent policies that you can try out and see if they work for your use case. If not, you can always add onto them or develop your own. Excellent. Well, thank you very much for those questions. Feel free to keep them coming, and as we see them come in, we'll take a moment to have them answered. Back to user requirements, Sertac. Okay. Slow down, slow down. Okay, there we are. So we're at user requirements. Remember, we're going through the requirements of Agile Bank from the perspective of the admin and the developer. Now what we're going to do is codify exactly what those requirements are. We want to free up admins' time: let them have audit and enforcement, and have that automated for them. Make sure they have common best practices — we just had a question about common best practices, and we'll show you how to enforce them for Kubernetes clusters. And all resources have a clear owner, so that our administrative and operational teams understand who owns what resources.
And from the developer side, we want to unblock developers so admins are no longer standing in the way of all the changes they need to make to their specific application. Self-service is no longer a risk to conformance. And fail fast means that we can implement ways to give developers instant feedback. Okay, so from here, let's take a look at some specific governance policies that we're going to create. Here are the policies that Agile Bank would like us to create. All namespaces must have a label that points to a point of contact. All pods must have an upper bound for resource usage — everybody knows having no resource limits on your pods can cause all kinds of fun, so this is a great way to make sure everything has a resource limit. All images must be from an approved repository. You have an internal corporate container repository; how do you make sure that everything deployed to this Kubernetes cluster only comes from that repository? Super important — I hear that one very often. Services must have globally unique selectors. Have you ever typoed a label selector on your service and actually pointed to the wrong backend? Also very common — I know I've done it. So I would love a way to make sure that label selectors are globally unique. And ensure all ingress hostnames are unique. It's also a fun one if you reuse ingress hostnames unbeknownst to somebody else in some other namespace; you can get weird and wonderful effects on your ingress controller. So you can enforce that out of the gate and make sure that all your ingresses are globally unique. Okay. So we're going to dive into it now and get straight into constraints and constraint properties here. I will flick it to Sertac. So we want constraint properties to be things we can AND together.
In the sense that we express intent by ANDing them together: adding a constraint can only make the cluster more constrained, and removing one can only loosen those constraints. There should be no weird interaction where adding one constraint causes something to happen somewhere else. We only want to make things more constrained by adding constraints, and removing them can only loosen things. And since these are all ANDed together, a single rejection means the whole request is rejected. So this way we can reason about it simply: we AND these together, and if any of them is not satisfied, we reject the request. We also want to define a schema so we can write constraints that express intent — like "this is the name that we want" or "this is the regex that we are looking for." That way, it's less error-prone. In this slide, we see an example of a constraint. The apiVersion is constraints.gatekeeper.sh, and the kind is K8sRequiredLabels. This is Agile Bank's "all namespaces must have an owner" use case. In this case, we are just matching the Namespace kind. Since namespaces are in the core API group, we leave the group empty. If you wanted Deployments, for example, that would be in the apps API group. And if you want to match any other resources, you would define them as part of the kinds here. And then these are the parameters — this is the intent. The message is a nicer message the user gets when their request is rejected: "all namespaces must have an owner label that points to your company username." This way your request fails, but at the same time you get a message saying why it failed. So it's a nicer way of knowing, hey, I need to take these actions.
And then here, for Agile Bank's use case, is the allowedRegex where users can add their usernames. So you would put yourname.agilebank.demo, in this case, as the accepted username. This way, when somebody looks at the namespace, they'll know, hey, this person from this department created this namespace, and we can track who created what over time. Sertac, can you just go back a second? I'd like to ask you a few questions — is that okay? Yeah, of course. So when we think about constraints — this is a new word to me — how should I understand constraints? Are they essentially a policy? So, later we are going to see constraint templates. In a constraint template, you define the logic of what happens — it's the Rego code that executes when you run the policy engine. And then constraints give the parameters to those templates. Like I mentioned before, they're ANDed together, so if you want to keep adding restrictions, you do so with constraints. For example, you'll be able to say, hey, I only want this regex to be allowed for allowed images — so you can only pull images from my company's private registry, or from this other registry that I'm also using for something else. Okay, so I see that this is of kind K8sRequiredLabels. So I would expect to see a constraint template, which is a CRD, called K8sRequiredLabels. Okay, understood. And those parameters would be inputs into the schema of that template? That's right. We're going to see the templates in a moment as well. This is basically how you give parameters to that template — if you want your own message, this is how you do it. Excellent. Okay, thank you.
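The constraint being described on this slide looks roughly like the following. This is reconstructed from the description above; the exact file lives in the Gatekeeper repo's Agile Bank demo directory, and field details may differ slightly by version:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: all-must-have-owner
spec:
  match:
    kinds:
      # Namespaces live in the core API group, so apiGroups is empty
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    # Friendly message returned to the user on rejection
    message: "All namespaces must have an `owner` label that points to your company username"
    labels:
      - key: owner
        # Accepted usernames, e.g. yourname.agilebank.demo
        allowedRegex: "^[a-zA-Z]+.agilebank.demo$"
```

Note how the constraint itself contains no logic — only the match criteria (which kinds it applies to) and the parameters fed into the constraint template's schema.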
Do we have a moment to take a couple of questions? I'll read the first one: is Gatekeeper a validating webhook, and can it also be a mutating webhook to satisfy constraints that are not validating? We don't support mutating webhooks right now, but that's something we're discussing. Okay, so mutation is on the roadmap. Second question: it's been some months since I heard OPA Gatekeeper would be made GA. Are we in the same status? If yes, what are the main blockers to going GA? So we are on the road to GA. We have a defined list of things that we want to accomplish, and we are getting there. If you go to our issues and look for the label "blocking GA," you'll see exactly what needs to be done. For example, we want to define HA — high availability — or at least have a definition of high availability. And there are some other items; if you go to the issues, you'll see what needs to be done. One example is cache warming: when Gatekeeper starts, it should look at the existing constraints and constraint templates in the system, load those into OPA, and only start serving — and pass its readiness check — after all of these are processed. Okay, two more questions. Do you think that with OPA we can manage to obtain a multi-tenant, secure Kubernetes cluster? I think this can be used as a piece of the puzzle. Security has many layers, and this can be one of them — enforcing how resources are created and what access is allowed. So it would be one piece. I wouldn't go as far as to say that if you implement this, you'd have a multi-tenant secure Kubernetes cluster, only because there are multiple layers, even down to container runtimes and authorization. There are many pieces to that puzzle, but this can be used to complement your security model in Kubernetes.
And this one I think I can answer, but I'll read it out for Sertac: if OPA Gatekeeper is down, does it block active deployments? Does it run in HA mode? As Sertac was saying, the definition of HA is one of the blockers to getting v3 to GA. At the moment, it depends on how you configure your validating webhook configuration. You can have it either fail open or fail closed — meaning, if Gatekeeper is down, will you allow the request or block it? That is part of your validating webhook configuration on the Kubernetes cluster, so you can do it either way. But obviously we want it to always run, which is why we would like an HA option. Yeah, so currently we default to failing open, which means if Gatekeeper is down, requests are allowed. But we want to get to fail closed — that's the ultimate end goal — and for that to happen, we need HA. Thanks, Sertac. Please continue. So the next topic is audit. With audit, we look at the resources in the cluster and periodically evaluate them against constraints: whether they're in violation or in compliance. This is good because developers and admins can look at the cluster state and the compliance of the resources running in the cluster as it happens, and then take action based on those audit results. The audit results are exposed via the status field of the constraint, which we're going to see on the next slide. So basically, this is a really nice way to just look at the state of the cluster. And a recent change in Gatekeeper is that we now allow the audit to run as a separate deployment. So, for example, you don't have to have the validating webhook deployed to run audit. This is a common use case, because some people just want to start with audit and see where they stand on compliance without actually deploying the webhook and seeing rejections.
So this is a really good way to see what is compliant or non-compliant in your cluster. This is the constraint we deployed and saw earlier. In the status field, like I mentioned, you'll see the audit timestamp — the last time the audit ran. And then under violations, you'll see which namespaces are in violation of this rule. In this case, we have default, gatekeeper-system, kube-public, and kube-system. And then, because of resource constraints in the cluster, we keep the violations list limited — by default, I believe it's 20, but you can always increase it. Since the list is limited, we also added a totalViolations field. The violations list will be truncated at some point unless you change that setting, while totalViolations will always show the total number of violations you have. Any questions about this one? I'll keep going. Okay, cool. Yeah, like I mentioned earlier, we want to test these without enforcing them, and this is where dry run comes in. Just like audit — in fact, this is part of audit — you'll see the violations in the status, but in this case they are not actually enforced; they're only surfaced in the cluster as violations. So if you set an enforcement action of dryrun, that constraint goes into dry run mode. The default enforcement action is deny: if you don't specify an enforcement action, it's deny, and explicitly specifying deny gives the default behavior. Or you can specify dryrun. In that case, the violations you see are for the dry run — they're visible in the audit results, but they're not enforced by the webhook. Yeah, I just want everybody to take a moment to understand audit and dry run specifically. I think the use case here is: I have a cluster with a lot of resources, and I want to bring that cluster into compliance.
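Putting the dry run and audit status pieces together, a constraint in dry run mode would look something like this. The layout follows the Gatekeeper constraint API as described above; the timestamp and counts here are illustrative values, not output from a real cluster:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: all-must-have-owner
spec:
  # "deny" is the default when this field is omitted
  enforcementAction: dryrun
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    message: "All namespaces must have an `owner` label"
    labels:
      - key: owner
status:
  auditTimestamp: "2020-04-23T20:33:53Z"   # illustrative
  # Always the full count, even when the list below is truncated
  totalViolations: 4
  violations:
    - enforcementAction: dryrun
      kind: Namespace
      name: default
      message: "All namespaces must have an `owner` label"
    # ...list capped (by default around 20 entries) to limit resource usage
```

With `enforcementAction: dryrun`, the audit keeps populating this status, but the admission webhook does not reject anything, which is what makes it safe for bringing an existing cluster into compliance.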
So I haven't created a brand new cluster and put Gatekeeper on it — I have a lot of existing resources and I want to bring them into compliance. With dry run and audit, you can get visibility, pass through that loop for all your policies, see what's out of compliance, and then take action to fix things; eventually, once everything is in compliance, this status field will be empty. Now I want to pass that answer back through a question we have here: is it possible to crash or block your Kubernetes cluster if you have a bad rule? It is absolutely possible, which is why we created dry run and audit — so you can say, I want to test this policy and see if it's catching the things I want it to catch, and then clean them up, without the risk of actually blocking admission to the cluster. So that's one question. Keep going, Sertac. Then one of the other Agile Bank scenarios is that we want to enforce global uniqueness in resource names. Handling uniqueness is an interesting use case, because we want to define constraints that compare things to each other to make sure they're unique. Some of these constraints are impossible to write without access to more state than just the object under test, because we need to see the other objects in the cluster. By default, audit requests each resource from the Kubernetes API server — it uses the discovery client to check with the Kubernetes API server on each audit cycle. But we also have a flag called --audit-from-cache=true. If you set this flag, the source of truth becomes the OPA cache — OPA holds the cache, and what gets cached is defined by the config object you see here on the right.
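The config object being described follows Gatekeeper's sync API; a sketch along those lines (check your Gatekeeper version's docs for the exact apiVersion) would be:

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  sync:
    # Only these kinds get replicated into the OPA cache,
    # making cross-object comparisons (e.g. uniqueness checks) possible
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Namespace"
      - group: ""
        version: "v1"
        kind: "Pod"
      - group: ""
        version: "v1"
        kind: "Service"
```

This only matters when paired with `--audit-from-cache=true` (or with policies whose Rego inspects cluster state beyond the object under review); for plain per-object validation, no Config resource is needed.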
It will basically cache things like, in this case, Service, Pod, and Namespace; you could add Ingress, for example, or whatever other resources you want. If you want to compare objects against each other, for cases like this, you need data replication and the config object. By default, audit uses the Kubernetes API server, so you don't need this unless you're handling uniqueness-style cases. And then, we talked about constraint templates earlier. Rego is the policy language for OPA, and the constraint template contains the Rego. It holds all the logic of what happens when Gatekeeper evaluates these constraints, and if the rule matches, the constraint is violated. We also talked about the schema for the constraints, and this is where that schema is defined. This is a little small; hopefully you can see it. For the same K8sRequiredLabels example, you'll see the schema defined here, and this is the exact one we looked at earlier. So this is where you define your schema, while the constraint supplies your parameters. In this case, we are defining things like the message, which is of type string, and the allowedRegex, which is also of type string. That's the schema of the constraint. And then under targets is the Rego — the logic: if this matches, you get the deny message. We also saw the message earlier; that's the message that gets returned. So if this evaluates to true, you get a deny, basically. This is where you define your logic. And then in the library, which we'll see, we have some of the more common use cases, so you can always get started with those.
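A constraint template along the lines of the one on this slide — pairing the CRD schema with the Rego that enforces it — looks roughly like this. The Rego below follows the required-labels example in the Gatekeeper library; treat it as a sketch and compare against the library's current version before using it:

```yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        # This becomes the `kind` used by constraints (K8sRequiredLabels)
        kind: K8sRequiredLabels
      validation:
        # Schema for the constraint's `parameters` block
        openAPIV3Schema:
          properties:
            message:
              type: string
            labels:
              type: array
              items:
                type: object
                properties:
                  key:
                    type: string
                  allowedRegex:
                    type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        # A violation fires (and the request is denied) when any
        # required label key is missing from the object under review
        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_].key}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
```

The split is the key idea: the template owns the reusable logic and schema, while each constraint instance only supplies match criteria and parameters.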
So you don't have to come up with all of these yourself — the library has most of the common use cases. And you can also define things like helper libraries in here. We want to talk about metrics too. One of the later additions to Gatekeeper is the inclusion of metrics. Currently we support Prometheus as a backend for metrics. We have metrics for things like violations per enforcement action — that's the one you see here. You can see how many deny and how many dry run violations there are, so you can track over time what happens in your cluster; you'll see a demo of this in a little bit. Also: total number of constraint templates and constraints, when the last audit ran, how long the audit took, and there are many more metrics. Any questions so far? Continue — we'll take questions in a moment. So I think one of the biggest values that Gatekeeper brings to OPA is that it allows for code reuse. If you've worked with Open Policy Agent, building your own Rego policies to do the thing you intend can be complex, and you need to understand Rego. One of the things Gatekeeper brings is a structured, schematized version of that via constraint templates and constraints using Kubernetes, so you can share these things around, put them in your CI/CD pipelines, test them, and make assertions on the schema and the types. You can build some really good safeguards, using Kubernetes APIs, around what you're putting into that Rego, which is the native policy language of OPA. The good thing about constraints and constraint templates in Gatekeeper is that we can build policy libraries, share them, and bring your own parameters, which is absolutely fantastic. So what I'm going to do is take a moment to answer a few questions here, and then we're going to move into the demos. So, a couple of questions.
And this is one that's probably interesting to you: interesting that the violations are kept in the custom resource for the rule — how about generating events for the concerned resources, so that violations also show up on the violating resources? Does this make sense? So I'm guessing the question is whether you would annotate the status field of a deployment if it was in violation. Yeah, you're right — right now we have them under the status field of the constraint. I think the question was about events; I don't believe we generate events. So currently they're under the status. At the moment we wanted to coalesce them into one place, around the policy, so you didn't have to look in many places. But that's an interesting idea — feel free to raise an issue if you want to discuss it further. I think there's one more related question: the violations can either be viewed in the constraint status or sent to logs. Is there an alerting integration? Can I send the violations to Slack, or how can we implement this with any community tools? So, do we have an eventing system? You just said no. But you can always surface these through metrics — for example, you could set up Prometheus to send alerts to whatever platform you want. Okay, so Alertmanager via the metrics in Prometheus is what you can do today. How much performance impact can I expect? What if I have hundreds or thousands of policies? It's really hard to give a single number, but the more constraints and constraint templates you have — and also depending on the audit limits — the more performance impact you'll see. You should definitely test for your use case. I have done some testing in the past, and the results are published, so you can take a look; I can also put that link somewhere.
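The "Alertmanager via Prometheus" answer can be sketched as a standard Prometheus alerting rule. The metric name and label below are what Gatekeeper's exporter is described as emitting in this discussion (violations per enforcement action); verify the exact names against your deployed version's /metrics endpoint before relying on them:

```yaml
groups:
  - name: gatekeeper
    rules:
      - alert: GatekeeperDenyViolations
        # Assumed metric name; confirm against Gatekeeper's /metrics output
        expr: gatekeeper_violations{enforcement_action="deny"} > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Gatekeeper audit found deny-mode violations"
          description: "One or more resources violate a deny-mode constraint."
```

From there, routing to Slack (or Datadog, as asked later) is ordinary Alertmanager receiver configuration, outside of Gatekeeper itself.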
These are great questions — keep them coming. In the same vein: how scalable is the cache? If I have a thousand-plus services and a thousand-plus ingresses in the cluster, to keep their names unique, will this be a problem for OPA? Yeah, definitely — the more resources you have, the more memory it takes, if you are using the OPA cache. So that's a concern if you have a lot of stuff. If you're talking about tens of thousands or so, that just becomes a lot of memory. We're going to build some benchmarks for guidance about how much memory; we're working on that, and it's something we're interested in building. So the short answer is we don't know; the long answer is we're going to test. Yeah, I did some load testing before; I can share the results. Okay. Just a few more questions here — go to the next slide, please. So we have some demos; they're prerecorded, and we're going to walk through them now. Can you pull up the first demo? I'm going to quickly answer just a few more questions. First one: is there any existing repository or hub for constraint templates that has generic Rego policies? Yes — there's a link in the deck a little bit later, and there are some policies already there, ready to go. Can metrics be sent to anything other than Prometheus, like Datadog? Not currently — only through Prometheus, and then maybe you could use Alertmanager to fire something to Datadog. On our roadmap we also want to implement the OpenCensus agent, so you can specify whatever backend you want. Yeah, that's definitely an issue we have open, so if you'd like to contribute, that would be amazing. In the future, could Gatekeeper evaluate service-to-service policies? I think absolutely — there's flexibility in OPA to do that, and you could absolutely do it on custom resources. Let's say you're running a service mesh.
You could send them over and do validation, absolutely. How is conftest different from OPA Gatekeeper? I'll have a stab at this, and let me know if I miss anything. conftest tests raw Rego policies; there has been some talk of making it understand constraint templates and constraints. OPA Gatekeeper leverages the Kubernetes API, custom resources, and custom resource definitions to schematize your policy and parameterize the Rego, whereas conftest just takes raw Rego and runs it against a set of resources. So while they have a similar effect, the way to think about OPA Gatekeeper is as a Kubernetes-native implementation that leverages the Kubernetes API to make policy authoring and sharing a little bit easier. Anything you want to add, Sertaç?

Okay, and one more: is the deployment model always to co-locate OPA Gatekeeper in the same cluster as the one being controlled? Please describe more about the target model, including HA and remote deployment, one-to-many clusters. I haven't tested with separate clusters, but as long as there's a validating webhook that can reach the target, it should work; it's just not something I've tested before. Yeah, it's certainly doable. I would just caution you to think about your failure scenarios when you use a webhook that points off-cluster or to another cluster: you really need to understand the network paths to know how it will behave if it fails. But we've been co-locating it and testing with co-location currently.

Okay, so thank you for all those questions. In this first demo we're going to step through the policies we created for Agile Bank and show how they look not only for an admin but also for a developer, so kick it off. Yeah, there's nothing magic behind the scenes here; right now we're just creating a kind cluster.
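To make the conftest comparison concrete, here is a minimal sketch of the kind of raw Rego policy conftest evaluates: a `deny` rule in `package main`, run directly against local manifests rather than through the Kubernetes admission API. The label name and message are illustrative, not from the demo.

```rego
# A conftest-style raw Rego policy (illustrative sketch).
# conftest conventionally evaluates deny rules in package main against
# local YAML files, e.g.:  conftest test deployment.yaml
package main

deny[msg] {
  input.kind == "Deployment"
  not input.metadata.labels.owner
  msg := "Deployments must carry an owner label"
}
```

The contrast with Gatekeeper is that here nothing is schematized or parameterized: the policy is plain Rego applied to whatever documents you pass in, which makes it well suited to CI pipelines rather than in-cluster admission.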
So for those that are not familiar with kind, it stands for Kubernetes in Docker; it basically creates a local Kubernetes cluster on your machine. Yeah, absolutely, and you can install Gatekeeper on any Kubernetes cluster, whether that's local using kind, a cloud provider, or something you're running on-prem. Okay, so we're just going to get pods. Now we're running through the install of Gatekeeper to get it onto the cluster: we apply the Gatekeeper set of manifests, which creates, sets up, and installs Gatekeeper on this cluster. We can see that that has happened; grab the cluster information here so we can test it. There we go: in the gatekeeper-system namespace we now have a pod called gatekeeper-controller-manager.

So now over to the developer. Go ahead. So I'm the developer, and I'm creating this advanced-transaction-system namespace with kubectl create, and it's created right away. Then, a few sprints later, I've moved on to different projects, and everybody's forgotten about advanced-transaction-system. And then, Lachie: so as the admin here, I want to understand who created this namespace, and how I can trace it back to the person or team that created it. Okay, so before I delete it, I can go and create a policy to make sure this never happens again.

Here we have some constraint templates that I've just applied to the cluster; the relevant one for this specific demo is required labels. Next we look at the constraints, and we apply the constraints, which provide the parameters to those constraint templates. As we can see here, all namespaces must have an owner: there must be an owner key with a value matching that regex for the namespace to be allowed. Okay, so we've applied those constraints; back to the developer.
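The required-labels policy described above can be sketched as a ConstraintTemplate plus a Constraint. This follows the Gatekeeper v3 CRDs and the upstream library's required-labels example, but treat the exact regex, names, and the elided details as illustrative rather than the demo's literal files.

```yaml
# Sketch of the required-labels policy: the template defines the schema
# and the Rego; the constraint supplies the parameters and the match scope.
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: object
                properties:
                  key:
                    type: string
                  allowedRegex:
                    type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        # Missing-label check; the allowedRegex value check is elided
        # for brevity.
        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_].key}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
---
# Every Namespace must carry an "owner" label matching a (hypothetical)
# corporate-user regex.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: all-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels:
      - key: owner
        allowedRegex: "^[a-zA-Z]+.agilebank.demo$"
```

Splitting schema from parameters is the point of the design: an admin can publish the template once, and many teams can instantiate constraints with their own parameters without writing Rego.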
And now I'm going to try to create a production namespace, because I can, I'm a developer. Oh wait, I got an error from the server: all namespaces must have an owner label. But mine doesn't, because I tried to create the production namespace directly with kubectl. And here's what a good resource looks like: I include the owner label in my production namespace, and it's created. That's nice; now anybody can identify it.

Then I want to deploy some pods where I define no limits, and I get another error from the server, the same way: my pod doesn't have the required resource limits. So maybe I should specify some resource limits, so Kubernetes knows how to schedule it. But maybe I specified them too high; that's not good either, because it would claim a lot of memory and space. Then I deploy from my personal repo, for example, and that's not allowed, because I should be using the company's allowed private registry. So I change it to the allowed repo, open-policy-agent in this case, and as you can see the error messages are actionable: I can fix and redeploy immediately based on the error. And when I deployed the OPA one, it proceeded as-is, because that was allowed. And then here is a duplicate service: I deployed my service, and then the big outage happens.

Then, Lachie: so now we're looking at the audit use case. How do I look at resources that already exist and are non-compliant with the policy? We're going to go check the audit results, because we have some pods without resource limits. We take a look at the container-limits constraint: it's a resource, and in its status section we can see the pods that don't have any limits, down here under status.
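The audit results mentioned here live in the constraint's status, which Gatekeeper's audit controller populates periodically. A sketch of what that looks like follows; the field names match the Gatekeeper v3 status shape, but the pod names, timestamps, and messages are hypothetical.

```yaml
# Illustrative sketch of audit results on a container-limits constraint.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sContainerLimits
metadata:
  name: container-must-have-limits
# spec elided...
status:
  auditTimestamp: "2020-05-27T18:30:00Z"
  totalViolations: 2
  violations:
    - enforcementAction: deny
      kind: Pod
      name: backend-6d5f8        # hypothetical pod name
      namespace: production
      message: "container <backend> has no resource limits"
    - enforcementAction: deny
      kind: Pod
      name: frontend-7b9c2       # hypothetical pod name
      namespace: production
      message: "container <frontend> has no resource limits"
```

So `kubectl get k8scontainerlimits container-must-have-limits -o yaml` is enough to enumerate the pre-existing, non-compliant resources and go remediate them.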
We have a bunch of pods that do not have resource limits, so I can take it upon myself to go and action them and apply resource limits. Again, super simple: I've defined the policy, and then I'm leveraging audit to go and remediate. Okay, so we've rolled out this new policy to production, and I can now guarantee that everything is in compliance.

We also want to make sure that all Ingress host names are unique; this is a common one that causes a lot of failures. Now we're looking at how introducing new policies can be dangerous; we had a question about this earlier. How do we gain confidence that a policy we've defined is actually doing what we need and doesn't break our Kubernetes cluster or bring down the entire stack? Here we can create a constraint in dry-run mode, so that it doesn't enforce, and we can see whether there are any conflicting Ingress host names.

All right, so we apply that constraint template, and then we can use audit. In this case the unique-ingress-host constraint template uses the cache to check whether any host names overlap. And there's the constraint. We get the ingresses, and we have two ingresses with the same host name, as you can see there. Now if we get the constraint, we should see that we do have some that conflict. I'm going to create a third Ingress to show that dry run does indeed not block creates with the same host name. And since we are doing this in dry-run mode, it does not block. So as you can see, here is another ingress with the same host name as the others, and it's being allowed; it will show up in the audit so we can see that we indeed have a conflict.

Okay, so soup to nuts, that's the whole flow from admin to developer. Next I'm going to show you what the metrics look like. Sertaç, the second demo. Go ahead. This is the second demo, for the metrics.
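Dry-run mode is set per constraint via `enforcementAction`. A sketch, following the shape of the Gatekeeper library's unique-ingress-host policy; treat the names and match scope as illustrative:

```yaml
# A constraint rolled out in dry-run mode: violations are recorded by
# audit (and in metrics) but admission is never blocked.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sUniqueIngressHost
metadata:
  name: unique-ingress-host
spec:
  enforcementAction: dryrun
  match:
    kinds:
      - apiGroups: ["extensions", "networking.k8s.io"]
        kinds: ["Ingress"]
```

Note that because this policy compares an incoming Ingress against the Ingresses already in the cluster, Gatekeeper's cache has to be told to sync Ingress objects (via its Config resource) for the comparison to see them. Once you're satisfied the dry-run violations are the ones you expect, flipping `enforcementAction` back to the default `deny` turns on enforcement.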
So this is the Prometheus dashboard; we're running it locally against the kind cluster. These are the same metrics generated during the demo: I recorded the demo with Prometheus running. In this case we had 33 deny violations and 24 dry-run violations. You can also define your own enforcement action, but then it gets categorized as unrecognized, and we didn't have any of those, so that's zero. And you can graph it, so over time you can see how many deny and dry-run violations you had. You could have Alertmanager rules to say, hey, if there's any deny, alert me on Slack or wherever.

Here we're seeing the audit's last run, so that's the timestamp of the last audit pass. And then here, how many constraint templates we had: four, then we deployed another one, so five. And then, I believe, the number of constraints: we have one dry-run constraint, as you saw in the last part of the demo, and four deny constraints, and you can see the values there too.

These are the request counts. Kubernetes itself is always making requests: the allowed ones, the red series you see, are requests Kubernetes makes to itself, and these others are the ones we made that got denied. So those four are denies, but Kubernetes keeps making requests, so you can keep track of those as well; since we didn't do anything to deny them, they will just keep happening. And then going back from the demo.

Okay, excellent, thank you so much. I'm going to power through the rest of the deck here, and we have a few questions. Just to start, on the project: it's in beta, which we've answered, and we're looking to define HA and go to GA for v3 of Gatekeeper.
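The dashboard queries described above can be sketched in PromQL. The metric names below follow the Gatekeeper documentation of this era, but verify them against the version you have installed, since metric names and labels have evolved:

```promql
# Current number of audit violations, split by enforcement action
gatekeeper_violations{enforcement_action="deny"}
gatekeeper_violations{enforcement_action="dryrun"}

# How many constraints and constraint templates are installed
gatekeeper_constraints{enforcement_action="deny"}
gatekeeper_constraint_templates

# Timestamp of the last audit run (useful for audit-staleness alerts)
gatekeeper_audit_last_run_time
```

For the Slack question earlier, a Prometheus alerting rule on something like `gatekeeper_violations{enforcement_action="deny"} > 0` routed through Alertmanager's Slack receiver is the path available today.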
If this is something that interests you, we're interested in understanding how you want to use it; feel free to raise issues. I've heard a lot of questions today on things we haven't thought of as part of Gatekeeper, so please file them and bring them to the community. We'll show you the links in a minute; keep going.

I know there were lots of questions about policy libraries: they're all in the upstream repository. There are predefined constraint templates, and even a pod security policy equivalent, where we've modeled effectively what the PodSecurityPolicy API does as Gatekeeper constraint templates, so you could consider using that. That was something asked for generally in the community, so the maintainers went and built it.

Okay, keep going: potential growth areas. We've had a question about mutation; mutation is complex, and it's a lot of work to make mutation have stable outcomes, so feel free to come and help with that if you're interested. External data sources: different directories, different places you can consult when deciding whether to admit or deny a request. Authorization: likely a separate project, but Kubernetes allows webhooks for both admission and authorization, so could Gatekeeper be used for authorization as well? Developing audit features: we had some great questions about where we can put audit data, and developing tooling to make it simpler to use and integrate. Those are all areas where we're interested in getting people involved.

And finally, feel free to come and join us: there are meetings every other Wednesday, and there is a Kubernetes policy channel on the Open Policy Agent Slack. You can come there, or just go to the Gatekeeper repository and get started. And with that, that concludes our slide deck; I'll try to rush through the final three questions we have before we're out of time.
Is there any support to deploy OPA Gatekeeper via Kustomize? So today, by default, we use Kustomize for the kubebuilder-style installation, but beyond that we support plain YAML to deploy, and also a Helm chart. Thank you.

This question is really interesting; I'll take it. If an old pod from a deployment violates a new rule, I understand it will not be evicted; what happens if the node where the pod was running is drained? Does the old pod get recreated on another node, or deleted? So this is an important thing to remember, and I'll answer it quickly. This is admission control, so it doesn't change values at runtime. If a node is drained and the pod needs to be recreated by a controller, like the deployment's ReplicaSet controller, and it now violates the policy, you're going to see an error on that ReplicaSet saying it can no longer create the pod, and you'll see the policy violation surfaced at the ReplicaSet level. So go check there. Gatekeeper doesn't modify things at runtime, only at admission time; if you have running pods that are now in violation, you have to delete them yourself.

Can I use Prometheus metrics to track violations for a specific policy, or just across all? It's just across all currently. We didn't want to expose per-policy details because of privacy concerns: the metrics endpoint is exposed, so everybody who can reach it can see it, and we didn't want to include anything sensitive there. I'm cutting you off, Sertaç: Karen's got ten seconds.

All right, thank you so much, Sertaç and Lachie, for a great presentation. That's all the time we have for today. Thanks for joining us; the webinar recording and slides will be online later today, and we look forward to seeing everyone at a future CNCF webinar. Have a great day. Thank you. Thank you.