So today we're going to talk about Kubernetes policy management. A few topics we want to cover in this session. First, we'll talk about why policy management matters for Kubernetes and where it fits in the overall ecosystem. We'll also deep dive into what the Policy Working Group, a Kubernetes community working group, does. And then we'll spend half of the session talking about the policy reporting API that was created in the working group, and also about Policy Reporter, an open source tool you can use for reporting, monitoring, and management of policy results. So first, quick introductions. I'm Jim Bugwadia, co-founder and CEO at Nirmata. We are the creators of Kyverno, an open source policy engine and a CNCF project. I also co-chair the Policy Working Group in the Kubernetes community, and I am a maintainer of Kyverno. Yeah, welcome also from my side. I'm Frank. I'm a senior software engineer at LOVOO, a German-based dating platform. I'm also the creator and maintainer of Policy Reporter and an open source contributor to tools like Kyverno as well as Falco. So a quick note about the CNCF code of conduct: if you haven't read it, it's available online, and here's a QR code. Also, for virtual attendees, there's closed captioning available, and for session Q&A you can enter questions in the chat; we will be answering after the session, or, as time allows, we'll also take questions from the live audience. And thanks to our sponsor for the recording itself. All right, diving into it, let's start with what policies are and how they fit into Kubernetes. In real life, we all know policies as a set of rules which helps us govern or manage things within an organization. Within your company, you might have a vacation policy or other established policies. But in the software world, policies are a form of configuration management.
So within configuration management itself, you have the ability for some configuration, like meta-configuration, to manage either other configurations or the runtime behaviors of your systems. So really, as an operator, an ops team, or a platform team, using policies you're setting rules to govern the behaviors as well as the other configurations within your clusters and environments. In Kubernetes, of course, we're all familiar with things like network policies. That's an example of a policy object which is baked right into Kubernetes, part of the API itself. But let's talk about why policies matter, why you would want them, and what they mean for you as operators or users of Kubernetes. First of all, Kubernetes configurations tend to be fairly complex, and partly that's because Kubernetes has a lot of different roles and responsibilities in the system. In some ways, it's the first platform really designed for DevSecOps. You have developers trying to deploy workloads. You have security teams who care about workload security as well as platform security. And you have the operations or platform team that's managing all the clusters, the add-ons, and other configurations. So how do you bring all of these roles together and make sure that they can share cluster configurations and still have the autonomy and flexibility that each role requires? Policies become that digital contract across these roles. You're codifying how each role should behave and what different users are able to do, and in Kubernetes itself, depending on the kind of policy you're applying, you can have these either as custom resources through the extensible APIs or as native resources. So in many ways, policies help simplify configuration management for Kubernetes.
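As a quick illustration of a policy object baked into the Kubernetes API, here is a minimal NetworkPolicy; the namespace name is illustrative. With an empty pod selector and no ingress rules listed, it denies all ingress traffic to every pod in the namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-app        # illustrative namespace
spec:
  podSelector: {}          # empty selector matches all pods in the namespace
  policyTypes:
    - Ingress              # no ingress rules defined, so all ingress is denied
```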
It helps automate certain portions of configuration, and more importantly, it can help you prevent misconfigurations, which, as I'm sure you've all seen in the headlines and the reports, always tend to be the number one cause of security issues, whether through lack of proper configs or misconfiguration in Kubernetes itself. So within Kubernetes today, there are at least four different classes of policies. There are API objects, like the network policies we talked about, or RBAC configurations, things like that. There are built-in admission controls: within the API server, through flags, you can manage several different types of admission controllers, like for quotas, for defaults, et cetera. There's the validating admission policy, an alpha feature introduced into Kubernetes, which lets you program the API server and exercise policy checks on the API request itself. And then finally, there are dynamic admission controllers, which are responsible for more complex checks, when you need to reach out to other systems or anything that requires multiple objects and configurations to be checked. So typically in a production cluster, you'll likely end up using most, if not all, of these types of policies. And all of these serve different purposes and different roles, and have a place in your overall toolbox. Now, within the Policy Working Group — and for those of you who may not know, in the Kubernetes community there's, of course, an oversight committee, but there are also special interest groups called SIGs, like SIG Network, SIG Auth, SIG Security, and others, and then there are working groups, which are cross-functional and play different roles. The Policy Working Group is one of those working groups, chartered with simplifying, defining, and advocating for policies and how policies should behave within Kubernetes itself.
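As a sketch of the validating admission policy mentioned above (following the shape of the alpha API as introduced in Kubernetes 1.26; the policy name and the replica limit are illustrative), a check written as a CEL expression against Deployment requests might look like this:

```yaml
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: ValidatingAdmissionPolicy
metadata:
  name: limit-replicas         # illustrative name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
  validations:
    # CEL expression evaluated in the API server, no webhook needed
    - expression: "object.spec.replicas <= 5"
      message: "replica count must not exceed 5"
```

Because the expression runs inside the API server itself, this kind of check avoids the round trip to an external webhook that a dynamic admission controller would require.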
So here are some of the projects we're doing within the Policy Working Group, currently as well as looking ahead. First off, the Policy Reports API, which we'll talk about and then also show in some demos. There was a policy management paper, which we produced roughly a year ago, and we're looking at doing a v2 of it to describe some of the newer features, capabilities, and developments. There's also a new paper on Kubernetes governance, risk, and compliance, using policies as a building block for those business functions, and we'll briefly touch on that. There are also discussions on how you go from policies to other things in your business, like compliance management, if you were to map policy results into it. And then finally, we're also looking at what we should do in terms of policy management for the Kubernetes docs. Today that area is a little bit lacking, so we will be updating it and publishing things there soon. So, some quick references — these are things you'd want to look up in more detail later based on your interests. The policy management paper, like I mentioned, was published roughly a year ago. Here are the two main things it covers, along with a few other areas. First, it describes a reference architecture and the different components involved in policy management, especially in a multi-cluster environment. It talks about policy enforcement points, like admission controllers, or policy enforcement you might be running in your IaC or CI/CD pipelines. And then it talks about policy administration points and their function, like management tools which sit outside of Kubernetes and operate across multiple clusters.
And then it also covers how policies fit into the lifecycle, the different phases from build to deploy to run, where policy management fits, and what kinds of tools you would want to look at, whether for configurations or runtime policies, et cetera. The GRC paper expands on this and starts mapping policies as building blocks into other, higher-level business functions. What we're specifically exploring and describing in that paper is how you can use policies as part of your overall security frameworks, and also for operations automation — policies are great for automating certain things through mutate and generate types of policy definitions — as well as for cost management. For most of us who are operating clusters, especially cloud-based ones, resource management and cost management become important, and policies play a critical role there as well. So here's a QR code for the GRC paper, which is open for public comments right now; it's in the draft phase. For those of you who are interested — it's in the slides, of course — please take a scan, read through, and let us know if we're on track, if it addresses the right questions, and if you'd like to see anything else in there as well. All right, one other project to quickly mention is compliance mappings. Today, in the industry, there are things like OSCAL, a language developed by NIST, the U.S. National Institute of Standards and Technology. OSCAL stands for Open Security Controls Assessment Language, and what it does is provide a well-defined, structured format for compliance assessments. We've all been through audits and compliance assessments, whether for SOC 2, NIST, or other compliance standards like the ISO standards, and today the process is very manual.
So what we're exploring here is how you take things which are automated in systems like Kubernetes — policy definitions, policy reports — and map them back into these compliance standards, to move from these manual processes toward automated and continuous compliance. What we're not sure about today is whether we'll take this on in the working group or move it to a CNCF-level project and look at compliance across systems, not just from the Kubernetes perspective but across other CNCF tools as well. But in any case, it's a key initiative that we're going to pursue one way or another. So certainly, if this is of interest, it's something you'll want to track, and feel free to share feedback, thoughts, and comments on it. And then finally, I mentioned the docs update. One of the projects we're going to take on is the current Kubernetes policy page, which is certainly lacking in many ways. There's a lot we can do there, and we're going to start work on it within the Policy Working Group. Again, feel free to chime in with thoughts on what you would like to see and how it can be more helpful. So, to quickly summarize what we've covered so far before I hand off to Frank: policies are essential for Kubernetes. You are going to be using, if not one, probably all four of the types of policies I mentioned. They become a key building block in your toolbox for security, compliance, and other automation-type workflows. And in the Policy Working Group, our focus and goal is to make it easier and more understandable where policies fit and why they're needed, to publish guidance papers as new things come out in Kubernetes, and to create software which eventually gets promoted to SIG-level or Kubernetes-level projects, like the policy reporting we'll talk about next.
So with that, let me hand off to Frank, who'll cover the policy reports in more detail. Thank you, Jim. So, after Jim's introduction to the Policy Working Group, I want to show you how the Policy Report API is used to provide feedback about policy validations in our clusters. When we talk about the Policy Report API, we are talking about the PolicyReport and ClusterPolicyReport CRDs, which provide policy report results for either namespaced or cluster-scoped resources. These results contain information about the status of our policy validation and optional metadata, such as the related resource, the rule, categories, and more. The Policy Report API has two main use cases. In the first, reports reflect the current validation results of existing resources, compared against the policies applied to them. Examples of this use case are Kyverno validation policies, the kube-bench adapter, or the Trivy Operator with its policy report adapter. All of these tools scan the resources already existing in your cluster against their policies and report the results in a policy report. The second use case is more like a log: policy reports are used as logs for runtime security tools like Falco or Tracee, providing a list of the most recent violations of various policies. Because these have a nature of infinite growth, you'll mostly have some kind of configuration to limit this list, and the oldest results are dropped as soon as the limit is reached. We also have other use cases, but they are very specific. One of them is in the Open Cluster Management project, which checks whether a required policy is violated in a cluster by checking for corresponding fail results in our policy reports. In this use case, the project acts as a processor of policy reports, so it does not create its own; it uses reports from other tools and checks whether a policy from that tool is violated.
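To make the structure concrete, a minimal namespaced PolicyReport under the working group's wgpolicyk8s.io API group might look roughly like this; the policy, rule, and resource names are illustrative:

```yaml
apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: polr-ns-default
  namespace: default
results:
  - policy: require-run-as-non-root   # the policy that produced this result
    rule: check-containers
    category: Pod Security            # optional metadata
    severity: medium
    result: pass                      # one of: pass, fail, warn, error, skip
    message: "validation rule 'check-containers' passed"
    resources:                        # the resource that was evaluated
      - apiVersion: v1
        kind: Pod
        name: nginx
        namespace: default
summary:                              # aggregate counts across all results
  pass: 1
  fail: 0
  warn: 0
  error: 0
  skip: 0
```

A ClusterPolicyReport has the same shape but no namespace, and holds results for cluster-scoped resources.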
And now I want to introduce and show you Policy Reporter, which is also a processor of existing policy reports from other tools. It's intended to provide observability and monitoring capabilities in your cluster, based on the policy report API we just covered. Here is a list of a few features. These features are intended to solve the challenges you would have with a plain "get and show me all policy results in our clusters" approach, because information from policy reports is distributed across namespaces, resources, or even clusters in a multi-tenant environment. So we have features like sending results to external tools: to Grafana Loki, Slack, or S3 buckets. It provides metrics, so you are able to show your violations in well-known monitoring solutions like Grafana. It also ships with a standalone web-based UI, which has all the detailed information and provides some filters and graphs. You can send email reports on a cron basis from different clusters to a central email address about the current status of your cluster security. And all these features have granular configuration and filtering options, so you only process and send results you're really interested in. To show you these features, I prepared a little demo. I have some tools already installed: we have Prometheus and Grafana with Loki as monitoring tools, and we have Kyverno and the Trivy Operator, along with the Trivy Operator policy report adapter, as security tools that produce policy reports, plus Policy Reporter itself. If you want to see how it's configured, I prepared a GitHub repository, and you are also very welcome to try it out yourself. This is a public instance of the Policy Reporter UI. I will just show you around — it's not that mobile-optimized, but you get an idea of how it works and what you can do with it. Before we start with the actual UI, I'll quickly show you what the policy report we talked about looks like. This is a policy report created by Kyverno.
It provides one result with some metadata, like a category; we have a message and the related policy; we have the related resource, which is an nginx pod in this case, and it passed our validation; and we also have a summary in case we have more results. Next, a quick look at our Policy Reporter configuration, to see what features are enabled and how this looks in our use case. We want to send only results of priority warning and higher to our Loki instance, to have some kind of logging. We configure Loki by just setting its host, and we can add some custom labels if we want to. We also want some kind of real-time notification for certain things, for which we are using the Slack integration. Here we are able to set some filters: for example, we want to exclude trivy-system and kube-system from our general Slack channel. We can use the channels feature to add multiple Slack channels to our configuration, so we don't have one channel which gets all results, which doesn't really give a good overview. So we configured a Kyverno validation violations channel, which only gets results from Kyverno for a single policy, and we exclude the kube-system namespace. We do quite similar things for our Trivy validations, with config audit and vulnerability scanning. And because it's very important if something happens in kube-system, we want to push notifications related to the kube-system namespace into one dedicated Slack channel. Then we also have a look at the Grafana and Prometheus integration. For this, we enabled the metrics. There are different modes for metrics, and the custom mode provides only the labels we are interested in, so we can reduce the cardinality a bit. We also added the integrated monitoring subchart.
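The setup described above could be sketched as Policy Reporter Helm values roughly like the following. This is a sketch under assumptions: exact keys can differ between chart versions, and the hosts, webhook URLs, labels, and policy names are all illustrative:

```yaml
# Sketch of Policy Reporter Helm values; keys may vary between chart versions.
target:
  loki:
    host: http://loki:3100                 # illustrative in-cluster Loki host
    minimumPriority: warning               # send only warning results and higher
    customLabels:
      cluster: demo                        # optional custom labels
  slack:
    webhook: https://hooks.slack.com/services/XXX   # illustrative general channel
    minimumPriority: warning
    filter:
      namespaces:
        exclude: ["kube-system", "trivy-system"]    # keep the general channel quiet
    channels:
      - webhook: https://hooks.slack.com/services/YYY  # dedicated Kyverno channel
        filter:
          namespaces:
            exclude: ["kube-system"]
          policies:
            include: ["require-run-as-non-root"]    # route a single policy here
metrics:
  enabled: true
  mode: custom                             # expose only selected labels
  customLabels: ["namespace", "policy", "status"]
```

The per-channel filters are what make the multi-tenant routing shown later in the demo possible: each team's channel only receives the namespaces and policies it cares about.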
This is an integration with the Prometheus Operator that predefines some dashboards and also does some pre-configuration with ServiceMonitor CRDs. And last but not least, we just enabled our UI with the Kyverno plugin, so we have some special features related to Kyverno. So now that we've had a quick look at the configuration, we jump into our terminal, where we will create an nginx pod to violate some things in our cluster and hopefully trigger some notifications. First, we can have a look into Slack, where we already get the first violations. In our general channel, we had no real filter except for the kube-system exclusion, and because we created our nginx pod in the test namespace, we got our information with the policy, a description, and some metadata — and that's for all findings related to Trivy, and now also Kyverno. Then we have the dedicated channels for Kyverno, where we only get the one policy result we configured. We have our message and, in this case, additional information about the resource, the category, and some other details; the same for Trivy. So you get the idea: we are able to route our notifications to dedicated Slack channels. This could be helpful if you have some kind of multi-tenancy, where teams are related to a specific app or namespace, and you are able to route the notifications to the correct team. The same goes for the Loki integration. When we now run our Loki query, which just looks for events with the source policy-reporter, we see much the same: we have our violations, only for warnings, from Trivy as well as Kyverno in our Loki, and we can do some analysis on when things happened, and aggregate and filter them with all the possibilities Loki has. Now we check out our Grafana integration. As I mentioned, Policy Reporter has a direct integration with the Prometheus Operator, which I use in this cluster, and this integration provides three dashboards.
All of them have this policy-reporter tag, so they should be easy to find. The first is more of an overview: you get an idea of how many policies fail in your namespaces and, if you have cluster policy results, the amount of those too. You have a timeline and some details about the source, the namespace, the policy, the status, and the other labels we configured, and then you know where to look to check what really failed. The other dashboards are quite similar, with the difference that they also have more information about passing and warning results. So you can see all your policy report results independent of their status, also in the timeline and the details. If you have a sandbox cluster or some dev environment, or you just want to try things out, and you don't have a big monitoring solution because it costs a lot of resources, you can use the presented Policy Reporter UI, which is just a simple web-based integration and a replacement for, or addition to, the Grafana integration. You get quite similar information. At the top, we see our integrations with the different tools we configured, so you have an idea: okay, I have my Loki integration for this cluster, as well as Slack for Kyverno violations and Trivy. Then we have our dashboards, which have information about all the sources we have configured — in our case, Trivy, with its different reports, as well as Kyverno. We just see what fails in our cluster, and if we want a more detailed look, we go to the source we are interested in, like Kyverno, and now see much the same as in the Grafana dashboard, with all the information and messages, plus some filters you can use to look at only a subset of your resources and check which policies fail and what you can do to make them pass. The same goes for cluster policy reports.
The only difference is that we have no namespaces. And there's a logs page, which is a simple alternative to a Loki instance, just to have a timeline of when which violations happened. Lastly, we can have a short look at the Kyverno integration. It's an overview of your policies, showing whether the background scan feature is enabled or not and the failure action — audit just creates reports, while enforce blocks requests. We can see the YAML configuration behind each policy, and if you want to send your reports in a more readable format to external or other team members who don't have direct access, you are able to create these HTML reports. You are also able to filter by namespaces and policies, and this creates a simple web page which you can persist and send to anyone else. Yeah, I think that's it. And now, your turn. Thank you, Frank. So, just to quickly talk about some of the ongoing and current projects in the Policy Working Group and summarize: one thing we are actively looking at is working towards promoting the Policy Report API that Frank demonstrated to either a SIG-level or a Kubernetes-level API, so it can be further standardized and used by different tools and projects. Like I mentioned, we're also actively working on completing the GRC paper, on how policies and policy reports can be mapped into other functions. We will be starting a project, or at least creating a PR, on the Kubernetes docs, and then also discussing where and how the compliance mapping fits in, so you can take policy results and map them into compliance assessments themselves. So certainly, if you're interested in any of these, feel free to join any of our working group sessions. We meet every second Wednesday at 8 a.m. Pacific.
All of the details are, of course, in the web links; we have a GitHub repo where most of our projects and activities live, and if you go under the Kubernetes SIGs and working group pages, you should be able to find us there as well. Also, please give us feedback on the session, to let us know what else you'd like to see covered in future sessions, and how we did and how the content was. And with that, I think we have a few more minutes for Q&A, so we can take some questions, and of course you can also follow up on the Slack channel with other questions later. So, any questions from folks in the audience, or thoughts, comments, feedback? Hi Jim. This is great. I like what you guys are working on. Thank you. Are there things in upstream Kubernetes that you feel are missing, that are limiting your ability to do more policy work? So certainly, like we mentioned briefly, the validating admission policy — being able to run policy checks directly in the API server — that was perhaps in some ways long overdue and a good addition. The other major area was, of course, the reporting. The more we standardize there, the better, and there are some good discussions in the community on how we scale the reporting to take data from different sources and create this uniform API which other tools can consume. So those are at least the two areas, and then there's moving beyond just validating to mutating policies, and even perhaps to being able to generate resources, et cetera. Those seem like good steps as we look at more standardization across the board. Next question: I was interested in more examples of how policies are used. For example, someone trying to create their own rules, like limiting pod resources and stuff like that. So maybe some examples. Sure.
Yeah, so the basics we typically see in terms of policy examples start with the fundamentals, like pod security. Today, of course, PSPs have been deprecated in Kubernetes, and now we have the Pod Security Standards. There's the built-in Pod Security Admission, but not every cloud provider or managed Kubernetes service will have the same level of pod security, so really, as operators, we need to be cognizant of that and make sure that we're enabling all the right secure defaults for pod security, which deals with the security context in every pod. Beyond pods, there are several different security checks you will want to perform on workloads like services, deployments, and stateful sets. There are also a lot of best practices, like making sure you have things such as resource quotas, or probes and health checks, defined for each workload. Those are some of the fundamentals. Beyond those, there are use cases like software supply chain security — signing images, verifying them, verifying attestations and metadata — which are becoming fairly popular and widely used, as well as other, more complex checks where you might want to query out to other APIs or different tools. Maybe you want to check with OpenCost and make a policy decision based on that. So there's a wide variety of things, and there's also a lot of automation you can do with policies, like distributing your image registry secrets and your CA certificates to each workload — common, painful things people end up doing in clusters, which policies can help automate. All right, I think we're right on time. So thanks, everybody, and feel free to stop by the Kyverno booth, or catch us in other sessions as well. Thank you. Thanks, everyone.
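As an illustration of the resource-limit checks mentioned in the answer above, a Kyverno validation policy requiring CPU and memory limits on every container might look roughly like this (a sketch; the policy and rule names are illustrative):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits     # illustrative name
spec:
  validationFailureAction: Audit    # report violations; switch to Enforce to block requests
  background: true                  # also scan existing resources and emit policy reports
  rules:
    - name: check-container-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "CPU and memory limits are required for all containers."
        pattern:
          spec:
            containers:
              # '?*' requires the field to be present with a non-empty value
              - resources:
                  limits:
                    cpu: "?*"
                    memory: "?*"
```

With background scanning enabled, violations from resources that already exist in the cluster surface as fail results in policy reports, which is exactly what tools like Policy Reporter then pick up and route.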