Hello everyone, this is the KubeCon session "Policy Matters: The Why, What, and How of Kubernetes Policy Management." We've got Jaya Ramanathan, Aradhana Chetal, and Jim Bugwadia with us today to talk about all things policy. Why don't we start with some brief introductions? Aradhana?

Good afternoon, all. My name is Aradhana Chetal. I manage the cloud security team at TIAA, and I'm also a co-chair of a CNCF Technical Advisory Group. Thank you. Jaya?

Thanks, Aradhana. Hi, everyone. My name is Jaya Ramanathan. I am a Distinguished Engineer at Red Hat, focused on security and governance. I'm really excited to be here because policy management is one of my passion topics at this point. Over to Jim.

Hey, everyone, this is Jim Bugwadia, co-founder and CEO at Nirmata, and I'm also a maintainer of Kyverno.

Hi, and I'm Robert Ficcaglia. I'm a policy workgroup co-chair, also a volunteer with the CNCF Security TAG, and CTO at Sunstone. The policy workgroup is focused on all things policy, but specifically on Kubernetes policy itself and how Kubernetes operators can use policy as code. We have two current projects, but before we talk about those, just some logistics. We meet every other Wednesday at 8 a.m. Pacific. We have our own Slack channel, and here's a link to our GitHub repository, where we have some of the prototypes Jim's going to talk about next.

Yeah, so one of the initiatives we're leading in the policy working group is creating a common way of reporting policy results. One thing we found is that as Kubernetes policies become increasingly important for production deployments, there are several policy tools, in different languages and perhaps with different features, but what seemed to be missing, and something we felt could be common, is a way of reporting results from these policy engines.
So the policy report CRD is that effort, and working with the community, we now have several tools like Kyverno, kube-bench, Falco, and others supporting the policy report. There are more integrations and adapters in progress, and we continue to expand the outreach and the set of projects that can work with the policy report.

Yeah, and we're absolutely looking for more projects to incorporate into the policy report CRD process. So Jaya, we've been working on this policy white paper for a little while now. Maybe you could give us a brief intro to the whats and hows that we've been discussing in this white paper.

Yeah, thank you, Robert. One of the things we have been discussing within the workgroup is that, as customers transform to adopt cloud, they still need to meet security requirements, regulatory compliance requirements, et cetera. So they need to make sure that the cloud is configured properly for various controls, and the SREs who manage the clouds are not necessarily experts in all of those aspects. This is where policy management comes into play: the best practices are represented as policies and used to make sure that the controls are configured properly. So we wanted to put together overall best practices for policy management that cover the goals, the overall architecture, and how a customer can go about implementing this approach. That's really what this white paper is about. It's conceptual. We then plan to have an item that references some existing open source technologies for implementing the concepts outlined here. So we definitely welcome participation from the broader community to help us progress this further. Thank you, Robert.

Yeah, so let's jump right into it then and have the panel talk about the whys and some of the details behind how you should manage policy in Kubernetes. So, Aradhana, let me throw this to you first. The why: why should we care about policies?
As DevOps operators, as Kubernetes cluster operators, why can't I just deploy containers and forget about policy?

So Robert, the misconception in the industry is that containers are secure by default, which is not the case, as we've seen from a number of breaches that have happened in the industry. The attack surface is just too wide: infrastructure threats, operating system threats, network isolation threats, software supply chain security threats. If you look at the CNCF landscape, there are a number of tools and technologies there that are used to build these container platforms as well as the CI/CD pipelines. The complexity of integrating all these tools into the deployment pipelines is tremendous. So to improve the developer experience and give developers a frictionless way to deploy these containers into runtime, I think it's really important to start thinking of security policy as code. And that is what this working group has been working on: defining what those security controls are and how they can be built into security policies that can be deployed at different stages of a platform, so that developers have a frictionless deployment experience and yet it is secure from the start and at runtime as well.

That's a great point. I want to come back to mapping that to the how, but before I do, one more why question, to Jim: we have Kubernetes and we have things like the CIS benchmarks. Why is that not enough?

Right. Yeah, so when you think about Kubernetes, of course, Kubernetes doesn't exist in isolation. It builds on top of many layers, like the cloud, or your infrastructure if you're running things on-prem or within a data center. And within Kubernetes there are also several components and several different roles. So things like CIS benchmarks, and tools like kube-bench which check CIS benchmarks, address one specific layer of that.
But one of the things we detail in the white paper, and which is also very important in practice, is that you really need to think about the cluster, the containers, and the containerized workloads. You need to think about the different roles interacting with Kubernetes and use policies as a contract across these roles. So the key takeaway there is making sure that every layer in the stack is covered, including the containerized workloads and the declarative configurations in Kubernetes, which can also be managed as code.

So just to jump back quickly to what you were saying, Aradhana: how do you take on all that complexity and map it back to security? How can you do that in a real-world, always-changing Kubernetes environment? What have you seen out in the real world?

So in the real world, to find the security controls that can mitigate the threats, you have to do a threat model. Based on that threat model you know what your attack surface is. Then you want to minimize your attack surface, define the security controls that will mitigate specific threats, and convert them into security policies. You need a policy administration point where you can define all those policies, and also policy enforcement points at different places, including the CI/CD pipelines, where you have admission controllers that can validate various metadata elements of the incoming deployment of containers and services. Based on the policy requirements set up by the security and compliance teams, they either allow the deployment or reject it, which ultimately reduces the attack surface. And the same thing can be done at runtime as well. There can be runtime security controls that continuously validate the images that are running. The image in the CI/CD pipeline and what is actually running in production could be totally different.
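As an illustration of the admission-time enforcement point described above, here is a minimal sketch in Python. It is not any particular engine's API: the `Deployment` type, the three policy functions, and the `registry.example.com` prefix are all assumptions made up for this example. It only shows the shape of the decision: evaluate each policy against the incoming metadata, and allow the deployment only if nothing is violated.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Deployment:
    """Minimal stand-in for the metadata an admission controller would see."""
    name: str
    image: str
    labels: dict = field(default_factory=dict)


# A policy returns None when satisfied, or a violation message otherwise.
Policy = Callable[[Deployment], Optional[str]]

# Hypothetical registry prefix, chosen only for this example.
TRUSTED_REGISTRIES = ("registry.example.com/",)


def require_trusted_registry(d: Deployment) -> Optional[str]:
    if not d.image.startswith(TRUSTED_REGISTRIES):
        return f"{d.name}: image {d.image!r} is not from a trusted registry"
    return None


def require_pinned_image(d: Deployment) -> Optional[str]:
    # Look only at the last path segment so a registry port is not read as a tag.
    last = d.image.rsplit("/", 1)[-1]
    if "@sha256:" in last:
        return None  # pinned by digest
    if ":" not in last or last.endswith(":latest"):
        return f"{d.name}: image must use an explicit tag or digest"
    return None


def require_owner_label(d: Deployment) -> Optional[str]:
    if not d.labels.get("owner"):
        return f"{d.name}: missing 'owner' label"
    return None


def admit(d: Deployment, policies: list[Policy]) -> tuple[bool, list[str]]:
    """Evaluate every policy; allow only if none reports a violation."""
    violations = [msg for p in policies if (msg := p(d)) is not None]
    return (not violations, violations)


POLICIES = [require_trusted_registry, require_pinned_image, require_owner_label]

ok, why = admit(
    Deployment("web", "registry.example.com/team/web:1.4.2", {"owner": "payments"}),
    POLICIES,
)
blocked, reasons = admit(Deployment("job", "docker.io/library/busybox"), POLICIES)
```

Collecting every violation, rather than stopping at the first, gives the developer the full list of what to fix in one pass, which fits the frictionless experience the panel is asking for.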
So image provenance, container breakouts, and several other runtime threats can be addressed through policy as well.

And Jaya, what do you see out there in the field? What are the key security implementations that move the needle?

Yeah, I think what I'm seeing is that customers are driven by enterprise security requirements, by meeting those standards as well as the related compliance requirements, right? Depending on the industry, whether it's PCI or HIPAA or FISMA, et cetera. And they have to go through so many audits, and some of them have to be done annually. So that is really the pain point. They've been doing this for traditional IT, and now for cloud they have to deal with the same issues. What I'm finding is that customers are resorting to policy management more as a way to manage configuration. Because at the end of the day, if you take a particular compliance standard, look at a control in that standard, and implement that control using a technology, you need to make sure that the technology for that control is configured properly. So it's really a configuration management problem. You have a desired configuration state, which is based on best practices like, for example, using TLS 1.3 or using strong ciphers, and you are essentially defining policies to ensure that those configurations are set up properly. And imagine if you are able to achieve that desired configuration state for every control, at every layer of your software stack, through all the life cycle stages: CI/CD, runtime, et cetera. That's the ideal vision. If I can accomplish that, then I'm continuously security ready. I'm continuously audit ready. So that's really the eventual goal of policy management, and what I'm seeing is that customers are starting to scratch the surface.
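The "desired configuration state" idea above can be sketched as a small drift check. This is only an illustration under stated assumptions: the `DESIRED` dictionary, its key names, and the `find_drift` helper are hypothetical, and the TLS 1.3 cipher suite names are used purely as sample values for the "strong ciphers" best practice mentioned in the discussion.

```python
# Hypothetical desired state for one control; keys and values are illustrative.
DESIRED = {
    "tls_min_version": "1.3",
    "allowed_ciphers": {
        "TLS_AES_128_GCM_SHA256",
        "TLS_AES_256_GCM_SHA384",
        "TLS_CHACHA20_POLY1305_SHA256",
    },
}


def _version(v: str) -> tuple[int, ...]:
    """Turn '1.3' into (1, 3) so versions compare numerically, not as text."""
    return tuple(int(part) for part in v.split("."))


def find_drift(actual: dict, desired: dict = DESIRED) -> list[str]:
    """Return one finding per way the actual config departs from the desired state."""
    findings = []

    actual_min = actual.get("tls_min_version", "0")  # missing setting counts as drift
    if _version(actual_min) < _version(desired["tls_min_version"]):
        findings.append(
            f"tls_min_version is {actual_min}, expected at least {desired['tls_min_version']}"
        )

    extra = set(actual.get("ciphers", ())) - desired["allowed_ciphers"]
    if extra:
        findings.append("ciphers outside the allowed set: " + ", ".join(sorted(extra)))

    return findings


compliant = find_drift({"tls_min_version": "1.3", "ciphers": ["TLS_AES_256_GCM_SHA384"]})
drifted = find_drift({
    "tls_min_version": "1.2",
    "ciphers": ["TLS_AES_128_GCM_SHA256", "TLS_RSA_WITH_3DES_EDE_CBC_SHA"],
})
```

An empty findings list for every control at every layer is the "continuously audit ready" state; a non-empty list is what an auditor or an operations team would act on.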
They are definitely looking at the things they automate today using whatever homegrown scripts and so on. As the first step, they are converting those into policies and applying policy management techniques to them. And then they take it to the next level, saying, okay, I'm using this particular vendor to implement this security control. Now they're telling those vendors: can you implement the best practices for your technology as policies and give them to me? That's what customers are doing. So I'm very hopeful. I'm seeing a lot of progress in this space, and I feel like the time has come to realize this vision.

And Jim, just to talk to that vendor perspective: as an implementer of a policy engine, you're seeing where customer needs meet the technology needs. So what are you seeing as the key asks from the community and the larger population of Kubernetes operators?

Yeah, so policies, as Jaya was articulating very well, certainly fit within the configuration management realm of Kubernetes and other systems. Also, when people think about policies, they typically think about governance and compliance, those types of security realms. But policies can also be used for automation, for reducing friction, and for reducing handoffs across the different roles, like developers and operators, that are concerned with Kubernetes. So one of the things we see as very important, and one of the reasons why we built Kyverno and are looking at doing things in a very Kubernetes-native manner, is to make sure that the operations team is very comfortable implementing, managing, and extending those policies, and applying DevOps best practices to policies as well. So not treating it as a foreign realm that some central team handles separately, but making it part of infrastructure as code and other DevOps best practices.
And Aradhana, what is your experience, and how do you realize that policy maintenance life cycle? What does that really look like as folks get beyond the simple "here's how I define my policy document"? What challenges do they see when trying to maintain that over time and across clusters?

Obviously, security controls have to be converted into policies, and security controls change based on the threat landscape. As the threat landscape changes, the security controls change and the policies are updated, but then the developer community has to be aware of those policies and controls, because their deployments may fail based on a new control. So communication with the development community about the new threats, and the controls that are going to be implemented, is very important. At the same time, the integration with the enforcement points and decision points has to be done as well. And then there is validation of the policies, to make sure they are not in conflict with each other. At the same time, thinking of policies as a hierarchy is important. There could be some global policies that apply to everything, and then there could be regional policies or policies for a particular application deployment. So we can build an inheritance model where global policies are inherited by all the workloads, and then we can have application-specific security policies as well, based on the classification of the application. It could be highly sensitive or moderate. But again, this is continuous work. Because imagine, in a regulated environment, that tomorrow, rather than having these paper data calls where you have to provide spreadsheets and spreadsheets of answers, all the auditors have to get is the output of a report from your policy enforcement points.
So that makes life much easier for the developers as well as the organizations, and it makes it very easy for the auditors to see where the violations are and where the focus needs to be in terms of improving security.

And actually, on that focus point, Jaya, what I'm hearing is that this is a bit of a culture evolution, or maybe a mashup of old and new culture, where you've got to be thinking about continuous policy implementation and policy maintenance, and yet you have to integrate that with existing IT and existing control structures. What do you see, and how do you approach this?

Yeah, that's an excellent point. I think there are two things. One is, if I look at customers, they don't just have one cluster that they're managing. Typically it is a fleet of clusters. And as Aradhana was saying, some sets of clusters are dedicated to certain application teams, et cetera. So really what you need is policy management that is multi-cluster, that is oriented toward splitting those clusters among different application teams, and that has policies specific to those aspects. That said, I think we also need to think about the existing IT operational processes and tools that customers are using today, which they use, for example, to prepare for audits, to make sure of their security posture day to day, and to action incidents. So this is where I think about combining policy management, or policy-based governance, with the existing IT operational processes. I'm using the term automated governance to refer to that, which is to say, customers are already automating things. Now, if we can add the element of governance to it through policies, then we can basically say: here are your best practices, represented as policies. Here are the violations you can detect using this system, and now you can trigger your existing automations to fix those violations, right?
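The detect-then-remediate flow just described can be sketched in a few lines of Python. Everything named here is hypothetical: the cluster names, the two policy names, the configuration keys, and the `reconcile` function are assumptions for illustration, and the "existing automation" is reduced to a function that edits an in-memory dictionary.

```python
from typing import Callable

Config = dict
Check = Callable[[Config], bool]  # returns True when the cluster is compliant
Fix = Callable[[Config], None]    # stand-in for an existing automation; edits config in place


def reconcile(fleet: dict[str, Config],
              checks: dict[str, Check],
              fixes: dict[str, Fix]) -> list[tuple[str, str, str]]:
    """Check every policy on every cluster, run a fix if one exists, then re-check."""
    report = []
    for cluster, config in fleet.items():
        for policy, check in checks.items():
            if check(config):
                status = "pass"
            elif policy in fixes:
                fixes[policy](config)
                # Re-evaluate so the report reflects the state after remediation.
                status = "remediated" if check(config) else "fail"
            else:
                status = "fail"
            report.append((cluster, policy, status))
    return report


# Hypothetical policies and automations, for illustration only.
checks = {
    "audit-logging-enabled": lambda c: c.get("audit_logging") is True,
    "no-anonymous-auth": lambda c: c.get("anonymous_auth") is False,
}
fixes = {
    "audit-logging-enabled": lambda c: c.update(audit_logging=True),
}

fleet = {
    "payments-prod": {"audit_logging": True, "anonymous_auth": False},
    "analytics-dev": {"audit_logging": False, "anonymous_auth": True},
}

report = reconcile(fleet, checks, fixes)
```

Violations with no registered automation stay as "fail" in the report, so they surface for a person to handle instead of being silently ignored.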
So you close the loop, right? I think that's really where we are headed. Multi-cluster automated governance is the way to go, in my view.

And Jim, again, the developers and DevOps folks who come to you and who are interacting with Kyverno and the community have to build a business case. Even in the open source world, they have to justify focusing time and attention. Is that argument about automation across governance, across controls, the argument they're using? Are there other arguments that are effective in getting their teams and their management excited about this?

Yeah, so what we're seeing is a strong push toward policy-based operations. If you look at Kubernetes today, and of course control planes, we know that most enterprises tend to be hybrid cloud or hybrid multi-cloud. Control planes are being run on public cloud, private cloud, edge data centers, things like that. So as Jaya was also pointing out, if you need central management across all of these clusters, what's really needed is autonomy across these DevOps roles, but you also need strong alignment, and policies become the way to do that. So operational efficiency is, very much in Kubernetes, the first reason why policy management becomes critical. And then securing Kubernetes and making it secure by default, as Aradhana was pointing out, becomes another leading reason, which is equally if not more important. And then we hear about use cases around compliance and governance and other mappings. So that's the hierarchy of needs, if you will, as we see it. It starts with the basics: look, Kubernetes is complex, make it easy. All right, we've solved that. Now, is it secure, and how do we make sure all of our clusters are secure by default and eliminate all these manual steps? And then what else can we do with it?
Can we get to that push-button compliance reporting, that push-button governance integration, with a complete feedback loop through policy-as-code best practices?

But we're going to open this up to all of you. Aradhana, I'll start with you. How do I know that I'm doing this correctly, that I'm implementing this and managing it well over time?

So in my opinion, you have to improve this iteratively. On day one, you're not going to have a full repository of all the policies; you build up policies as you learn. Of course there are best practices out there that can be converted into policies as a starting point, but from there it has to be a continuous loop of feedback from the intelligence that you get, and you continue to improve and harden your policies from there. Maturity is built over time, because again, the threat landscape is going to change. There's going to be an evolution of all the policies over time, moving gradually from one cluster to multi-cluster and cross-cloud clusters. And with solutions like Anthos and EKS on-prem, you're going to have hybrid clusters out there as well.

Jaya, same question here. How do you define success?

Yeah, so one additional angle I wanted to add here: we talked a lot about policies for security and compliance. There are also policies for resiliency and policies for software engineering standards. Those are important as well. As an example, let's say that you have a policy for resource limits or a policy for liveness probes, things like that. These are things that you want to catch early, so that an application developer is thinking about them up front, rather than not thinking about them and then having it become a crisis, right?
Because you're running out of resources on your cluster, and then they are called over the weekend or late at night and told that their application is failing, or whatever. So this is where I feel like policy management spans those kinds of criteria as well. That's one point I wanted to convey. The other thing is, as Aradhana was saying, when you first start, you do have to make some investment, because you are going to put this architecture in place. So you won't get the benefits right away. But once you have it in place, you will see that your operations team has a view of things on a day-to-day basis. They're not scrambling to get ready for an audit. It is a continuous process. So, as Robert was saying, it does require a change in culture, a change in how IT operates, but it does have benefits, because you can manage how things are configured, and how you action them, on a day-to-day basis. And then, to the people who are listening to this panel, this is more of a call for participation in our community: we do have to build out a library of policies. The concepts in our white paper cover various control areas and aspects, but we really need to build out that library based on the various policy engines that are out there, like Kyverno, Gatekeeper, et cetera. So this is where community participation, to actually make this real, is going to be extremely important. Because like I said, to get the full benefits, you need this enabled for all controls across all layers of the stack.

And Jim, same question, but with a little twist. Is there anything that you think the community developers need to enable to make this success more efficient or more possible?
Yeah, so certainly to me, success would be that happy developers and happy security and operations teams mean a successful business, right?

That's a lofty goal.

Right. Yeah, so getting to that point, and making this so seamless and so easy that it really feels like a nice, productive experience, without worrying about the complexity, worrying about the security posture, and things like that. But to the specific question, in terms of what else can be done in the community: the power of Kubernetes is its extensibility. And that's where using native tools and native policy-as-code best practices comes in, really embracing Kubernetes as a platform for building platforms. I think there's more that can be done there. Beyond what Jaya just mentioned, the community, and this is the broader Kubernetes community in general and the CNCF, is also starting to look at how you make sure the admission controllers themselves are secured and protected. What are the right controls to have in place as you look at the different layers of your cloud security model? So there's a lot of good ongoing work, and supply chain is another area where admission controllers are playing a critical role. All of this is raising awareness as well as improving the security posture overall.

And then, to close it up before we take questions, 10 seconds each. Aradhana, make some predictions. What are going to be the biggest challenges or the biggest wins, the biggest successes, in this space over the next 12 months?

I think in multi-cloud and hybrid deployments, where you have portability of applications from one cluster to another, policy management brings big value, because you can define it in one place and then distribute it wherever you want it to be.

Jaya, your predictions?
Yeah, my prediction is that, though this requires a change in culture, I'm seeing that change happening. So I'm optimistic, and I can dream big, right? I'm hoping that we will get to a point where we have a library that says: if you deploy this library of policies, then your cloud cluster is ready for this particular compliance standard, for the technical controls. I think that would be a nice goal to achieve. And I feel like the time has come to do that.

Go big or go home. Jim, your prediction?

Yeah, along similar lines. Just automation, automation, automation, and being secure by default.

Fantastic. My prediction is that we're going to grow the policy workgroup community. We're going to have more participants on our Wednesday calls. Our white paper is going to be a success. So thank you. Thank you all for joining this conversation. It's been fantastic and very thought provoking. And I think we now have questions on the virtual platform.

Thank you, Robert.

Thank you, Robert.

Thanks, Robert.