Thank you for joining us for this intro talk on the Open Policy Agent. My name is Ash Narkar. I am a software engineer at Styra, and I'm one of the maintainers of the Open Policy Agent. I care about developing software that can be easily deployed, scaled, managed, and is secure by default. Hi, I'm Rita. I'm a software engineer at Microsoft, and I'm also a maintainer of the OPA Gatekeeper project. I'm a firm believer that great software requires user empathy. In today's talk, we'll learn a bit about the Open Policy Agent and its features, we'll look at its roadmap, and then we'll get some updates from subprojects like Conftest and Gatekeeper. So let's get started.

The OPA project was started in 2016 at Styra, and the goal of the project has been to unify policy enforcement across the stack. One of the earliest adopters of OPA was Netflix, which has been using OPA to authorize its gRPC and HTTP APIs, and companies like Chef, Cloudflare, Pinterest, Intuit, Capital One, and many, many more are using OPA in production for a variety of use cases like admission control, RBAC, ABAC, risk management, data filtering, and so on. OPA is a graduated project at the CNCF with more than 150 contributors on GitHub. It has a healthy Slack community of more than 4,500 members. It's been starred more than 5,000 times on GitHub, and it's integrated with more than 50 well-known open source projects, some of which we'll see later on.

So let's look at this policy enforcement problem. If you look at the cloud native landscape, you know that it's constantly evolving. New projects get added every day: a new database, a new gateway, a new service mesh, and so on. So if you're building a system comprising all these projects, your system is going to have a lot of moving parts.
And if you think about security, specifically authorization, for such a diverse system, each of these projects has a different way of controlling access, which may be tightly coupled to the underlying system. Each of these projects could be written in a different programming language. So when you're building your system out of all these moving parts, with such a diverse set of projects, you end up having no visibility into the security posture of your system. The same is true from a flexibility point of view: imagine if you had to swap out the service mesh tomorrow. Now you have to again spend time and money to port these policies to your new service mesh. And you might say, we'll just use docs and wikis to enforce policies. That works until you have tens, hundreds, or even thousands of these services, and the wiki approach is no longer sustainable. What we need is a unified way to control access across all these diverse systems, and this was one of the motivations behind creating the Open Policy Agent. We'll check out this slide again later, but for now, let's look at what the Open Policy Agent is.

OPA is an open source, general-purpose policy engine. When you use OPA, you are decoupling policy enforcement from policy decision-making, so your services can offload policy decisions to OPA by executing a query. Let's understand this concept a little more using this figure. Imagine you have a service. This can be any service at all: your own custom service, Kafka, Kubernetes, Envoy, still any service at all. Whenever your service gets a request, it is going to ask OPA for a policy decision by executing a query. OPA is going to evaluate this query based on the policies and the data it has access to, and send a decision back to your service, where it gets enforced. So you can see that we have decoupled the policy decision-making from the policy enforcement. The policy query itself can be any JSON value.
So for example, if you're doing Kubernetes admission control, the policy query could include your pod manifest. Or if you're doing HTTP API authorization, the policy query can contain your request path, your method, the user, and so on. The policy decision can also be any JSON value, and then it's up to your service how to interpret that decision. So this is an overview of how OPA works with your services.

Now, let's look at some of OPA's features. At the core of OPA is a high-level declarative language called Rego. With Rego, you can write policies that are more than allow/deny or true/false or yes/no; your policy decisions can be sets, objects, collections of values, strings, and so on. OPA can help you answer the question, can Bob access a particular field? And OPA can also help you answer the question, which fields can Bob access? So policy decisions can be much more expressive using Rego.

OPA is written in Go. You can deploy it as a sidecar or a host-level daemon, you can embed it inside your Go code, and now you can also compile your Rego policies into Wasm and use those executables to answer your queries. It's designed to be as lightweight as possible, so all the policies and all the data it needs for evaluation are stored in memory. You can think of OPA as a host-local cache for your policy decisions. OPA does not have any runtime dependencies, meaning it does not have to go to an external service or contact an external database to make a policy decision. You can extend OPA to do that, but that's completely optional. OPA does provide you with some management APIs, which allow you to pull policy and data from a remote service. OPA can also upload the decisions it makes as logs to a remote service, which allows you to debug it and do offline auditing. And OPA can report its own health and status to an external service.
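To make that concrete, here is a small sketch of what such a policy might look like in Rego. The package name, the path layout, the `data.permissions` document, and the user names are all hypothetical, not something from the talk:

```rego
package httpapi.authz

# Deny by default; the policy decision is just the JSON value of `allow`.
default allow = false

# "Can Bob access a particular field?" Here, users may GET their
# own salary record (hypothetical path layout).
allow {
    input.method == "GET"
    input.path == ["salary", input.user]
}

# "Which fields can Bob access?" A decision can also be a set:
# the fields this user may read, looked up in a hypothetical
# data.permissions document loaded into OPA.
readable_fields[f] {
    f := data.permissions[input.user][_]
}
```

A service would then query, say, `data.httpapi.authz.allow` with the request attributes as input, and enforce whatever JSON decision comes back.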
Finally, along with the core policy engine, OPA provides you with a rich set of tooling, which allows you to build, test, and debug your policies. There is a unit test framework, which you can use to test your policies before you actually deploy them. There are integrations with editors like Vim, VS Code, and IntelliJ to help you better author policies using Rego. So those were some of OPA's features: a high-level declarative language, multiple deployment models, management APIs, and a rich tooling set.

We've seen this figure before. The reason OPA is a general-purpose policy engine is that it's not tied to any particular data format. As long as you give OPA some structured data and you write policies that make sense for that data, OPA gives a decision back to you. And that's why it's a general-purpose policy engine. This is just a snapshot of all the projects OPA is integrated with; there are many, many more. And the cool thing is, you can take any of these integrations out of the box, and without having to write even a single line of code, you can use OPA along with these systems to enforce custom security policies. Like I mentioned before, the previous slide just showed a snapshot of all the integrations. If you check out the ecosystem page on the OPA website, you can find many more of OPA's integrations. And if you would like to contribute to OPA, one of the best ways is to integrate OPA with your favorite project and then have it featured on the OPA website.

Next, let's look at OPA's roadmap. The roadmap, which we release every year, is a snapshot of the work the community will tackle this year. One of the things we always try to do is improve our documentation, to make it much easier to get started with OPA and to learn OPA.
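As a sketch of the unit test framework mentioned above, tests are ordinary Rego rules whose names start with `test_`. This example assumes a hypothetical `allow` rule in `package httpapi.authz` that lets users read their own salary records:

```rego
package httpapi.authz

# `with input as ...` supplies a mock input document for the test.
test_get_own_salary_allowed {
    allow with input as {"method": "GET", "path": ["salary", "bob"], "user": "bob"}
}

test_other_users_salary_denied {
    not allow with input as {"method": "GET", "path": ["salary", "alice"], "user": "bob"}
}
```

Running `opa test ./policy/ -v` against the policy and test files reports a pass or fail per test rule.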
So one of the things we are doing now is updating all our tutorials to use our management APIs, like bundles, decision logs, and status, to help users get started with these in a very easy and intuitive manner. Like I mentioned before, all the policies and all the data OPA needs for evaluation are stored in memory, so your memory usage is going to scale with the increase in the policies and the data you load into OPA. To give you an example, when you take raw JSON and load it inside of OPA, it takes 20 times more memory compared to the same data stored on disk in a compact, serialized manner. So now we are looking at ways in which we can store policies and data on disk and have OPA efficiently load them from disk during evaluation. So there's some work and investigation going on around persistent storage for policy and data.

Next, I'd like to point out the work we're also doing on bundles, where we are working on a new kind of bundle called the delta bundle. For those of you who are not familiar with bundles, bundles are basically gzipped tarballs that contain policy and data, and a bundle represents the entirety of OPA's policy and data cache. Whenever you have new policies or new data, your service creates a new bundle and OPA downloads that bundle. OPA first has to erase everything it has in its current cache, and then it can load the new bundle and start using the policies and data from it. So imagine you had lots of data inside your bundle and you wanted to change just a small, tiny part of it: you would have to create a new bundle, OPA has to download that new bundle, erase everything it currently has, and load the new bundle, all of this work just for that small change in data. So this is not a really good way to propagate data changes, because you're wasting network resources as well as OPA resources.
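Since a bundle is just a gzipped tarball of policy and data, you can see the shape of one with plain shell tools. The file layout here is hypothetical; in practice the `opa build` command produces bundles for you:

```shell
# Lay out a minimal bundle: data documents plus Rego policy files.
mkdir -p bundle/httpapi/authz
cat > bundle/httpapi/authz/policy.rego <<'EOF'
package httpapi.authz
default allow = false
EOF
echo '{"permissions": {}}' > bundle/data.json

# A bundle is the gzipped tarball of that tree.
tar -czf bundle.tar.gz -C bundle .

# Inspect it: the archive holds the full policy/data snapshot,
# which is why OPA replaces its whole cache when a bundle arrives.
tar -tzf bundle.tar.gz
```

The fact that the archive is a complete snapshot is exactly why a tiny data change forces a full rebuild and reload, and why delta bundles are being investigated.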
And so, to help resolve this issue and propagate changes to OPA in a really efficient manner, we believe delta bundles would be a good solution. That's going to come up later this year as well. We've also been doing a lot of work on improving the performance of Wasm, and that's going to continue throughout the year. So this is a snapshot of some of the projects being taken on by the OPA community. The roadmap is publicly available, and you can check it out in the GitHub repository.

Okay, so now let's get some updates from some subprojects, starting with Conftest. As you all may know, Conftest is a utility that helps you write tests against structured configuration data. Conftest became part of the OPA family last year, and we thought we'd share some updates from the project. Conftest now supports the GitHub output format, which allows you to use Conftest for checking incoming pull requests. You can take this command and include it inside your GitHub workflow, inside your GitHub Actions, and then GitHub will annotate the file which has the policy violation. So this is a really cool feature that the Conftest team has come out with. Secondly, a new report flag has been added to the conftest verify command to help you debug failing tests in a much better way. Third is support for the new properties file type. Conftest supports a bunch of file types, JSON, XML, YAML, Dockerfiles, ignore files, and now it's added support for the properties file type as well. And what's next from the Conftest team? They are working on a new and improved way to allow more fine-grained control over which policies to exclude. So if you all want to chime in on that, check out the issues in the Conftest repo and add your comments and feedback there.
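The GitHub output format mentioned above can be wired into a pull-request check roughly like this. The file names, policy directory, and the assumption that `conftest` is already installed on the runner are all illustrative:

```yaml
# .github/workflows/conftest.yaml (hypothetical paths)
name: conftest
on: [pull_request]
jobs:
  conftest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Assumes the conftest binary is available on the runner.
      - name: Check manifests against policy
        run: conftest test --output github --policy policy/ deployment.yaml
```

With the `github` output format, violations show up as annotations on the offending files in the pull request.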
And those were the updates from the Conftest team. Now, for all things Gatekeeper, I'm going to hand it over to Rita.

So first, what is Gatekeeper? Gatekeeper is an extensible admission controller for defining and enforcing policies over Kubernetes. It leverages OPA under the hood for making policy decisions, while providing a Kubernetes-native layer on which operators can define their policies once and reuse them across multiple contexts. So how did we get here? Back in 2017, Styra released kube-mgmt, a sidecar for loading OPA policies from config maps. In 2018, Microsoft released Kubernetes Policy Controller and donated it to OPA. Later that year, Microsoft, Google, Styra, and others began collaborating to reimagine policy management, the fruit of which became the Gatekeeper project. This diagram visualizes how an API request flows through the system. Kubernetes provides hooks for mutating and validating a request prior to admitting it into the cluster, and Gatekeeper integrates OPA and Rego into that flow to help make these policy decisions. And for this KubeCon, we're introducing the external data feature, which allows Gatekeeper to connect to external data sources to help with policy decisions.

Now for Gatekeeper's core features. You write policies as declarative configurations, which can be parameterized to encourage reuse. Validating admission control lets you control what end users can do on the cluster. Gatekeeper also supports audit, which periodically evaluates resources against constraints to track compliance over time; this allows for ongoing monitoring of cluster state to aid in detection and remediation of pre-existing misconfigurations. With dry run, you can gradually roll out new policies without breaking existing workloads. Finally, with context awareness, you can define referential policies that, for example, enforce uniqueness of a field across different resources. The Gatekeeper API is built on top of the constraint framework.
The constraint framework defines Kubernetes-based abstractions over policy in the form of constraint templates and constraints. A constraint template is a custom resource describing a parameterized Rego policy to be enforced by OPA. A constraint is an instantiation of a policy, which binds it to a particular context using matchers, as well as providing arguments to customize behavior.

Now, for project updates. Let's talk about some of the new things that have been happening in Gatekeeper since the last KubeCon. We now have pre-alpha support for external data, which lets you leverage large external data sets in your policy decisions, and pre-alpha support for the Gator test CLI, which makes it much easier to test the YAMLs for things like constraint templates, constraints, and objects to be validated. We've added support for Kubernetes 1.22, moving the CRDs to apiextensions/v1 and the webhook configurations to admissionregistration/v1, and the constraint template CRD has now moved to v1. We reduced system mutate runtime by 87%, and fixed race conditions in the watch manager and constraint controllers. There's now prefix-based matching for namespaces and excluded namespaces, which helps support the various match sections across constraints, configs, and mutation. Integer key-value support in the mutation path parser helps us express paths like this one to select items in a list. Helm CRD hooks let us upgrade CRDs with helm upgrade. We updated metrics reporting for mutation to track requests to the mutation webhook, the ingestion of mutator objects, the number of active mutators in the system, and the number of iterations across the system to mutate objects. Last but not least, we unified Gatekeeper and controller-runtime metrics into a single endpoint, so we now publish the controller-runtime and Golang metrics by default.

At the last KubeCon, we introduced the new mutation feature. At this KubeCon, we're introducing a new pre-alpha feature that we've been working on for the past few months, called external data.
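As a sketch of the constraint template and constraint abstractions described above, here is the classic required-labels example; the resource names and the `owner` label parameter are illustrative:

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        # Schema for the parameters a constraint may pass in.
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
---
# A constraint instantiates the template, binding it to a context
# (namespaces here) and supplying the parameters.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
```

The template carries the parameterized Rego once; any number of constraints can then reuse it with different matchers and parameters.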
External data allows us to validate and modify requests using external data sources before applying them to the cluster. It is a generic, provider-based protocol that extends access to external data providers so we can leverage large data sets in our policy decisions. This opens up new opportunities to strengthen security policies and establish security defaults. Here are just a few use cases. For validation, we can verify an image's signature against pre-configured public keys before deployment; based on the results, Gatekeeper can either accept or deny the request. We can also check images for vulnerabilities using image scanning tools and block if vulnerabilities at a certain level are found. For mutation, we can mutate an image tag to a digest, to pin an image to a specific digest. We can also mutate labels, for example to add owner information with the AAD user information of the operator performing the creation of those Kubernetes resources.

Now, let's see a demo of this. Here, as you can see, is a cluster I have, and I have already deployed Gatekeeper to it. As you can see, I have the controller manager pods and the audit pod running. And below are the new external data providers, which work with the external data feature we're adding. We have four different providers for this demo. The first one is the AAD provider, which mutates labels to add owner information with the AAD user information of the operator who is running the command to create a particular Kubernetes resource. Next, we have the Cosign provider, which we are using to verify image signatures against preconfigured public keys before deployment; based on the results, it can accept or deny these requests. Here, the external data provider is leveraging the Cosign tool from the Sigstore project to verify the signature. Next, we have the tag-to-digest provider, which we're using to mutate an image tag to a digest, to pin an image to a specific digest.
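A provider like the ones in the demo is registered with Gatekeeper through a Provider resource. As a rough sketch of the pre-alpha API (the name and URL below are hypothetical, and the exact schema of this pre-alpha feature may well have changed since):

```yaml
apiVersion: externaldata.gatekeeper.sh/v1alpha1
kind: Provider
metadata:
  name: my-scanner-provider   # hypothetical name
spec:
  # Endpoint Gatekeeper calls with the keys (e.g. image names)
  # it needs external data for; the provider returns a value
  # (or error) per key.
  url: http://my-scanner-provider.default:8090/validate
  timeout: 30   # seconds
```

Constraint-template Rego and mutators can then reference the provider by name to pull in its answers during admission.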
Last but not least, we have the Trivy provider, which we're using to identify vulnerabilities in images before we deploy them. Here, our external provider is a Trivy server running in the cluster; the results are returned, and we block if vulnerabilities are found.

So here we have the tag-to-digest provider, and here we have the mutate-image Assign resource. Assign is a custom resource kind for the mutation feature. As you can see, the mutate-image Assign resource is applied to deployments, and it looks at all container images. Upon an admission request, the request will also be sent to the tag-to-digest provider, which is used to get the digest for the particular image. Next, let's apply this. As you can see, we have a test deployment with two images: one has a digest, and one only has a tag. Once we apply this, the deployment is created; let's see if the digest was added. And as you can see, yes, the digest-less static image now has the SHA.

For our next example, let's take a look at the Cosign provider. Here we have a constraint template for signed images, and as you can see, within the Rego it is calling the Cosign external data provider, which returns whether the image has a valid signature or not. And for the constraint, we have enforcement action deny, applying to deployments. Here we have an unsigned example. Once we apply it, let's see if Cosign thinks it's signed or not. As you can see, the request was blocked, because Cosign identified that the image does not contain a valid signature. And here we have a signed example using a different image. Once we apply it, as you can see, the Cosign provider did not block it, because the image has a valid signature. Next, we have the Trivy provider, and as you can see here, this constraint template is calling the Trivy provider for external data.
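Independent of the external data hookup in the demo, a plain Assign mutator looks roughly like this. This is the common imagePullPolicy example rather than the demo's tag-to-digest one, and the mutation API was still alpha at the time:

```yaml
apiVersion: mutations.gatekeeper.sh/v1alpha1
kind: Assign
metadata:
  name: always-pull-images   # illustrative name
spec:
  # Which resources the mutator applies to.
  applyTo:
    - groups: [""]
      kinds: ["Pod"]
      versions: ["v1"]
  # Path to mutate; [name:*] selects every item in the list,
  # using the path syntax mentioned in the project updates.
  location: "spec.containers[name:*].imagePullPolicy"
  parameters:
    assign:
      value: Always
```

The demo's tag-to-digest mutator follows the same shape, but sources its assigned value from the external data provider instead of a fixed value.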
And here, for the constraint, we are looking at all deployments with the warn enforcement action. What happens is, when we apply the example here with the Alpine image, it is going to call the Trivy provider, and the Trivy provider will respond with the number of vulnerabilities identified in the scan. And here, as you can see, we receive the warning, because the image contains over 30 vulnerabilities. And here we have a sample using a different container image; let's see if Trivy will block that one. As you can see, it did not, because the image did not have any vulnerabilities. Next, we have the AAD provider. Here it is applied to config maps, and it adds an owners label using the AAD provider. We are deploying a config map, and as you can see, we're deploying as a user who is an AAD user. Once we've applied the config map, as you can see, for the owners label we now get the user's first and last name from AAD. And that concludes the four different providers demonstrating the external data feature in Gatekeeper.