All right, I think we're about ready to start. So welcome everyone, I'm Anders, and I work for Styra, the founders of the Open Policy Agent project, or OPA. And I'm Will Beason, I work on Gatekeeper full time for Google. All right, so the agenda for today is an introduction to the OPA project. I don't know how many of you are using OPA already? A lot of hands, but some hands missing as well, so you might appreciate that. Followed by some project updates for those of you who are using OPA. And then Will will guide you through Gatekeeper and some project updates there. So starting with OPA: what is the challenge, or what's the problem we're trying to solve with OPA? It's basically this: to manage policy in increasingly distributed, complex, and heterogeneous systems. A modern application stack consists of a wide array of programming languages, platforms, frameworks and so on. And obviously an application needs to be deployed somewhere, so on top of that you have infrastructure, and you've got data. So the goal of the OPA project is really to try and unify policy across this whole cloud native stack. That's basically what OPA does: it's a policy engine to unify policy across the whole stack. So the first question to answer might then be: what is a policy? A policy is basically a set of rules governing what you can and can't do. Rules can of course be anything, like organizational rules or policy. It could be things like application authorization, determining who is allowed to do something or not based on some conditions. Kubernetes admission control is of course a popular use case. We see policies for infrastructure, build and deployment pipelines, data filtering, and much more. So a benefit we get from breaking our policies out of PDF files or Word documents and bringing them into the domain of code is that we can treat policy like any other code, with all the benefits that code provides.
So things like peer review, testing, analysis, linters and so on. We can start to reason about policy like any other code, making it accessible to developers. And the key concept of policy as code is this idea of decoupling. Sort of like how you decouple storage from an application by moving it into a database, we believe that policy deserves the same kind of decoupling. And aside from the benefits already mentioned, that you can reason about policy on its own, it's also that the lifecycle of policy can be managed independently from the lifecycle of your application. So you can make policy changes without having to redeploy or recompile your application. So that's policy. And OPA is an open source, general-purpose policy engine. The way a policy engine works is that, based on the policy you have loaded into OPA, you ask it a question and you get back a response, or a decision. As of last year, it's a graduated CNCF project. OPA offers a unified toolset and framework for working with policy across the whole stack, across this wide set of technologies and frameworks. And OPA builds on this idea that you decouple policy from your application or business logic. We separate policy decisions from enforcement, meaning OPA makes decisions; it doesn't enforce them. So if OPA says no, this person should not be allowed to view this journal or this endpoint, it's still up to the application to enforce that decision. That's an important distinction. Policies are written in a declarative language called Rego. I'm sure many of you are familiar with that; we'll take a look at it in a bit. But first, some numbers from the community. OPA now has over 250 contributors and 70 listed integrations on the ecosystem page, so there's a big ecosystem of tools, integrations and frameworks all using OPA, and 800 projects listed on GitHub as using OPA.
6,600 GitHub stars, 5,800 Slack users and over 130 million downloads. So the ecosystem is not just OPA; there's a lot of tooling built around OPA, even within the OPA project. One such tool is Conftest, which allows you to write policy and run that policy against local files on your system, so it's commonly used in CI/CD pipelines. There's obviously also OPA Gatekeeper, which we'll be talking about later. And there are editor integrations, like VS Code and IntelliJ. There's a good quote here that summarizes what OPA is really about: "The Open Policy Agent project is super dope. I finally have a framework that helps me translate written security policies into executable code for every layer of the stack." That's basically what OPA is. So how does it work then? I think there are basically two key concepts that make OPA work with all these technologies: the policy decision model, and Rego. The policy decision model is a very simple one. It basically works like this: you have a service that serves requests. Requests come into the service, and rather than making a policy decision itself, the service forwards the request over to OPA. That query is just a REST call, and the input is just any JSON. OPA, based on the policy and data it has available, makes a decision and returns it to the service, and that response is also just JSON. So pretty much anything that can talk HTTP and understands JSON can integrate with OPA. And when we say service, it's not necessarily a microservice. It could be anything: a Kafka broker, a Linux PAM module, an API gateway, or whatnot. And most of these technologies have some way of communicating over HTTP. So Rego, then, is the other pillar of OPA. It's a declarative, high-level policy language. Not really a general-purpose language, but it's surprisingly versatile.
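To make that decision model concrete, here's a sketch of what such a query and response could look like against OPA's REST Data API; the policy path and input fields are made up for illustration:

```
POST /v1/data/authz/allow
Content-Type: application/json

{
  "input": {
    "method": "GET",
    "path": ["users", "alice"],
    "user": "alice"
  }
}

HTTP/1.1 200 OK
Content-Type: application/json

{
  "result": true
}
```

The service only needs to build that JSON input and interpret the JSON result, which is why almost anything can act as the enforcement point.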
It allows you to write policy across this whole cloud native stack. And a policy, just like a real-world policy, is just a number of rules. And these rules, when you query them, return a decision. That decision is commonly true or false, are you allowed or not, but it's certainly not limited to that: a decision can be anything that would be valid JSON. So strings, lists, objects, and so on. A pretty common thing to do is to return a reason; for example, if you deny someone, you might want to tell them why, so that they can do something about it. OPA ships with a unit test framework, which is really useful as well. You can test your policies in isolation and build trust in them before you deploy them with your applications. It's a well-documented project, and there's the Rego Playground if you want to try things out on your own. So, a crash course in Rego here: teach yourself Rego in one minute. At the top, we have a policy with one rule. You can think of a rule a bit like an inverted if-then statement; we'd rather say then-if, so we flip it around. We say allow is equal to true if all the conditions in the body are true. So in this case, allow will evaluate to true if the input request method is GET, the first path component is "users", and the next path component is equal to the username provided in the input. If you were to write this in JavaScript or some imperative language, you'd see that it gets quite clunky to repeat assertions in if statements, and it doesn't compose very well. So that was an introduction to OPA and to Rego. Now, for those of you who are familiar with OPA, some project updates. There are a couple of new keywords; if you haven't tried them out already, I can highly encourage you to do so. There's a new keyword called in, which does pretty much what it sounds like: it checks for membership.
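The then-if rule described above, as a minimal Rego sketch; the package name and the exact input field names are assumptions for illustration:

```rego
package authz

default allow = false

# "then if": allow is true if every condition in the body holds
allow {
    input.method == "GET"        # the request method is GET
    input.path[0] == "users"     # first path component is "users"
    input.path[1] == input.user  # users may only access their own record
}
```

Each line in the body is an implicit logical AND, which is what makes composing many assertions much less clunky than nested if statements.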
So you can say: is this value in this collection, similar to what you'd find in Python, for example. The way you'd do this previously would be to iterate over a collection and check if any value is equal to this one. Now you can instead say something like: is "admin" in the groups for this user, or is the input request method not in the {HEAD, GET} set. There's a some ... in keyword, which is a new way to do iteration that I think might feel more familiar to users coming from other languages. And finally, there's the every keyword, which is a new way of expressing "for all". It's very useful if you want to express conditions like requiring that all containers in a deployment must come from the internal company registry, and things like that. This was previously a bit clunky to do. A common way of doing it would be to iterate over all the containers, check for that specific value, and then compare that count to the total count of containers. Now you can just say: every container in containers starts with my internal company registry, and that's either going to be true or not. Some notable features recently released. The first one is delta bundles, which allows OPA to fetch only the deltas of data bundles. So if you have a large set of data and you make an update to it, OPA can fetch only what actually changed, and not the whole bundle, which was previously the case. There's a new strict mode, which allows you to catch common errors: unused variables, unused imports, things like that. It's not all the way to a linter yet, but it's a good step on the way, so check that out if you're working with OPA. There are metadata annotations, as you can see in the picture there, which allow you to annotate your rules, your packages and so on. These annotations can then be fetched by other tools, so you can generate documentation, or you can even use the annotations from inside your rules for things like severity levels and so on.
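The new keywords in action, as a minimal sketch; the input data shapes and the registry name are made up for illustration:

```rego
package example

import future.keywords.in
import future.keywords.every

# membership check with `in`
is_admin {
    "admin" in input.user.groups
}

# negated membership against a set literal
mutating_request {
    not input.method in {"HEAD", "GET"}
}

# iteration with `some ... in`
has_internal_image {
    some container in input.spec.containers
    startswith(container.image, "registry.corp.internal/")
}

# "for all" with `every`
all_images_internal {
    every container in input.spec.containers {
        startswith(container.image, "registry.corp.internal/")
    }
}
```

Note the future.keywords imports: at the time of this talk these keywords were opt-in, so each file using them needs the corresponding import.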
There's now OCI registry support for bundles, meaning you can package your policy and data, push it to a registry, and have OPA fetch it from there. Disk storage is another nice feature if you have a lot of policy or data, more than can normally fit in memory; now you can also choose to have it stored on disk. Finally, there's function mocking added to the OPA test framework. Okay, so some planned improvements, some things on the roadmap which I think we'll see soon. There's work in progress on built-in functions for working with GraphQL; there's been a lot of demand for that. Named built-in function arguments: all the built-in functions will now have named arguments by default, which can be used by editors and things like that, so we can provide better documentation inline and you don't need to hop back and forth between your editor and the OPA docs. Dependency management has been very frequently requested, and that's coming. It remains to be seen in what form, but you'll basically be able to say that this policy depends on this other policy somewhere else on the net. And test result diffs, so a test can tell you not just that it failed, but what the diff was, or why it failed. There's an optimization flag already available for opa build, and the plan is to extend that to other commands as well. And finally, we want to include non-deterministic values in decision logs. There are a few functions in Rego, like http.send, where you obviously can't make the same call two months later and expect the same result, because whatever is on the other end might have changed. So we want to save the results of those calls along with the decisions, so that two months later you can go back to your policy and replay that decision exactly as it was at the time. Finally, some updates from the ecosystem. Again, OPA is much more than just OPA, the core product. There's a new OPA-based hook for AWS CloudFormation.
It allows you to do much of the same things for AWS CloudFormation that you've been able to do with OPA for Terraform in the past. There's a new setup-opa GitHub Action, so you can easily pull OPA into your GitHub Actions workflow and have OPA run tests or whatnot. SansShell is an interesting project; it's a non-interactive daemon for host management. You can basically have policies decide what should be allowed on a host or not, so maybe you want to read a file or write something. It's pretty much like an SSH client, but policy powered. Reposaur, another cool project, allows you to specify policy for your GitHub organization. You might say things like: any pull request must have at least two reviewers, and then you can scan your whole organization for any violations of those policies. And lastly, there's OPA Cuddle, which I think was published just yesterday or so, so it's very new. It allows you to turn any Rego policy into a CLI command, so you can pipe the output of things like ls into it and have policy determine what to return. You might want to use that for data filtering or whatnot. So that's some updates from OPA. I'm handing over to Will for the Gatekeeper part. All right, so I'm going to talk about two layers that are built on top of OPA: what we call frameworks, which is a repository that several projects depend on, and then Gatekeeper, which is built on top of frameworks. The idea of frameworks is representing policies as KRM objects. An individual policy, which is Rego code, gets instantiated as a CRD in Kubernetes, and then you can apply constraints, which are also KRM objects. So you have these objects in your cluster, and whatever applications you have can evaluate them in real time. You can add and remove constraints and templates to configure policies on the fly.
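As a sketch of what those KRM objects look like: a ConstraintTemplate carrying the Rego, and a Constraint instantiating it with parameters and match criteria. The required-label policy here follows the pattern from the Gatekeeper docs; the specific names and label are made up:

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels   # the CRD kind the template creates
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg}] {
          required := input.parameters.labels[_]
          not input.review.object.metadata.labels[required]
          msg := sprintf("missing required label: %v", [required])
        }
---
# A constraint instantiating the template, with match criteria
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
```

Applying the template creates the K8sRequiredLabels CRD, and each Constraint of that kind is one configured instance of the policy.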
Gatekeeper is written specifically as both an admission controller, which determines whether or not objects pass all of the OPA policies configured on the cluster, and additionally you can configure it as an audit hook, which periodically monitors all resources on the cluster so you can be sure that everything is compliant with the policies you have. The main things I'm going to cover are: for frameworks, we have a ton of performance improvements from the past year, and then we have interface changes, so for any projects that depend on frameworks there are going to be some pain points around the interface having changed, and that kind of sucks, but we think the performance improvements are worth it. As for Gatekeeper, we have implemented the external data feature, we have the new Gator CLI, which is built to be similar to Conftest but specifically for Gatekeeper constraint templates and constraints, and then there are a few other changes I'm going to go over. So first off, changes to frameworks. What you see here is a memory graph of launching Gatekeeper, obviously built on top of frameworks, with 100 constraint templates and 1,000 constraints, and this is memory usage over time. The darker blue is from several months ago, and the very flat, lighter blue line is from just a couple of weeks ago. As you can see, memory usage is way down and much more stable. If you note all of the spikes in the older line, that's the Go garbage collector being called a lot more, whereas now memory usage is much more stable. One of the big things here was that if you booted up a cluster and applied 100 constraint templates, we had a quadratic load time problem.
The difficulty was that because we put everything in a single OPA compiler environment, we had to recompile it every time we added or removed a template, and since a compilation environment is one unit that you can't add or remove things from, recreating it every time a template was added or removed made startup awful. Now you get a 3x to 20x speedup in template compilation, and the time you spend waiting for a template to compile is just the time for that template, not for all the templates you had before. There's also a 2x speedup in adding constraints for these templates. And because we rewrote how queries against the compiled Rego are run, they're actually two to three times faster, and we achieved a 100x speedup in running the match criteria. This is where, in Gatekeeper constraints, which parameterize the Rego policies, you can specify match criteria saying these policies only apply to these specific versions, these specific kinds, or objects with these specific labels or annotations. Do note that these numbers are very use-case dependent: if you're able to move your match criteria to the constraint itself rather than the Rego, you'll get a lot more of a speedup than if you have more specialized use cases and aren't able to do that. Additionally, for audit itself, we achieved a 20x reduction in memory usage. This is using a similar setup to the previous slide, so again 100 templates, 1,000 constraints, and then a bunch of namespaces and ConfigMaps, so it should be much easier to enable auditing for the resources on your clusters. Then, behavioral changes. We finally updated OPA; we're now on version 0.39, and I believe the pull request for 0.40 is out, so we get all of the future keywords and will soon have the other features that Anders talked about. The Client and Driver interfaces were reworked.
Definitely go visit the repository if you have a project which depends on those, because there have been significant changes. We hope the changes are more in line with how you would reason about building these kinds of systems. And then, Gatekeeper changes. First off, one of the new features we have is external data. This lets you communicate with external systems. It's more secure than http.send, and it's also much more constrained: whereas http.send lets you send arbitrary HTTP requests, external data treats whatever external data provider you're communicating with as a key-value store. So it's able to batch requests, and you get improved cache hits. And when writing these data providers, you're able to configure a lot of things yourself, like how long to keep caches, so you can tune it to your specific use case. Example use cases for the external data feature are things like LDAP; you can actually integrate with LDAP now. You can limit who can change what fields on specific Kubernetes resources. You can even auto-label resources with team metadata using the mutation feature. You can also connect to a CVE vulnerability system to check whether images have specific vulnerabilities that you absolutely don't want on your cluster. For mutation, external data is a bit constrained: evaluation is synchronous, and you're very latency-constrained when modifying resources, because you don't want to do things like kill leader elections, so you can only use string value data for this. Then there's the Gator CLI, which I mentioned before. We've built this to feel very similar to Conftest. So you have gator verify, which is like conftest verify; this is unit tests for templates and constraints. So you'll say: I have this template.
It specifies these variables. For example, you might have a template that says I require some annotation to be set to some value, and then you have the constraint that says which annotations and values you want. Then the tests say: okay, these objects should be let through, and these objects should be rejected. gator test, however, is similar to conftest test. This is where, if you have an entire repository of YAML and you just want to be sure that everything you're going to apply to your cluster actually conforms to all of the policies you have, gator test lets you see all of your current violations at once. This is the sample output from running gator verify. If you're familiar with go test, the output is designed to be very similar. You have individual test suites, and then you can see what the failure messages are. In this case, we got a violation where we didn't expect one in one unit test, and in the other we got two violations on a resource when we expected three. So we'd obviously have to go back and change either the constraints or the templates themselves. You can also validate that specific messages are sent by the Rego policies. So if you want to make sure that the message you want shown to whatever developer tried to apply a resource is actually sent to them, you can test that, because it's an awful experience if you just get "this resource isn't allowed to be applied" with no message. Other improvements: we have Prometheus metrics for conflicting mutators. Mutation is one of the Gatekeeper features that lets you modify resources as they come into the cluster. Sometimes mutators can be configured in ways where we can't necessarily catch the conflict as you're applying the mutation objects, so you now have a Prometheus metric for detecting when this happens.
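A gator verify run is driven by a Suite file along the lines of the sketch below; the file paths, test names, and expected message are made up for illustration:

```yaml
apiVersion: test.gatekeeper.sh/v1alpha1
kind: Suite
metadata:
  name: required-annotation
tests:
  - name: required-annotation
    template: template.yaml      # the ConstraintTemplate under test
    constraint: constraint.yaml  # the Constraint instantiating it
    cases:
      - name: allowed
        object: samples/allowed.yaml
        assertions:
          - violations: no       # this object should be let through
      - name: rejected
        object: samples/rejected.yaml
        assertions:
          - violations: yes      # this object should be rejected
            message: "missing required annotation"
```

The optional message assertion is how you check that the explanation developers see is actually the one your Rego emits.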
It's kind of infeasible to guarantee at an organizational level, when you have dozens of people all adding these mutators, that they're never going to conflict. And then also for the Helm charts, you can now configure the webhooks to be removed before uninstalling Gatekeeper, so you don't take out your cluster while uninstalling. Before, what happened was that if you just did kubectl delete on everything, and you deleted the pod first while Gatekeeper was configured in a fail-closed mode, you could no longer modify anything on your cluster: Kubernetes tries to call the webhook because you tried to apply or delete something, the webhook isn't available, and so you just break your cluster. So please do remember to configure webhooks to be removed first before uninstalling Gatekeeper with Helm. Yep, and thank you. Are there any questions? Okay, I'll be at the Styra booth later. Oh, there's a question over there, and another question over there. ...times faster, how do you do that? Oh, if you want to talk to me after, I am happy to go into great detail on how we achieved the speedups. It's just extremely technical, and my initial report on this was literally 20 pages long, but I would love to talk about it. There's another. Hi, thank you so much for a great presentation. I was wondering about the release cadence for Gatekeeper compared to the normal OPA project. How often are you supposed to upgrade the OPA binaries or OPA library? So the question was: what's the release cadence for Gatekeeper, and how often should you expect to upgrade the binaries? We're on 3.7 or 3.8 now; the minor version upgrades approximately every three months, and the patch version usually every month. Really, it's best not to fall behind by more than, say, six months, but I don't recall off the top of my head what our support window is. I wonder how much usage do you see in Istio service mesh? Oh, I'm sorry, I didn't catch that.
Yes, I was wondering how much usage of OPA you see in Istio service mesh. Istio? Do you see a lot of users using the OPA plugin with Istio? I'm sorry, I do not know. Oh, that might be more for me. Are you talking about the Envoy plugin, or the Istio plugin? About the Envoy plugin I don't know; I know about the Istio plugin. Yeah, so it was called the Istio plugin previously; I think it was renamed about a year ago, so it's now called the Envoy plugin, but it works for both Envoy and Istio. And yeah, for sure, I think it's one of the more common ways to call out to OPA in any cluster where you have that service mesh, so it's definitely a very popular project, used by many large corporations and organizations. All right, there was another question, and one down there as well. A short question, Anders: how much load can OPA handle? What are the big use cases you've seen? How many requests a second can we expect OPA to handle? Oh yeah, I think questions like that are always very difficult to answer, because it all depends: what does your policy look like, what does your policy do? Not to mention resource allocations and how much you have. But in general, I think your application is going to go down before OPA does. Yeah, and if you catch me after the talk, I do have specific numbers for Gatekeeper, or for frameworks actually, so I might be able to get you specific numbers on that. I have several different policies configured to varying complexities. I hear some other question down there. Thank you for the presentation. I have a question regarding the Rego language: what was the need for this language? Because I saw that you are implementing a lot of features that already exist in other languages, like comprehensions for lists, dependency management, and so on.
So the question was: what was the reasoning behind Rego? Why not, for example, Python, JavaScript, or Lua? Yeah, that's a good question, and a pretty common one. People ask: can I not just use JavaScript or whatever? And I think there are certain characteristics we want from a policy language. First of all, what you want to do with a policy is basically add a safety net. It's a guardrail, whether it's for authorization, admission control, or whatnot. So a very important characteristic is, of course, that policy evaluation doesn't fail on you and doesn't fail to terminate. If you had a language where you can end up in a never-ending loop, you couldn't guarantee that evaluation will ever terminate. Things like that are basically the premise of Rego: you get certain guarantees around things like resource consumption, that policy evaluation terminates, and so on. And as for the design, Rego has its roots in logic programming and Datalog. So yeah, it does look fairly different if you come from an imperative language, which I think many of us do, but spending some time to learn the principles is, I find, very enlightening. It makes you reason about any code in a different way. So there's definitely a learning curve, but once you get past it, it's a great place to be. Oh, there's, yes sir. Yeah, so my question is regarding: if you look at OPA, it seems like it's meant for server-side, backend types of applications. But what about the user experience? Can OPA be leveraged to enforce some of these policies at the front-end user experience layer? Yeah, that's a great question. I think, yeah, to some degree for sure.
A pretty common thing to do is to do the enforcement in the backend, but propagate the result in some nice way up to the front end. There are also a few projects working on perhaps not even sending the request to the backend in the first place, but rather evaluating the policy in the front end and showing the result without even propagating it. You'd obviously want that control to still be in the backend, but for performance reasons, or maybe you want to show a view that only lists items you have access to, and so on, there are a couple of ways you can do that. You can compile your policies to WebAssembly and run those in the browser, and there are a few other projects that try to make that possible and easier. But yeah, it's definitely an interesting topic; I think there's a lot of potential there. So, I mean, would you say it's advisable, would it make sense to say: you deploy OPA and all that, but then you tell some other stakeholders to write policies? Do you think they are easy enough to learn? What's your experience there, if that makes any sense? So, the question is around learning Rego? Yeah. So again, I think there's definitely a learning curve, but learning the basics, I don't find that very difficult. And then there are a lot of details, of course. But if you want to do something simple, like the allow rule we showed before, I think after a couple of hours you'll feel pretty confident with the basics of the language, but then you definitely need more time to learn all the details. But of course, if you're a development team or whatnot, I don't think everyone needs to be a Rego expert; mastering the basics is a good start, and I think that does not require that much of an investment.
Yeah, there are two more. Are we good on time? Yeah? There was one here. Great talk, many thanks. One question, actually two. Can OPA coexist with PSP, and how is the migration in that direction? That's the first question. The second one: do you have some resources or a GitHub source with best practices for how to use the language? Oh yeah, can I take this one? For sure. So OPA and PSP can absolutely coexist. As part of Gatekeeper, we wrote something called the Gatekeeper Library, which is a companion repository, and what the Gatekeeper Library has is dozens and dozens, I think about 70, policies in Rego implemented as constraint templates. So you actually see the full Rego code, and I think around 20 or so of those are specifically for PSP. So PSP and OPA can absolutely coexist, and I know several people that are doing exactly that. Yeah, and as for the second question, something like a style guide? There hasn't been one, but I've actually been working on that for the last week, and I'm hoping to make it public this week. So yeah, last question I think. This one's more of an adoption question. In a large enterprise, you have maybe a common set of policies that everyone needs to use, and then application teams may have specific policies. So what do you see in the enterprise from an adoption perspective? How do you know what policies are out there, what enforcement is happening, alerts, anything you can share in that space? Yeah, sure. So the question is basically: how do you manage OPA at scale, and how do you provision policies to thousands of decision points? OPA allows for that via its management APIs: you have the bundle API, you have the decision log API, and there are various control planes that you can use to author policy, to test policy, and to distribute policies in larger environments.
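Wiring an OPA instance to a control plane through those management APIs is just configuration; a minimal sketch, where the service URL, bundle path, and token variable are made up for illustration:

```yaml
# opa run --server --config-file config.yaml
services:
  control-plane:
    url: https://control-plane.example.com
    credentials:
      bearer:
        token: "${OPA_TOKEN}"

# Bundle API: periodically pull policy and data
bundles:
  authz:
    service: control-plane
    resource: bundles/authz.tar.gz
    polling:
      min_delay_seconds: 30
      max_delay_seconds: 60

# Decision Log API: upload decisions for audit and replay
decision_logs:
  service: control-plane
  reporting:
    min_delay_seconds: 60
    max_delay_seconds: 120
```

Every OPA in the fleet polls for the same bundles and ships its decisions back, which is how one control plane can serve thousands of decision points.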
And of course, Styra offers one such control plane, and there are other vendors as well. Thanks again.