All right. Hello, everyone. It's great to be here at KubeCon and CloudNativeCon China 2021. Today, we're going to take a deep dive into Open Policy Agent. I'm Anders, and I work as a developer advocate at Styra, the creators of the Open Policy Agent project. I have a pretty long background in software development, where I worked mainly on identity systems. I now have about three years of experience working with OPA. Before I joined Styra, I worked at another company where we integrated OPA in large production clusters, primarily for authorization. When I don't work with OPA, I'm interested in cooking, food, and football. So what problem are we trying to solve here? What's the challenge? I'd say it's this: managing policy in increasingly distributed, complex, and heterogeneous systems. We have applications, with their programming languages and application frameworks. We have our deployment targets and our infrastructure. We have our cloud providers and cloud environments. And we have data sources. Of course, there's way more than we can list on a single page like this, but the challenge is to manage policy across this whole complex environment. The goal is to unify policy enforcement, because policy is already in all these places. Most of these components deal with authorization to some extent, or with other types of policy, but they all do it in their own unique way. So the goal of OPA is to unify how we do that, and to unify policy enforcement across the whole stack. So that's really what OPA is: an open source, general-purpose policy engine. As of February this year, it's a graduated CNCF project. OPA offers a unified toolset and a unified framework for working with policy across the stack. OPA decouples policy from application logic, in the same way that a database decouples storage from your application and moves it into the database.
You can think of OPA in the same way, but for policy. OPA separates the policy decision, which is what OPA does, from enforcement: OPA makes the decisions, but it's still up to your applications to enforce them. Policies are written in a declarative language called Rego, which we'll look at later. Since OPA is a general-purpose policy engine, use cases range from Kubernetes admission control, application authorization, and infrastructure policies to data sources, build and deployment pipelines, and many, many more. As an open source project, OPA has more than 200 contributors, more than 50 integrations listed on our ecosystem pages, almost 6,000 GitHub stars, 5,000 Slack users, and more than 100 million Docker image pulls. So it's a really popular way of dealing with policy. The ecosystem includes not just OPA, the core project, but also interesting projects such as Conftest, for testing configuration files and other static resources, Gatekeeper for Kubernetes admission control, and plugins for the VS Code and IntelliJ editors. So it's a large, vibrant ecosystem and community. And it's not just a hobbyist project: OPA is used in production at some of the largest enterprises and organizations in the world. To sum up what OPA is about, I love this quote from Kelsey Hightower: "The Open Policy Agent project is super dope. I finally have a framework that helps me translate written security policies into executable code for every layer of the stack." So how does it work? The answer largely lies in the policy decision model. We have a service, and when we say service, that could be basically anything. It could be an application or a microservice. It could be a database, an API gateway like Kong, or even a Linux PAM module.
It doesn't really matter: anything that serves a request. When a request comes into that service, rather than deciding on its own whether it should be allowed, the service turns to OPA with a policy query. OPA then makes a policy decision based on the policy it has been provided, and possibly on data from the outside world, and returns that decision to the service. So it's kind of like an oracle next to you that you can ask: should we allow this? Is this or that permitted? And OPA will answer. That's the policy decision model. Since both the input and the output are just JSON, pretty much any service that can handle JSON can integrate with OPA. That's one of the secrets behind OPA being general purpose, meaning generally applicable. In terms of deployment, OPA runs as a lightweight, self-contained server binary, and it's ideally deployed as close to your service as possible, which is also one reason it's so lightweight. So rather than having one OPA serve all of your applications, you deploy one OPA per application, or even one OPA per replica of your application. This minimizes latency: you're going to ask OPA a lot of questions, so you want it hosted as close to your service as possible, normally on the same host running as a daemon, or as a sidecar container in Kubernetes. Most applications communicate with the OPA server through its REST API; that's how you send these policy queries. But if you have a Go application, you could also import OPA and use it as a Go library, since OPA is itself written in Go. There's also an Envoy and Istio integration, if you're using those. And finally, you can even compile your policies into WebAssembly, so you can run policy decisions anywhere you can run WebAssembly.
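To make the query flow concrete, here is a minimal Python sketch of a service asking OPA for a decision over the REST API, using only the standard library. It assumes OPA listens on its default port 8181 on localhost and that the policy lives in a package named policy with a rule named allow; the helper names build_policy_query, decision_from_response and ask_opa are hypothetical. The request goes to OPA's Data API, where the package and rule map to a URL path and the JSON input is wrapped in an "input" key.

```python
import json
import urllib.request

# Assumption: OPA runs next to the service on its default address and port.
OPA_URL = "http://localhost:8181"


def build_policy_query(package: str, rule: str, input_doc: dict) -> urllib.request.Request:
    """Build (without sending) a POST to OPA's Data API for data.<package>.<rule>."""
    url = f"{OPA_URL}/v1/data/{package.replace('.', '/')}/{rule}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def decision_from_response(raw: bytes, default: bool = False):
    """OPA omits "result" when the rule is undefined, so fall back to a default."""
    return json.loads(raw).get("result", default)


def ask_opa(package: str, rule: str, input_doc: dict, timeout: float = 1.0):
    """Send the query; this part needs a running OPA server."""
    request = build_policy_query(package, rule, input_doc)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return decision_from_response(response.read())


query = build_policy_query("policy", "allow",
                           {"request": {"method": "GET", "path": "public"}})
print(query.get_method(), query.full_url)
```

Treating a missing result as false is a design choice of this sketch: OPA leaves result out when the queried rule is undefined, and the service has to decide what that means. Only ask_opa needs a running OPA server; the other helpers can be exercised offline.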
Policies are written in a declarative, high-level language called Rego. Just like a real-world policy, a policy in Rego consists of any number of rules; that's basically what a policy is. When we evaluate a rule, it commonly returns a boolean value like true or false: are we allowed or not? But it could also return any type available in JSON, like strings, lists, or nested objects. A pretty common pattern is that if you deny something, you also return a reason, saying, for example, that you can't do this because you don't have the admin role. Rego includes over 150 built-in functions, tailor-made for policy evaluation: things like JSON Web Tokens, which might carry user identities and roles, date and time, IP address ranges, math, et cetera. Policy testing is easy, since OPA provides a unit test framework. Policy is code, so of course you should be able to test policy just like any other code. It's a well-documented project; the docs are great, and I recommend checking them out. There's also a playground available, so you can test things without even having OPA on your local machine. It's also a great way to share snippets with friends or colleagues who are working with Rego. Policy is one of the components required for a policy decision, but policy commonly needs data as well, and OPA offers several ways to provide policies with data. One of them we already touched on: as part of the input query. When you ask OPA a question, you can also provide some data with that question. You say: I have a user here with these roles trying to access this or that endpoint; should that be allowed or not? OPA can then make a decision based on that. You can also provide JSON Web Tokens, which are now a very common way of passing ID tokens or access tokens that carry user or client identity.
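As an illustration of what a policy sees when a JWT is passed in the input, the Python sketch below splits a token and base64url-decodes its header and payload. It deliberately skips signature verification, so it only shows the shape of the claims; in Rego you would use built-ins such as io.jwt.decode, or io.jwt.decode_verify when the signature must be checked. The helper names and the sub and roles claims are assumptions made for this example.

```python
import base64
import json
from typing import Tuple


def b64url_encode(obj: dict) -> str:
    """Encode a JSON object as unpadded base64url, as used in JWT segments."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> dict:
    """Decode one unpadded base64url JWT segment into a JSON object."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def decode_jwt_unverified(token: str) -> Tuple[dict, dict]:
    """Return (header, payload). WARNING: the signature is NOT verified."""
    header_b64, payload_b64, _signature = token.split(".")
    return b64url_decode(header_b64), b64url_decode(payload_b64)


# Hypothetical unsigned token carrying a user identity and roles.
token = (b64url_encode({"alg": "none", "typ": "JWT"}) + "."
         + b64url_encode({"sub": "alice", "roles": ["admin"]}) + ".")

header, claims = decode_jwt_unverified(token)
print(claims["sub"], "admin" in claims["roles"])
```

A policy would then base its decision on claims such as the roles list, exactly as it would with roles passed directly in the input.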
Since OPA exposes a REST API, you can also push data into OPA, where it will be available for subsequent requests. There's also the bundle API, which is kind of the opposite of pushing data: OPA periodically goes out to remote endpoints and pulls data. Finally, there's http.send, a built-in function that lets you, at policy evaluation time, go out from inside your policy and fetch data at that very instant. Okay, so OPA and Kubernetes. This is KubeCon and CloudNativeCon, after all, so how do OPA and Kubernetes fit together? Before we approach OPA, let's remind ourselves how a user interacts with the Kubernetes API. A user might run something like kubectl apply to deploy a resource, but before that resource is deployed, or really persisted in the Kubernetes database (normally etcd), the request first passes through a series of modules. First off, the user needs to authenticate. After that, the request passes authorization: is the user allowed to deploy this resource? Then the request enters the mutating admission controller, where the resource about to be deployed might be modified. Finally, the resource is validated by the validating admission controller. These modules are chainable, meaning you can have any number of authentication modules, authorization modules, or mutating or validating admission controllers. Kubernetes comes with a bunch of these modules built in, such as the OpenID Connect module for authentication, RBAC for authorization, and a number of both mutating and validating admission controllers. But what's even more interesting, at least from an OPA perspective, is the option to use webhooks: rather than using one of the built-in modules, we can reach out to an external service and ask it for advice.
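The request path just described can be modelled as a small pipeline. The following Python sketch is a conceptual model, not Kubernetes code: the stage signatures and the handle function are hypothetical. It captures the ordering (authentication, then authorization, then mutating admission, then validating admission) and the fact that each stage is a chain of modules. In a real cluster, any one of these callables could be a webhook that asks OPA.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical stage signatures for this sketch (not real Kubernetes APIs).
Authenticator = Callable[[dict], Optional[str]]     # username, or None if not recognized
Authorizer = Callable[[str, dict], Optional[bool]]  # True allow, False deny, None no opinion
Mutator = Callable[[dict], dict]                    # returns the (possibly modified) object
Validator = Callable[[dict], Optional[str]]         # rejection reason, or None to accept


def handle(request: dict,
           authenticators: List[Authenticator],
           authorizers: List[Authorizer],
           mutators: List[Mutator],
           validators: List[Validator]) -> Tuple[bool, Any]:
    """Return (True, object to persist) or (False, reason for rejection)."""
    # 1. Authentication: the first module that recognizes the caller wins.
    user = next((u for u in (authn(request) for authn in authenticators) if u), None)
    if user is None:
        return False, "unauthenticated"
    # 2. Authorization: the first definite allow/deny wins; if nobody has an opinion, deny.
    verdict = next((d for d in (authz(user, request) for authz in authorizers)
                    if d is not None), False)
    if not verdict:
        return False, "forbidden"
    # 3. Mutating admission: each mutator can modify the object in turn.
    obj = request["object"]
    for mutate in mutators:
        obj = mutate(obj)
    # 4. Validating admission: any single rejection fails the whole request.
    reasons = [r for r in (validate(obj) for validate in validators) if r]
    if reasons:
        return False, "; ".join(reasons)
    return True, obj  # at this point the API server would persist the object in etcd
```

The authorization stage follows the Kubernetes behaviour of letting the first authorizer with an opinion decide, and the validating stage rejects the request if any single validator objects, which is what makes it a natural place for guardrails.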
And of course, reaching out to an external service for advice or a decision is exactly what OPA does. So for authorization as well as mutating and validating admission control, you can configure a webhook to reach out to OPA for a decision. Let's zoom in on one of these cases: the validating admission controller. Why the validating one? Well, it's by far the most popular webhook to use, because it allows us to build policy-based guardrails around our clusters, defining what we can and can't do. Some common policies we can enforce include only allowing images from an internal Docker registry, and other image constraints: we might not want to allow the latest tag, but instead always require a specific version. Another is requiring labels or annotations on resources: you might say that any deployed resource must belong to a team, clearly marked in a label or an annotation. It could be a cost center, a department, or many other things. Ingress host or path uniqueness is another common policy: if an ingress is deployed to a specific host or path, you can't deploy another ingress resource that overwrites or conflicts with it. So you check which ingresses are currently deployed and compare them to the one about to be deployed. You might want to enforce TLS for services. You might want to disallow certain attributes, like hostPath volume mounts. You might want to put limits on resource allocations, or enforce pod security policies, which are now being replaced by things like OPA and other validating admission controllers. And you might want to do even more out-there things, like denying modifications at certain times: maybe you don't want to allow deploys on Fridays, or only when people are on call, and so on and so forth. Really anything.
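Two of the image constraints listed above, only pulling from an internal registry and never relying on the latest tag, can be sketched in Python to show the kind of logic such a guardrail encodes. In a real cluster this logic would be written in Rego and evaluated by OPA behind the validating webhook; the registry hostname below is a made-up placeholder, and the function names are hypothetical.

```python
from typing import List

# Hypothetical internal registry prefix; replace with your own.
ALLOWED_REGISTRY = "registry.internal.example.com/"


def image_violations(image: str) -> List[str]:
    """Return guardrail violations for a single container image reference."""
    problems = []
    if not image.startswith(ALLOWED_REGISTRY):
        problems.append(f"image {image} is not from the internal registry")
    # Look only at the last path component so a registry port (host:5000/...) isn't read as a tag.
    name = image.rsplit("/", 1)[-1]
    if "@" not in name:  # images pinned by digest are fine
        tag = name.split(":", 1)[1] if ":" in name else "latest"  # no tag means latest
        if tag == "latest":
            problems.append(f"image {image} must use a specific version, not latest")
    return problems


def pod_violations(pod_spec: dict) -> List[str]:
    """Check every container, including init containers, in a pod spec."""
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    return [p for c in containers for p in image_violations(c["image"])]
```

Returning a list of messages rather than a single boolean mirrors the pattern mentioned earlier of denying with a reason, so the user learns why the resource was rejected.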
So when we talk about admission controllers, it's almost always the validating one we mean. The mutating one is certainly useful to some extent, but it doesn't really build these guardrails. That's what the validating admission controller does: it provides a safety net around our Kubernetes clusters. With that, let's look at what Rego policy authoring and policy testing might look like, and also how we could deploy OPA and use Rego policies for Kubernetes validating admission control to build these kinds of guardrails. Okay, for the demo part, we're going to do two things. We're going to look at some simple Rego policy authoring and testing, and then at an example of how we can use Rego rules to enforce policy in a Kubernetes admission control scenario. First off, let's consider a scenario where a service queries OPA for a policy decision; in this case, we want to know whether the user can access an endpoint or not. The rule we're going to work with is that anyone can access the paths that are part of our data source. Remember that policy decisions are based on both policy and data. So if the path provided in the input is in a list called public paths inside our data source, and the request method is GET, the request is allowed: we don't want anyone to write or modify these resources, but anyone should be able to read public resources. If the path is not in there, we're going to require the user to have the admin role. So let's get started. I'm going to start by creating a policy file and putting it in a package. All policies have a package; it's sort of like a module or a namespace. I'm just going to call this one policy, but you can name it whatever you want. Speaking of naming, we're going to create our first rule. Rules always have a name, but the names don't have any special meaning to OPA.
We call them allow, deny, warn, and so on, but that's just a convention. The way rule evaluation works is that if all the conditions inside the body (the body is what's inside the curly brackets) are true, then the allow rule is true. If you were to write this in another programming language, you might say something like "if this and this", with an "and" after each line. All the lines are evaluated in order, and all of them need to be true for the allow rule to be true. And true is just the default return value; we could specify another return value, like a string, a complex object, a list, or whatever. But of course, we don't just want to compare constants here. We want to use data from the input, specifically the request method and the request path. So let's start there. To reference values from the input, you just say input: input.request.method is equal to GET. That's our first assertion. If it's true, evaluation moves on to the next line. If it's not true, allow is not going to be true. But in Rego, a rule whose conditions aren't met is undefined by default, not false. So we might want to say that by default, allow should return false, and not just an empty response. On the next line, we want to check the request path. Here we can't just do a direct comparison, because there are several public paths, and if any one of them is equal to the path in the input, we allow it. We reference data the same way we reference input, except we say data: data.public_paths. The way we iterate over these is by adding square brackets, and if we're interested in the index, we could declare a variable for that.
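Rego's evaluation rules can feel unusual at first, so here is a rough Python analogy of the public-path rule as described so far. It is not how OPA evaluates policy; it only mirrors the semantics: every expression in the body must hold (an implicit and), a failed expression leaves the rule undefined (modelled here as None), iterating over data.public_paths behaves like an "any element matches" check, and a default value turns undefined into false. The input shape (request.method, request.path) and the public_paths key are assumptions based on the demo.

```python
from typing import Optional


def allow_public_read(input_doc: dict, data: dict) -> Optional[bool]:
    """Mirror of one Rego rule body; None stands in for Rego's 'undefined'."""
    request = input_doc.get("request", {})
    # Expression 1: input.request.method == "GET"
    if request.get("method") != "GET":
        return None  # undefined, not false
    # Expression 2: data.public_paths[_] == input.request.path
    # succeeds if at least one element of the list matches
    if not any(path == request.get("path") for path in data.get("public_paths", [])):
        return None
    return True


def allow(input_doc: dict, data: dict) -> bool:
    # Like `default allow = false`: an undefined result falls back to the default.
    result = allow_public_read(input_doc, data)
    return False if result is None else result


DATA = {"public_paths": ["public", "docs"]}
print(allow_public_read({"request": {"method": "GET", "path": "admin"}}, DATA))  # None
print(allow({"request": {"method": "GET", "path": "admin"}}, DATA))  # False
```

Keeping the undefined case separate from the default makes it visible why the default rule matters: without it, a caller querying OPA gets an empty response instead of an explicit false.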
But in this case we aren't. New in OPA version 0.34 is that we can import the in keyword from future.keywords, which makes this a little nicer to the eye. Instead of iterating over the paths, we just want to check for membership, so we can say: if input.request.path is in data.public_paths, this should be allowed. So how would we test this now? We have an input object here, making a GET request to the public endpoint. Our rule doesn't check the user, and it doesn't have to, because anyone can access paths in the list of allowed public paths. One way we could test this is by using opa eval. I've already prepared that here: we run opa eval, provide it the policy, a data file, and the input, and query the data.policy.allow rule. And we can see that yes, indeed, we are allowed. If we change the input to try to access the admin endpoint and query again, we can see that OPA changed the decision to false. We're no longer allowed, which makes sense. However, querying manually like this isn't really ideal. OPA and Rego are policy as code, after all, so we want to use the same best practices we'd use for code in any other context. What we want to do here is write a unit test. So I'm going to create a new file called policy_test.rego and put it in the same package. Unit tests are just regular rules whose names start with test. I'm going to call this one test allow if public path. I'm now going to evaluate the allow rule from here, and since I can't provide an input file like before, I'm going to have to mock the input. The way I mock things is by saying with input as, and then providing the input. In this case, the request method was GET and the path was public. So I think this should work.
The allow rule should evaluate to true given an input like this. So let's try it out. Rather than running opa eval, we can now run opa test, and we can see that it indeed works. Let's add a negative test as well: test not allowed if not public path. If we change the path here to admin, or really anything that isn't listed in the public paths, we need to change this to not allow. We run the tests again, and now we have two passing tests. That's great: we can start to build some confidence in our policy. This is of course a very simple rule, and we have one more condition to take into consideration: we wanted an admin user to be able to access any path. The way we do that is to simply add another rule, also called allow. If you have several rules with the same name, OPA evaluates all of them, and if any one of them is true, the rule itself is true. In this case, the input has a user, and the user has roles. We could iterate over them again, but with the new in construct we can simply say: if admin is in the list of user roles, allow should also be true, and we can stop there. We don't need to check the method, the path, or anything else, because the admin should be able to access anything, although you might want to add more conditions in a real scenario. Let's add a test for that: test allow if admin. I'll evaluate allow with input as, copy the input from before, and add the user roles, with the admin role in there. I'm actually going to change the method, just for fun, to see that the admin can even delete the admin endpoint. That might be a bit much, but it's a fun test. So let's see if that works. Yes, the admin can do that.
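Defining allow twice means the two bodies are combined with a logical or. A Python sketch of the finished demo policy is shown below, again only an analogy of Rego semantics, with hypothetical function names and an assumed input shape (request.method, request.path, user.roles). The comment above each function shows the Rego body it stands in for; in the OPA version discussed, the in keyword needs import future.keywords.in.

```python
from typing import Callable, List

# A rule body takes the input document and the data document.
RuleBody = Callable[[dict, dict], bool]


def public_read(input_doc: dict, data: dict) -> bool:
    # allow { input.request.method == "GET"; input.request.path in data.public_paths }
    request = input_doc.get("request", {})
    return request.get("method") == "GET" and request.get("path") in data.get("public_paths", [])


def is_admin(input_doc: dict, data: dict) -> bool:
    # allow { "admin" in input.user.roles }
    return "admin" in input_doc.get("user", {}).get("roles", [])


# Two definitions of the same rule name: OPA treats them as alternatives.
ALLOW_BODIES: List[RuleBody] = [public_read, is_admin]


def allow(input_doc: dict, data: dict) -> bool:
    """True if any body holds; otherwise the default value, false."""
    return any(body(input_doc, data) for body in ALLOW_BODIES)


# The three tests from the demo, with the input "mocked" by passing it directly.
DATA = {"public_paths": ["public"]}
assert allow({"request": {"method": "GET", "path": "public"}}, DATA)
assert not allow({"request": {"method": "GET", "path": "admin"}}, DATA)
assert allow({"request": {"method": "DELETE", "path": "admin"},
              "user": {"roles": ["admin"]}}, DATA)
```

Passing the input as an argument plays the role of Rego's with input as: each test supplies exactly the document it needs, so no input file is involved.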
So that makes sense. That would be a really simple Rego policy with two rules: one allowing anyone to read from public paths, and one allowing the admin to do pretty much anything. Now let's hop over to the Kubernetes folder and take a quick look at what rules for Kubernetes admission control might look like. It could be something like this. The input in this case is the AdmissionReview object: the API server sends the resource about to be persisted in etcd to OPA, and OPA reviews it and makes a decision. Here we named the rule deny, and rather than returning just true, we return a string with a reason. If we deny something, the user likely wants to know why. So in this case, we have a rule that says deny, because you cannot create resources in the default namespace: if you try to create or update anything in the default namespace, it's denied. Next, if we try to deploy a Deployment that does not have a team label, which is an example we mentioned before, we deny that as well. And here's another thing: we could also choose to just warn. In this case, if we deploy a Deployment where the number of replicas specified is less than two, we simply send a warning: it's fine, but we recommend deploying with at least two replicas. So let's try that out. I've prepared a deployment of nginx here, which looks fine. Let's try to deploy it and see what happens. I'm going to cd into that directory and run kubectl apply on the nginx resource. Now the Kubernetes API server received that request and forwarded it to OPA, and OPA found two policy violations here. One of them is just a warning, so it wouldn't fail the request; we'd just print a warning, and that's the one we saw: minimum of two replicas recommended. But the other one is an error.
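To show what happens on the OPA side of the webhook, here is a Python sketch of the three rules from the demo and of the AdmissionReview response the API server expects back. The field names follow the admission.k8s.io/v1 AdmissionReview format (request.uid, request.operation, request.namespace, request.kind, request.object, and response.allowed, response.status, response.warnings); the function names, the message texts, and the way deny and warn results are folded into one response are assumptions of this sketch, not how OPA or its Kubernetes integrations are implemented.

```python
from typing import List


def deny_reasons(request: dict) -> List[str]:
    """Analogue of the demo's deny rules: each violation yields a reason string."""
    reasons = []
    if request.get("operation") in ("CREATE", "UPDATE") and request.get("namespace") == "default":
        reasons.append("cannot create resources in the default namespace")
    if request["kind"]["kind"] == "Deployment":
        labels = request["object"].get("metadata", {}).get("labels", {})
        if "team" not in labels:
            reasons.append("deployments must have a team label")
    return reasons


def warn_reasons(request: dict) -> List[str]:
    """Analogue of the demo's warn rule: advisory only, never blocks the request."""
    warnings = []
    if request["kind"]["kind"] == "Deployment":
        # Kubernetes defaults spec.replicas to 1 when it is omitted.
        replicas = request["object"].get("spec", {}).get("replicas", 1)
        if replicas < 2:
            warnings.append("minimum of two replicas recommended")
    return warnings


def admission_response(review: dict) -> dict:
    """Build an admission.k8s.io/v1 AdmissionReview response for the API server."""
    request = review["request"]
    denials = deny_reasons(request)
    response = {"uid": request["uid"], "allowed": not denials}
    if denials:
        response["status"] = {"code": 403, "message": "; ".join(denials)}
    warnings = warn_reasons(request)
    if warnings:
        response["warnings"] = warnings
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}


# Hypothetical review for an nginx Deployment without a team label and one replica.
review = {"request": {
    "uid": "example-uid",
    "operation": "CREATE",
    "namespace": "apps",
    "kind": {"group": "apps", "version": "v1", "kind": "Deployment"},
    "object": {"metadata": {"name": "nginx", "labels": {}}, "spec": {"replicas": 1}},
}}
print(admission_response(review)["response"])
```

This reproduces the demo outcome: the missing team label rejects the request, while the replica count only produces a warning that kubectl prints back to the user.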
So this resource was actually not deployed, because we require a team label to be present on deployments. Let's see if we can fix that. We're going to add a team label, team OPA, and we might as well fix the warning while we're at it. Now let's see if we can apply it. Yes, that worked. So this is what we mean when we say we build guardrails: we add rules that determine what we want and don't want to allow inside our clusters, and using a policy engine like OPA, we can be exactly as specific and detailed as we want. That's all very well, but how do we get started? My suggestion when you start looking into OPA and Rego is always to start small, like we did in the demo. Start simple: add a couple of rules and a couple of tests, and go from there. Build confidence in your policy. Again, the OPA documentation is a fantastic resource, so make sure to use it. Get a feel for the basics and the built-ins, and work your way up from there. Another good idea is to start where you stand. Look at the applications close to you that you have previous experience with, identify what policies are already present, and see whether you can extract some of them to OPA. Once you can do that, you can start delegating responsibility for policy decisions to OPA. And again, you can start small: maybe take a single endpoint or a single user, and delegate the decision for that endpoint to OPA. You deploy that and build experience, and once you have something like that up and running, you can start to scale up. OPA offers a bunch of features for management, such as decision logging, bundle servers for storing policies externally, and so on. There's the Styra Academy, which is a great free resource for learning Rego.
There's also Styra DAS, a control plane for managing OPA at scale, which has a free edition you can try out. Finally, I think you should join the OPA Slack community. There are over 5,000 OPA users there, and it's a great place to ask questions, have discussions, and keep up with anything that goes on in the OPA community. With that, thank you, and I'd be happy to answer your questions afterwards. Thanks, everyone.