Great to see so many people made it here. I think we have a full house today. The topic for our talk is Open Policy Agent: an introduction and a deep dive. I'm Anders. I work for Styra, the creators and one of the maintainers of OPA. I'm Xander. I work on open source at Microsoft.

So what we're going to do today: I'm going to give an introduction to OPA for those not familiar, followed by some updates from the project, and then Xander is going to cover the OPA Gatekeeper project.

So let's start with an OPA introduction. For those not familiar, before we talk about Open Policy Agent it might be good to ask: what is policy in the first place? Policy is basically a set of rules, and these rules can be anything that a real-life rule or a real-life policy could be. It could reflect things like organizational rules (how many people are allowed in a room) or things like permissions: should this user be allowed to perform a certain action given some conditions? It could cover things like infrastructure or Kubernetes manifests or whatnot. Wherever you can imagine having rules, you can use OPA.

But that's just rules and policy. What we really want to do here is treat policy as code. That is really the movement that I'd say OPA started. We want to work with policy as we work with any other type of code. We want to test our policies. We want to work with code review, linting and whatnot. So basically all the benefits that we get from working with anything as code.
We want to work with policy the same way. We don't want it locked into a PDF document or a Word document or whatnot; we want to be able to collaborate on the rules of our system as well.

A key concept of OPA, or of policy engines in general, is that we want to decouple policy from our applications so that we can do all of these things that I mentioned. We don't want to couple policy with other types of business logic. We want to be able to work with it as an isolated, first-class concept. That's policy.

So what's OPA then? It's an open-source, general-purpose policy engine. And general purpose is important. That's one of the core promises of OPA: we should be able to work with one technology to cover all of these use cases. As of February three years ago, it's a graduated project. I think it was submitted in 2017 or so, so we've been around for a long time.

What OPA does is provide us a unified toolset and a framework to work with policy across the whole cloud native stack. Again, we decouple policy from application logic, and we separate out what we call a policy decision. That's basically what OPA does: it provides us decisions. It doesn't enforce those decisions. That will be up to your application, and what that means is highly specific to the context of that application. So OPA will tell you, no, this user should not be allowed, and then how you choose to enforce that is up to you. Policies are written in a declarative language called Rego. I think we'll come back to that in a bit.

First, a word on our community. We've been around for a long time.
So we have a vibrant community of users and contributors. There are over a hundred integrations listed on the OPA ecosystem page, so pretty much any technology you choose to work with, you can probably integrate OPA in some shape or form. We have 9,000 GitHub stars, 8,000 Slack users and hundreds of millions of downloads. And there's not just OPA, the core project, but also a bunch of projects that spun out of OPA, such as OPA Gatekeeper, Conftest and so on. And if you want to author some Rego, write some policy, there are integrations for pretty much any editor out there.

So how does it work? How do all these different technologies integrate OPA? It's a fairly simple model, actually. We call it the policy decision model. You have a service, and when we say service we have quite a broad definition of that: it could be a Linux PAM module, a Kafka broker, a microservice, basically anywhere you have a request from a user or another service. Rather than trying to make a decision on that request yourself in your service, you delegate that responsibility to OPA and say: should this request be allowed or not? Should this deployment be allowed? Please provide me a decision on that. And that's basically what OPA does.

The query we send to OPA is just JSON, and what we get back in the form of a decision is also just JSON. So it's JSON in and JSON out, which means OPA fits any application or stack where you can deal with JSON and possibly an HTTP request, which is probably almost all technologies written in the last 20 years. That explains why OPA is so prevalent.

If we zoom in a bit here on actual policy evaluation, it might look something like this. You have a JSON document. In this case, it looks like an HTTP API sending us a query asking: should this be allowed or not?
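To make the JSON-in, JSON-out model concrete, here is a minimal Python sketch of how a service might wrap request attributes for OPA's REST Data API and read the decision back. The URL, the policy path `httpapi/authz/allow` and the exact field names of the input document are assumptions made for illustration; the slide's real input is not reproduced in the transcript. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical address and policy path; neither comes from the talk.
OPA_URL = "http://localhost:8181/v1/data/httpapi/authz/allow"


def build_query(method, path, user_name, user_roles):
    """Wrap request attributes in the {"input": ...} envelope of the Data API.

    The field names below are a guess at the shape shown on the slide.
    """
    return {
        "input": {
            "request": {"method": method, "path": path},
            "user": {"name": user_name, "roles": user_roles},
        }
    }


def build_request(query, url=OPA_URL):
    """Create (but do not send) the HTTP POST a service would issue to OPA."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(response_body):
    """Read the decision; an undefined result has no "result" key, so deny."""
    return json.loads(response_body).get("result", False) is True


# Peter tries to modify the user "anders" with a PUT request.
query = build_query("PUT", ["users", "anders"], "peter", ["viewer"])
request = build_request(query)
```

Sending `request` with `urllib.request.urlopen` against a running OPA and passing the response body to `parse_decision` would complete the round trip. Enforcing the answer (for example, returning HTTP 403) stays with the calling service.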
We have a request, we have a user, and we have a policy in between. This policy says that by default allow should be false. That's a fairly sensible default for an authorization policy: don't allow anything unless we approve it. The next condition says allow if admin is in the input user's roles. So if we have an admin, we should allow it regardless of any other parameters. That's not the case here, so we're going to move on to the next allow rule, in which we say: allow read requests, or GET requests, if the first path component is users. So anyone can read from users. It's not a GET request, though. That's not what we have in the input, so we're still not allowed. The final allow rule says that if it's a PUT request, which this is, meaning someone is trying to modify a user, then the name of the user must match the name of the user in the path. And again, it doesn't: this is Peter trying to modify Anders. So he would not be allowed to do so. The end result here is that OPA says this is false. So that's basically a crash course in Rego and policy evaluation.

And of course, again, we're not going to go through all of this here, but in this case we have a Kubernetes admission review object. To OPA this is just JSON. None of these things actually mean anything; it's just a bunch of structured attributes. So we can make policy decisions on anything, and this could be a Terraform plan or what have you.

So why OPA? Can't we just write this in Python or Java or whatnot? It's a very common question, and it's a valid one as well. I think the problem becomes apparent when you have one team that says, can't we just do this in Python? And in the same organization you have another team that says, can't we just do this in Java? Can't we just do this in C#?
And now you have policies scattered all across the organization, written in different languages. It's very hard to audit. It's very hard to control from a central place. Yeah, Wang from Bloomberg did a great talk on Istio and OPA the other day, and somebody asked her: why can't you just use the authorization mechanism provided by JupyterHub, I think it was? And she said, yeah, we could do that, but OPA provides us a generic way to apply policy consistently across all of our services and systems. So the policy they used for JupyterHub, they could reuse for other use cases. That's kind of the point of using OPA: rather than trying to solve this as something isolated in your team, you try to solve it for a whole organization.

Some more on that. So why would you use OPA over native authorization systems? It's of course this idea that you can share policy across teams or organizations or departments. And with this decoupling of policy, where you no longer write your policies in the same language as your applications, we can also centralize management of that. We might have a security team that wants to do audits or reviews, and they might not know all of these programming languages or frameworks. But if they learn one language like Rego, they can work with it across the whole organization.

Auditing matters particularly in regulated industries. You need to know, and this is one feature of OPA: all these decisions are logged, and they're logged in a uniform format too. So we will have one place where all these logs are stored, and we know where to go to see what decisions were made in our systems.
And of course, going back to policy as code, we want to be able to test our policies, and we want to test them in a single way as well. Another benefit of this decoupling is that we can do policy updates without having to recompile or redeploy our applications. And we don't really need to reach out to the teams responsible for these applications if we treat policy as a concept of its own. So that was a crash course in OPA.

Now for some project updates. The big thing going on, which we've been working on for I'd say the last six months or so, is an upcoming 1.0 release, where we try to resolve some ambiguities and other things in the language. I think the first commit was in 2015, and we've tried to never break the contract or backwards compatibility ever since. And I should say right away, I don't think this is going to be a huge pain for anyone. We have good tooling in place and so on. But for the first time, we're going to have to make some changes in the language.

Some of these changes improve readability via some syntactic sugar. I'll show an example of that soon, but it means the if keyword and the contains keyword are going to be mandatory for rules. Future keyword imports are no longer future; that future is now. So those will just go away. Some built-in functions that are no longer relevant, and are deprecated already, will just be removed. If you run OPA with strict mode today, you basically have this already. As for things like duplicate imports, using input and data as variable names, or using deprecated functions, I would say very few of you are doing this, so the impact should be minimal.

To provide an example of what a Rego policy would look like today: following the package, we have some imports of future keywords.
Those are no longer necessary. We see allow, which in the old way would just be followed by the curly braces. Now we can write one-line rules with if, and we can skip the curly braces. Same thing with the violations rule, which is a rule building a set of strings. In this case the syntax is changed to use contains, as you can see. A bit more compact. If you so want, you can still use the curly braces and so on, but I think all in all it's a more readable way to present Rego.

So what can we do to prepare? There's really no need to wait for that release. Basically, all of these things are already in OPA today. What you can do is say import rego.v1 in all your policies, and the parser will parse that as a v1 policy, and if there's anything that's not conformant or compliant, it will complain about it. You can even run opa fmt --rego-v1, and it will reformat your source code to use if, to use contains and to use this import. The only time you will need to do some manual modifications is probably if you use input or data as variable names (those are reserved keywords from now on) or if you use one of the deprecated functions. And if you're unsure whether you're using a deprecated function, you probably are not: these functions have not been part of the documentation for four years. That's it, basically.

And just as a final thing, we will provide a sort of legacy mode, if you do need to run mixed deployments of modern Rego and older Rego. For the foreseeable future you will be able to do that. You'll just have to say that this is an older version that I can't update right now, for whatever reason. I wrote a blog about it, if you're interested to learn more about what OPA 1.0 entails. It's not a whole lot more than this, but if you prefer that in written form, go check it out.

Okay, with that out of the way, what else is ahead of us?
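Before the roadmap, the preparation step described above can be scripted. The following is a minimal Python sketch, not an official workflow: it rewrites a policy directory with `opa fmt --rego-v1` and then runs `opa check --rego-v1` to surface anything that still needs manual changes. The `check` step and the `--write` flag are our additions based on our understanding of the OPA CLI, so confirm them with `opa fmt --help` and `opa check --help` for your version. The directory name is hypothetical.

```python
import shutil
import subprocess


def rego_v1_commands(policy_dir):
    """Return the opa invocations used to migrate a directory to Rego v1.

    First rewrite the files in place, then verify that they are compliant.
    """
    return [
        ["opa", "fmt", "--rego-v1", "--write", policy_dir],
        ["opa", "check", "--rego-v1", policy_dir],
    ]


def prepare_for_v1(policy_dir):
    """Run the commands in order, stopping at the first failure."""
    if shutil.which("opa") is None:
        raise RuntimeError("opa binary not found on PATH")
    for command in rego_v1_commands(policy_dir):
        subprocess.run(command, check=True)


# "policies" is a hypothetical directory name; nothing is executed here.
commands = rego_v1_commands("policies")
```

Calling `prepare_for_v1("policies")` would modify files on disk, so it is best run on a clean working tree where the resulting diff can be reviewed.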
So, a few features on our roadmap. Runtime schema validation of input, meaning that if the input data does not conform to what you expect, OPA can fail without even evaluating a policy. Validation of configuration options: today, if you make a typo in your config, that might be silently ignored. We will be better at failing early in those cases. The OPA test runner will be improved. Today it will mostly say that this test failed, and you'll have to figure out why. We'll try to improve on that, so it will actually say: this is what you expected, and this is what you got. We're also looking at some built-in or language addition to better deal with undefined values inside of rules. Exactly what that will look like remains to be seen.

Rule-level tracing is another common request. You have the decision log, which will tell you what OPA decided, but it can be hard to figure out how it got to that decision. That is something rule-level tracing could help with. And an ellipsis operator to help with things like pattern matching. It would be hard to explain exactly what that would look like, but do check the OPA issue board. There's an issue that describes it. Finally, we're also looking at dependency management, or package management, or whatever you want to call it, to see how we can improve the situation so that you can actually share code and work with libraries and so on.

I should mention, of course, that as with any roadmap, this is a best guess, and we'll see what happens. But these are some of the things we intend to work on. If there's something you feel strongly about, let us know.

Finally, just two cool projects in the ecosystem right now. There's a project from Microsoft.
It's Regorus, which is basically OPA written in Rust. I think they have most things covered. There are still a few built-in functions or so that might not work as expected, but it's a very cool project, so do check that out. There are also bindings to a lot of other languages. There's Regal. I don't know if you all use it already. It's a linter for Rego, which is also written in Rego itself. That's a cool project. I have been involved in that myself, so I guess I'm a bit biased. And that's it for me.

All right, so I'm going to talk a little bit about Gatekeeper, and I'm going to structure this very similarly to the first section here. For those of you that are already avid Gatekeeper users, there's going to be some repeat information in here, so just bear with me, and we'll get to the upcoming features towards the end.

So what is Gatekeeper? At a high level, it is a policy controller for Kubernetes powered by Open Policy Agent. That's the backing engine. I don't know how many folks in this room have ever tried to write a Kubernetes validating admission webhook. I have. It really didn't go well. It's a challenging thing. And that's not to say that I'm super great at writing Rego either, but one of the other features that tackles that is a robust policy library that accompanies Gatekeeper. You can find that in our docs.

So, some of the core features that we really target with this project. The first one is admission control, so you don't have to write those webhooks anymore. We find that a lot of folks... so, in an ideal world, right?
All of your workloads are running on Kubernetes, and there's no other form of compute that you have to think about whatsoever. For me, at least, I know that's not the reality. So having a system that we can do admission control with in our Kubernetes clusters, that also ties in to OPA in our other systems and utilizes that same ecosystem, is a huge benefit to users.

So what does admission control look like? This is one of the CRDs, the core primitives that we use in this. This is a constraint template. It is a cluster-scoped object, and it defines what the policy is. You can see we've got the properties there, which are the input, and at the bottom, that's the Rego that was shown a little earlier. So this is where the policy definition lives. I kind of like to think of this as an interface.

From there we go to an actual constraint. If the constraint template is the interface, this is the implementation of the interface. This is a namespace-scoped object, and it defines how we want to apply and enforce that policy. In this case, we're saying that we want namespace objects to be subject to the policy that we previously saw.

Another large feature here is mutation. Admission control based on the object that's trying to enter the Kubernetes cluster is one facet of it, but occasionally you will want to change some shape of the data on this object. We see this with things like wanting to have a specific label applied to pods entering a cluster, maybe what team owns it or something like that. That's where the various mutation operators come into play. They can enact those changes on objects. This is the AssignMetadata mutator, and it does exactly what I alluded to on the previous slide.
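As a rough model of what a metadata mutator does to an incoming object, here is a small Python sketch. It is not Gatekeeper code: the real mutator is a declarative Kubernetes resource applied by Gatekeeper's mutating webhook. The function name, the default owner value `team-a` and the choice to leave an existing annotation untouched are assumptions made for illustration.

```python
import copy
import fnmatch


def assign_owner_annotation(obj, name_pattern="nginx-*", owner="team-a"):
    """Return the manifest with an owner annotation added when it matches.

    Only Pods whose name matches the glob pattern are changed, and the
    input dictionary is never modified in place.
    """
    if obj.get("kind") != "Pod":
        return obj
    metadata = obj.get("metadata") or {}
    if not fnmatch.fnmatchcase(metadata.get("name", ""), name_pattern):
        return obj

    mutated = copy.deepcopy(obj)
    annotations = mutated["metadata"].get("annotations") or {}
    # Assumption: an annotation that is already set is not overwritten.
    annotations.setdefault("owner", owner)
    mutated["metadata"]["annotations"] = annotations
    return mutated


pod = {"kind": "Pod", "metadata": {"name": "nginx-frontend"}}
mutated_pod = assign_owner_annotation(pod)
```

Here `mutated_pod` carries the new annotation while `pod` is left as it was, which mirrors how admission mutation produces a patched object rather than editing the submitted one.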
The mutator on this slide adds an owner annotation to all pods matching the name nginx-*, you get it.

And lastly, this is one that we saw a huge gap in when looking at other policy systems initially: auditing. This is a gap that validating admission webhooks themselves leave. You know what the shape of things coming into your cluster looks like, but having stood up a few Kubernetes platforms in my day, I know that folks don't always start with policy. Usually it's standing up the cluster, getting the initial compute going, then there are those early workloads, and then policy occurs somewhere down the line. How do you have information about all the workloads that are currently running within the cluster? That's where auditing comes in. It will run through and make sure that all of the resources in the cluster are compliant with the policies that you have in place.

So here's what results from auditing look like. There are a few different ways you can consume these. There are some Prometheus metrics that are exposed, if you are interested in audit violation counts at a high level. Beyond that, the audit violations are also added to the status object of the constraint that the object was in violation of. This approach has introduced some challenges, so that's a little preview of some upcoming features.

So now that you've got the very basics down, we're going to talk about what's new. In the past, when we were putting all of those audit violations on the status of the constraints, folks that run large clusters with a lot of policies (and I know that these days there are a lot of companies and teams running huge multi-tenant clusters) started to run into etcd limits and performance implications. So one of the newer features that launched is pub/sub support for these audit violations. Currently the driver that's supported is Dapr, backed by Redis. This is what the structure of that data is going to look like, and this way
it's a lot more feasible to tie into whatever other systems you're utilizing. If you're just subscribing to a topic on Redis, you can follow up on that information with whatever additional automation you have in place.

Beyond that is integration with the new native Kubernetes object, validating admission policy. I don't remember which Kubernetes release this went alpha in, but it's a newer native object for admission. We've gotten some questions like: why support this native object? Isn't Gatekeeper kind of a competing paradigm to this? And not really at all. I think this provides folks the opportunity to work with CEL, the Common Expression Language that Google helped develop. If you want to use that instead of Rego, this provides you that opportunity. This will just create those underlying admission objects that are native to Kubernetes.

Another thing where this comes in handy is that Gatekeeper has the concept of external data, so you can use Gatekeeper to consult other systems if you need outside data to make a decision. Being able to tie in that outside data with validating admission policies is something you wouldn't be able to do strictly with the native object.

So this is a very new feature. We will have an alpha of it in our next Gatekeeper release, but the initial PR has been merged on the main branch, if anyone wants to go look at it. There's a demo in the main repo right now with instructions on how to run it, if you'd like to see things actually working. There's a QR code there and a little bit.ly link. Check that out. There was a lot of work that went into this over the last six months, and I think, hopefully, folks are excited to try it out.
I think, you know, we on the team are really excited about it.

And then this is the new, new one that is not actually merged at all yet. This is very much a work in progress, and we're hoping to have an alpha in our next release, but as always, time will tell. We heard some feedback from folks that were using Gatekeeper and writing policies that they wanted a little bit more granular control over how the policies are enforced. I think the request that actually kicked this discussion off was: what if I want to exclude one constraint from the audit process? I don't feel the need to audit this constraint. Looking at that opened the door to a lot more granular control over how these policies are enforced.

So going forward we have this feature called enforcement actions. It is going to allow you to define how each constraint is enforced at various entry points, like the webhook or the gator CLI, and you will be able to take different actions per point. Hopefully in this example here you can see we've got a deny on this constraint, which is only enforced at the webhook and the gator level, so there wouldn't be any kind of auditing associated with this constraint.

This is not a demo, but if you want to see a very detailed design doc of how this is going to work, go ahead and take a look here. I started this doc about six months ago, and then our engineering team came and made it not bad. So I think it's in a pretty good place right now. There's really good detail on how the feature is going to work and what backwards compatibility is going to look like.

And if you want to get involved: I add this one particularly because we are interested in having folks drop by the community meeting and provide feedback on the enforcement actions design. We are early enough along in that process that we would love to hear input on whether this would be a useful feature to you all, and whether the current API that we have designed actually fits what you would expect
from something like this. So we have a Gatekeeper channel on the OPA Slack, and on the GitHub repo you can find information on when our community meetings take place. They're also on the CNCF calendar. That's what we've got.

Yeah, if there are any questions... or if not... yeah, there are questions, good. I just want to say, if you'd rather come and talk with us later, we're manning the OPA kiosk in the Project Pavilion, so we'll stick around there for an hour or so after this. So come talk to us there, too.

Yeah, so a brief question about mutations. Do you see any improvement in that area? I'm asking specifically because you mentioned that the constraints will be better targeted, but maybe there's also some room for improvement regarding the mutations. Let's say I want to run mutations only on create events, to set some defaults, but we have some issues with immutable fields. So they will trigger during updates, but never work.

Sure. Yeah, I don't know of any specific improvements planned for the mutators at this point, but I would definitely love to have you open an issue for that one on the repo, and we can talk about it with the team. I think there's definitely a use case there.

Thanks.

Sorry, I also don't know where you are, because the light is right in my eyes. So if someone asks a question and I'm looking in a completely different direction, it's nothing personal, I swear. Any other questions?

Hi. Maybe a slightly basic question, given that I'm a newbie to OPA, but can you give an example of how this would work with, let's say, traffic within a Kubernetes cluster? There are a couple of different options I can think of around interception and rerouting into OPA, but can you give a typical example of what works well?
Sure. So, yeah, there are a couple of ways you can integrate OPA. If we're talking about something like authorization, or app-level authorization, the normal deployment pattern in Kubernetes would probably be as a sidecar. OPA runs on the same host, or in the same pod, as your application, meaning that when you query OPA it's a request to localhost, essentially making it fast and low latency. In some cases you would modify your application to somehow intercept the request, send it over to OPA, and then deal with the response. There's also the option to put a proxy in front of your application, maybe a service mesh like Envoy or Istio, where those would be responsible for calling OPA and dealing with the result. That would be preferable if you don't want to, or can't, touch the code of your applications.

Perfect. Thank you.

All right. I think that's it, and if you have any other questions, just come find us at the OPA kiosk after.