Hello, welcome. This is Customizing OPA for a Perfect Fit Authorization Sidecar. I'm Patrick East, OPA maintainer. You can always find me on the Open Policy Agent Slack; my username is just my name, nothing exciting, and it's my Twitter handle too. Feel free to tweet at me if you have questions or want to get in touch.

Let's start off with a little bit of base knowledge. Hopefully everyone here knows about the Open Policy Agent and what it is, otherwise this session will be very confusing for you. If you don't, pause now, go look at the website, and come back when you're ready, just so we're all on the same page. OPA is a general purpose policy engine. Most people use it as a library or as an HTTP server. The use case we're talking about today is the sidecar model: you have your application, requests come into it, and running on the same host, whether that's literally a Linux host where OPA is another service, or Kubernetes where it's just another container in your pod, you have OPA. The application makes a localhost call through the HTTP API to OPA, and the response is usually something like "is this allowed" or "give me the reasons it was denied."

The good news is that for that model, OPA is perfect. Usually that's fine, except when it isn't. The OPA maintainers get a lot of requests for custom built-in functions or new functionality that isn't in the language. Something we hear about from time to time is: "I have this authorization use case, I'm using JWTs, and I need some special behavior to get my keys." Maybe it's a homegrown system, or, as in the demo I'm going to show here, Auth0. The problem is that the keys live somewhere remote, they can't just be hard coded in the policy, they rotate, and people need just the right error handling and caching for their particular system.

The obvious solution is to just add it to OPA. It's a problem I have, it's probably a problem other people have, so let's make it part of the language. You can file an issue. We have a lot of those, though, and you'll usually find some pushback, because we don't necessarily want OPA to become a kitchen sink that has everything. New built-ins have to be balanced against the overhead of pulling in additional libraries, how many people will actually use them, and who will maintain them. So unfortunately you'll see a lot of pushback on issues proposing new language extensions and built-ins, especially when they're for somebody's special service and it's sort of a one-off thing.

So what's the answer? I'm going to go make my own OPA with my own built-ins. The behavior we want for this demo, the example use case, is fetching keys from a remote URL: look at the OpenID configuration at its well-known path, pull it down, and check whether a JWKS URI is specified in there. If there is, we download it; otherwise we fall back to the well-known key set URL. If you're familiar with OPA and writing policies, you're probably thinking, "Hey, Patrick, that's done. You can just use http.send."
Two things. First of all, it's a demo, so don't read into it too much. Second, this is a legitimate problem: error handling with http.send isn't always easy. It's not always simple in regular policy to get exactly the right behavior for error handling, caching, and so on, and it's a lot of complexity you may not want to deal with in your policy. Sometimes it's easier to write this kind of thing as a built-in, in a native general purpose programming language rather than the policy language, because frankly fetching something from a URL is not what a policy language is designed to do.

First of all, go read the docs. If you ask me on Slack how to add something to OPA, I will point you at that URL. Most people read it, though, and think, oh no, this looks hard. It's a wall of text that leads you right into a bunch of Go code, and Go code isn't always the easiest thing to get started with. So the fear sets in: wait, I have to extend OPA? This sucks. I'll need a custom image, I can't just use the OPA one anymore. I have to learn Go and extra build tools. Plus I have more code to own, and I have to fork OPA. This sounds like a headache to manage.

In practice, good news: it's actually really easy. There's not much code, I think it's three lines, and I'll show you in a minute, to get the base customized (not even forked) OPA entry point. The dependency management is easy with Go modules: you don't have a separate fork, and you don't have to sync your copy of OPA with git histories and all sorts of stuff. When you're ready to update to a different version of OPA, you just go get that version and upgrade. As for the tools, the Go toolchain, and the custom image, it's easy to make your own: a multi-stage Dockerfile where you use the golang image to build and then copy the binary into whatever blessed base image your security team has decided is OK. I'll show an example of one; they're super straightforward.

So let's get started. All you have to do to get your custom version of OPA off the ground is start a new project: git init, go mod init, the basic starting point for most Go apps. You need a main.go, and the actual implementation is just a call to cmd.RootCommand.Execute(). That's the main entry point that OPA uses for its own CLI. It gives you access to basically everything OPA can do, and it lets you control whatever else you want to do on top of OPA.

So let's take a second and go look at some code. I have a project started; this is that main function we were just looking at, and there's not a lot to it. I'll show the actual usage in a minute. The Dockerfile, I promised I'd show you this too: all you need to do is pick a Go version you're comfortable with (I use 1.15, which I think is the latest stable as of recording, but it doesn't really matter, OPA is compatible with most relatively modern versions), go build your OPA (in this case I'm making an opa++ binary), then pick whatever base image you want and copy OPA in. This will behave just like the regular OPA images: you can slide it into your existing deployment and it'll just work. So, back to our main.go.
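The whole thing really is just a few lines. Here's a minimal sketch along the lines of what the OPA docs describe for building a custom OPA binary:

```go
package main

import (
	"fmt"
	"os"

	"github.com/open-policy-agent/opa/cmd"
)

func main() {
	// Hand everything (run, eval, test, and so on) to OPA's own CLI entry point.
	if err := cmd.RootCommand.Execute(); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}
```

Anything you register before that Execute call, custom built-ins, plugins, and so on, becomes part of your binary, while everything else behaves like stock OPA.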
Let me get the terminal up and prove that this does in fact work. We've got a project here, and I'm just going to go run main.go. It does a little compilation step; it takes a second when you run with go run versus building and running the binary. But this should look familiar: it's the OPA command entry point, everything's here. If I want to run OPA as a server, I do run -s, and there we go. This is our customized OPA off the ground, running, ready to go.

If you need to control the OPA version, you have the go.mod. If you're not familiar with these, essentially in here you have a requirement for the dependency on OPA. Whenever you want to change which OPA version you're using, just update this and you'll pull in the latest code with all the changes, features, et cetera. That's really pretty easy to manage; it's not much harder than setting the OPA version in a YAML manifest somewhere.

So we've got our base version. You might be wondering, hold on, where's my built-in? How do I do that? There are three steps. First, you decide on a signature. This is for compile time, so OPA can do type checking and make sure nobody defines some other rule or function whose name conflicts with it. Second, for the runtime, you need the implementation of your function, the code that actually does what you want the built-in to do. The last piece is crucial: you have to register the built-in using one of the functions I'm showing on the right, and you have to do that before you start OPA, before that Execute call on the entry point in our main function. So we register, then call the entry point's Execute.

So let's go look at that. A little more code. I'm going to cheat and jump to a version that already has this implemented, to save us all a little time. Okay, so main.go has been updated to call a register function from a new builtins package that's part of the example code. I like this pattern just to minimize what's in the main function; there's no reason you couldn't put your entire built-in in that one file, but it becomes a little unwieldy over time, so I've got a separate builtins package where the built-in and its implementation live.

In the builtins package we have the one function we're calling from main. The call here is the one we saw on the slide a minute ago: we're calling the RegisterBuiltin1 helper, where the one just means there's one argument. There are a handful of these, plus a dynamic variant if you have a variable list of arguments. The declaration is that signature I was talking about. In here we define what the function is called; the name is literally the string inside your Rego code that will be the function. In this case I'm picking custom.fetch_jwks. The actual signature portion is a little bit cryptic, so take a look at the OPA ast package documentation, where there are a ton of examples for all the built-ins in the language.
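As a rough sketch (this is my reconstruction of the pattern, using the rego package's registration helper and the types package for the declaration; the package name and the fetchJWKS function follow the example's layout), the registration side looks something like this:

```go
package builtins

import (
	"github.com/open-policy-agent/opa/rego"
	"github.com/open-policy-agent/opa/types"
)

// Register wires up the custom built-in. Call this from main before
// cmd.RootCommand.Execute(), so the function exists by the time any
// policy is compiled or evaluated.
func Register() {
	rego.RegisterBuiltin1(
		&rego.Function{
			// The name is literally what you'll write in Rego: custom.fetch_jwks(url)
			Name: "custom.fetch_jwks",
			// Declares one string argument and a string return value.
			Decl: types.NewFunction(types.Args(types.S), types.S),
		},
		fetchJWKS, // the implementation, sketched in the next snippet
	)
}
```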
If you get confused about how to do one of the more complicated signatures, there are lots of examples in there. This one's pretty straightforward: we're saying there's a single string argument and a single string return value.

The last piece of the registration is pointing at the implementation. The implementation is a function, and it's not too complicated: you get a built-in context (this thing's pretty cool, I'll talk about it in a second) and your arguments. In this case we have a single argument, which we translate from the AST Rego type into a regular Go string. Our implementation treats it as a base URL, and we append the well-known paths to it, literally the .well-known part of the URLs. (I just noticed a bug in this one. Look at that, fixing bugs as we go, doing it live.) So we have a base URL argument, and we return a single AST term, which will be our key, returned as a string.

The built-in context is pretty cool, so make sure you take advantage of it. All the built-ins get called with this context, and it has a cache that lets you cache values so you don't have to recompute things as much. The nice thing is that if this function gets evaluated more than once within some policy evaluation, you don't have to do the work each time, and you get a consistent result every time, without worrying about intermittent errors partway through.

So the first thing we do is check whether the result is already in the cache. If it is, we return whatever was cached and assume it's valid. If there's nothing in the cache, we go off and fetch the key. I'm going to skip over some of this code; it's not super interesting. Basically we look for an OpenID configuration: we go fetch it, and if it's there we try to parse out the JWKS URI. If any of that fails, we fall back to the default well-known JWKS path. All of this is a somewhat hypothetical workflow, since usually you'd have one or the other, but for the sake of example that's what we do. The last part is fetching the key from that URI, whether it's the one we found in the config or the one we guessed, and returning the value as a string.

One thing here, and another bug in the code, is that we need to add the result back to the context cache I mentioned earlier. So we build a new string term, save it in the cache so we can just return it directly later, and then return it. That's the string of the key, the actual value that comes back out of the function in Rego when we use it. The last bit is that if any of this fails, we fall through. You'll notice there's a lot of "if err is nil", so we're only checking our way through the happy path, and on any error we fall through and return undefined. Undefined here means a nil error together with a nil value for the AST term result.
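Here is a hedged sketch of what that implementation could look like. The helper names (fetchOpenIDJWKSURI, fetchKeys), the cache key type, and the exact URLs are my own illustration rather than the talk's verbatim code; the parts that matter are the BuiltinContext cache and the nil, nil return that yields undefined:

```go
package builtins

import (
	"encoding/json"
	"io"
	"net/http"
	"strings"

	"github.com/open-policy-agent/opa/ast"
	"github.com/open-policy-agent/opa/rego"
)

// fetchJWKSCacheKey is a distinct type so our cache entries can't collide
// with keys that other built-ins put in the same per-query cache.
type fetchJWKSCacheKey string

func fetchJWKS(bctx rego.BuiltinContext, urlTerm *ast.Term) (*ast.Term, error) {
	// Translate the AST value into a plain Go string (the base URL).
	base, ok := urlTerm.Value.(ast.String)
	if !ok {
		return nil, nil // shouldn't happen thanks to the Decl, but stay on the happy path
	}
	baseURL := strings.TrimSuffix(string(base), "/")

	// If this query already fetched the keys, reuse the cached term.
	if cached, ok := bctx.Cache.Get(fetchJWKSCacheKey(baseURL)); ok {
		return cached.(*ast.Term), nil
	}

	// Prefer the jwks_uri from the OpenID configuration, fall back to the
	// conventional well-known JWKS path.
	jwksURL := baseURL + "/.well-known/jwks.json"
	if uri, err := fetchOpenIDJWKSURI(baseURL + "/.well-known/openid-configuration"); err == nil && uri != "" {
		jwksURL = uri
	}

	keys, err := fetchKeys(jwksURL)
	if err != nil {
		return nil, nil // undefined: nil term, nil error
	}

	result := ast.StringTerm(keys)
	bctx.Cache.Put(fetchJWKSCacheKey(baseURL), result)
	return result, nil
}

func fetchOpenIDJWKSURI(configURL string) (string, error) {
	resp, err := http.Get(configURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var cfg struct {
		JWKSURI string `json:"jwks_uri"`
	}
	return cfg.JWKSURI, json.NewDecoder(resp.Body).Decode(&cfg)
}

func fetchKeys(jwksURL string) (string, error) {
	resp, err := http.Get(jwksURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}
```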
This makes it really easy, in the policies that call this code, to do error handling and reason about what happens. If you have a rule that validates the key is okay, and part of that is fetching the key, then if the fetch comes back undefined, that rule is undefined, and your allow rule, which presumably depends on it, will also be undefined. It bubbles up in a pretty sane way that's easy to reason about, so I'd recommend this as your default pattern.

Okay, so we've got this implemented. Let's go look. I'm going to run this one in the REPL, just for example purposes. What we can see is that our function is recognized, and OPA knows about its signature. If I try to pass in 123, it tells me: hey, you gave me a number but I was expecting a string, and I return a string, so this is invalid, you can't do that. So we've verified that our signature is being registered correctly. Now let's try fetching with an actual string, a URL that will work. And sure enough, we get back something that looks good. This is the key for my dev Auth0 account; feel free to steal this URL and play with it, I'll leave it up for a while, there's nothing secret in there, it's just a demo. So there we go: our built-in is working, we've extended the language, we have this new custom function, we're all set.

Now, let's say we have a new requirement. OPA has decision logs, and people like them, but they don't always like having to run a special server to receive them. A lot of times you already have some other infrastructure. Let's say, hypothetically, you've got Kafka running, a sweet production cluster, and you'd much prefer to just dump all your decision logs into it. So yeah, let's do it, why not? What we're going to do is make a new plugin.

OPA plugins are pretty straightforward. You define a plugin name and a struct, you implement the plugin interface with your struct, and you have a factory that instantiates your plugin. None of these are too complicated: Start, Stop, and Reconfigure are exactly what they sound like. The factory's Validate is essentially given a raw chunk of the OPA config, and it's up to you to parse it and decide whether it's valid for your plugin, which makes it really flexible; you can have as complicated a config as you want without doing any sort of config-schema registration with OPA. The last piece is actually registering your plugin, same pattern as the built-in function: you have to do it before executing the OPA entry point.

So let's go look. I have some code set up for this, and I'll jump ahead to a version that already has the plugin, to save us all a little time. Notice I've got a new package called plugins, and some changes in main.go where the plugins get registered, using the same pattern as the built-ins, because I want to keep main minimal. In the new package there's a Register function, and I've split the logger itself out into a separate package, with a plugin name defined for it and a factory to instantiate it. The plugin name is important. Let me go look in the logger here: the plugin name for us right now is kafka_logger.
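Here is a hedged sketch of that skeleton. The package layout and config fields are my reconstruction of what's described, but the shape (the plugins.Plugin interface, the factory's Validate and New, and the status updates) follows the standard OPA custom plugin pattern:

```go
package kafkalogger

import (
	"context"
	"sync"

	"github.com/open-policy-agent/opa/plugins"
	"github.com/open-policy-agent/opa/util"
)

// PluginName is the name used in OPA's config and for manager lookups.
const PluginName = "kafka_logger"

// Config tells the logger where to publish decisions.
type Config struct {
	Host  string `json:"host"`
	Topic string `json:"topic"`
}

// Logger is the plugin itself: it implements plugins.Plugin.
type Logger struct {
	manager *plugins.Manager
	mtx     sync.Mutex
	config  Config
}

func (l *Logger) Start(ctx context.Context) error {
	// The full version creates the Kafka producer here before reporting OK.
	l.manager.UpdatePluginStatus(PluginName, &plugins.Status{State: plugins.StateOK})
	return nil
}

func (l *Logger) Stop(ctx context.Context) {
	l.manager.UpdatePluginStatus(PluginName, &plugins.Status{State: plugins.StateNotReady})
}

func (l *Logger) Reconfigure(ctx context.Context, config interface{}) {
	l.mtx.Lock()
	defer l.mtx.Unlock()
	l.config = *config.(*Config)
}

// Factory builds Logger instances from validated config.
type Factory struct{}

func (Factory) New(m *plugins.Manager, config interface{}) plugins.Plugin {
	m.UpdatePluginStatus(PluginName, &plugins.Status{State: plugins.StateNotReady})
	return &Logger{manager: m, config: *config.(*Config)}
}

func (Factory) Validate(m *plugins.Manager, config []byte) (interface{}, error) {
	parsed := Config{}
	return &parsed, util.Unmarshal(config, &parsed)
}
```

Then the plugins package's Register function, called from main before Execute, wires it in with something like runtime.RegisterPlugin(kafkalogger.PluginName, kafkalogger.Factory{}).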
That plugin name is almost part of your plugin's API, because it's the string that shows up in the configuration, and it's how other plugins find yours and use it. It's more than just an arbitrary string: you should pick something and not change it, because there are migration implications if you change a plugin's name over time.

So anyway, like I said, we've got our plugin name, and we have the actual struct. It holds a manager reference, which you'll want to hold onto; you get it as part of the factory's initialization step. It has a mutex we use to protect the config during Reconfigure, and then the config itself, which for the Kafka plugin is a host and a topic, telling it where to publish.

The factory is nothing too exciting; it returns a new instance of the logger. The only thing of interest here is the plugin statuses. The plugin manager has API calls to set the plugin status (the plugin name gets used everywhere here too), and you'll want to use them as extensively as you can to give accurate information back. These statuses are exposed via OPA's health and status APIs, so if your plugin is in a bad state and OPA needs to stop receiving authorization requests, make sure you set your plugin to an error state, and make sure your health probes are configured to look at plugin health. Don't skip these; they're important. They're also relatively new, so if you've seen or built an older plugin, this might be a new API for you, and I'd recommend going back and adding them.

The last piece is actually being a decision logger. There's a little more to it than just being a plugin: you also have to implement the Log API. You get a context and an event object, which is the decision, and we want to send that to Kafka. Back in the code, I'll jump ahead to a version with the log function implemented against Kafka, and give the IDE a second to catch up. Our plugin struct now includes a pointer to a Kafka producer; we're using the confluent-kafka-go library, nothing too fancy. On Start we build the producer config, setting the bootstrap server to the host option we were given. We start the producer on Start and close it on Stop; Reconfigure effectively stops and starts it again, setting the statuses appropriately. And then for Log, as you'd expect, we take the event and use its built-in JSON marshaling, which gives us a string that's basically the same payload you get with the HTTP decision log service, and we call Produce on our producer with it, specifying the configured topic and any partition. It just gets thrown onto that topic and anybody can consume it. And that's pretty much it. The only real consideration with these decision loggers is whether you want the logging to be synchronous or asynchronous; there's a rough sketch of the Log method below, and I'll come back to that question in a second.
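A hedged sketch of that Log method, assuming confluent-kafka-go's producer API, the logs.EventV1 event type from OPA's decision log plugin, and a producer field on the Logger struct that Start has already initialized:

```go
package kafkalogger

import (
	"context"
	"encoding/json"

	"github.com/confluentinc/confluent-kafka-go/kafka"
	"github.com/open-policy-agent/opa/plugins/logs"
)

// Log implements the decision-log contract (a Log method alongside the plugin
// interface). OPA calls it once per decision with the same event payload the
// HTTP decision-log service would receive.
func (l *Logger) Log(ctx context.Context, event logs.EventV1) error {
	payload, err := json.Marshal(event)
	if err != nil {
		return err
	}

	l.mtx.Lock()
	topic := l.config.Topic
	producer := l.producer // *kafka.Producer created in Start(), assumed field
	l.mtx.Unlock()

	// Deliver synchronously: wait for the broker to acknowledge before the
	// decision is returned to the caller, so a decision is never lost silently.
	deliveries := make(chan kafka.Event, 1)
	err = producer.Produce(&kafka.Message{
		TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
		Value:          payload,
	}, deliveries)
	if err != nil {
		return err
	}

	ev := <-deliveries
	if msg, ok := ev.(*kafka.Message); ok && msg.TopicPartition.Error != nil {
		return msg.TopicPartition.Error
	}
	return nil
}
```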
If you do it synchronously, you make sure the decision is logged before it's returned to the client. Somebody asks "is this allowed?", and it's sort of up to you and your internal policies, but most people want that decision logged before the client gets back its yes or no. The reason is that if the caller gets back "yes" when they weren't supposed to, but you lost that log somewhere, you may never know what happened. I wouldn't call it a security risk exactly, but typically you'll want this to be synchronous. So when we do the Produce, we sit and wait on the delivery channel until we know it worked, and if there was an error we return it instead of returning no error. If Log returns an error, the server returns something like a 500 back to the client rather than an OK status with the result.

So there we have it: we have our plugin, we should be ready to go. Let's run this thing. The first thing we have to do is configure it, so let me set up a conf.yaml real quick. In here we have the decision log plugin configured to use our Kafka plugin, and then the Kafka plugin's own settings; it's plugins all the way down. The topic we set to "opa", and the host is localhost:9092. I'm just using the Kafka quickstart running locally, so if you Google that and follow the instructions, you'll have the same setup I do. Now when we run OPA, I specify that config, turn on debug mode, and run it as a server, and we should see this thing up and running. There we go. And now if I make a request to OPA, that triggers a decision log event, and we can see in the debug log that yes, it delivered to the "opa" topic. We can verify that over here; I have a script queued up that reads from the topic as a consumer pointed at the same broker, and we see that yes indeed, there it is, there's our decision. Success: we're now logging into Kafka.

Let's go back to the slides, because, surprise surprise, we have another requirement. We have this thing as a sidecar and we want it to be lightning fast. We don't want to sit around with all this HTTP and JSON overhead; come on, that's slow. Why not use gRPC? And the answer is: yeah, why not, let's do it. So how are we going to add a new API? A built-in function? That's weird, that doesn't seem like the right way to do it. The answer is a plugin. Yes, we already have one, but you can have as many as you want, so let's keep going.

Let me jump to the code and skip ahead to the base plugin. What you'll notice is that I've added a new directory, api, in our plugins, and I'm registering one more plugin, same pattern, so everything looks about the same. In the api package I'm defining a protobuf API: an Authorizer service with an Authz RPC that takes a request containing a string JWT and returns a boolean yes/no. Super streamlined, as minimal as we can get here. I generated the usual boilerplate code from that with the regular Go protobuf tooling. The plugin itself looks very similar: we define our plugin name, and we have a config struct.
This time the config only needs a host, an address to listen on, which is where we set up the TCP listener for the gRPC API. We have our server struct, which is our plugin, and the usual gang's all here: the plugin manager, our lock, our config. The new parts are a little bit of boilerplate we need for implementing the Authorizer API, plus a pointer to the gRPC server that we start. The Start function is what you'd imagine: it starts the gRPC server. I'm not going to go too far into this code, but it's all available online, so feel free to dig in and take a look.

The important part is the actual implementation of the Authz endpoint. The request coming in has the JWT on it, and we're supposed to return a boolean. But wait, how do we evaluate the policy? The good news is there's pretty good OPA documentation on how to evaluate policies from Go. So, jumping the code ahead a little, we use the rego package: we do a prepare-for-eval and evaluate our input. We already know the input; we expect the JWT as part of our request, so we wrap it and put it in a map so OPA can deal with it, and evaluate.

But wait, there are no policies, right? When you call rego.New, you have to tell it which policies to use, and maybe you have external data in a bundle; suddenly you don't have any of that. So what do we do? Do we need to parse bundles in our plugin or something? The answer is no. What you need is the plugin manager. There's a little snippet in the docs that I think a lot of people skip over: the manager gives plugins access to engine-wide components like storage. So the pro tip, the main takeaway, is manager.Store. That's where the policies are; that's the good stuff.

So let's go back and implement that. There we go, super fast when you cheat and use git checkout. Our Authz function now: same thing, we've got our input, there's a little bit of type conversion to work around but nothing new, and the same query. Now we start a storage transaction using the manager's store, and when we construct our rego object, we pass in the store, the transaction, and the compiler. This is a cool step: if you do this and you're using bundles in the background, you won't have to recompile. It uses the compiler that's already available, which is maintained on background threads, so it's not part of your evaluation hot path. Definitely stick to this; give it a whirl. Once we have that, we're ready to evaluate, and it will have all the policies loaded from bundles, from the CLI, from the HTTP API. If somebody is pushing policies into OPA, you'll still have access to them this way.

The last piece is that we need to call the decision logger. There's a Log API very similar to the one we implemented, but you have to get a reference to that plugin. At Start, we go look it up. This goes back to the plugin name being part of that API: you use those names to look up plugins on the manager. Again, the manager is kind of the key to being able to do all of this.
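Putting those pieces together, here is a hedged sketch of what the Authz handler might look like. The query path data.example.authz.allow, the Server, AuthzRequest, and AuthzResponse types (standing in for the plugin struct and the protobuf-generated code), and the decision log fields are illustrative:

```go
package api

import (
	"context"
	"fmt"
	"time"

	"github.com/open-policy-agent/opa/plugins/logs"
	"github.com/open-policy-agent/opa/rego"
)

// Authz evaluates the policy for one gRPC request using the engine-wide
// store and compiler that the plugin manager maintains.
func (s *Server) Authz(ctx context.Context, req *AuthzRequest) (*AuthzResponse, error) {
	input := map[string]interface{}{"jwt": req.GetJwt()}

	// Read the policies and data OPA already has (from bundles, the CLI, or
	// the HTTP API) instead of loading anything ourselves.
	txn, err := s.manager.Store.NewTransaction(ctx)
	if err != nil {
		return nil, err
	}
	defer s.manager.Store.Abort(ctx, txn)

	rs, err := rego.New(
		rego.Query("data.example.authz.allow"), // illustrative query path
		rego.Compiler(s.manager.GetCompiler()), // reuse the compiler OPA keeps up to date
		rego.Store(s.manager.Store),
		rego.Transaction(txn),
		rego.Input(input),
	).Eval(ctx)
	if err != nil {
		return nil, err
	}
	allowed := len(rs) == 1 && rs[0].Expressions[0].Value == true

	// s.logger is the Kafka logger looked up at Start, e.g.
	//   s.logger = s.manager.Plugin("kafka_logger").(*kafkalogger.Logger)
	if s.logger != nil {
		var in, res interface{} = input, allowed
		_ = s.logger.Log(ctx, logs.EventV1{
			DecisionID: fmt.Sprintf("%d", time.Now().UnixNano()), // illustrative unique ID
			Path:       "example/authz/allow",
			Input:      &in,
			Result:     &res,
		})
	}

	return &AuthzResponse{Allowed: allowed}, nil
}
```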
Once you have that reference to the plugin, we just stash a copy of it on our server plugin and use it during evaluation to do the decision logs. So the end result is a gRPC API that takes in a JWT, does the evaluation, logs it to Kafka, and returns the result. Let's jump back to the slides. Thank you. All of this is available on GitHub, and it all works; I sort of skipped over that last demo for time, but feel free to grab the code and run it, it should give you no problems. Thank you.