Cloud computing is called out in the executive order for cybersecurity, and there's a reason for this. Cloud native applications rely on containers and orchestrators such as Kubernetes, and these elements represent new attack surfaces. Protecting this application infrastructure requires new methods to identify security problems early in the development workflow. In the next session, a GitLab customer shares how they have addressed the challenges of securing cloud native applications. Sandeep Parikh with Google provides detailed resources and how-tos for using a combination of templates, constraints, and CNCF's Open Policy Agent to automate the application and enforcement of policies within the GitLab workflow. Be sure to bring your questions for Sandeep.

Hi folks, thanks for joining. Today we're going to talk about Kubernetes policy enforcement and how we do that using GitLab's CI/CD workflow. My name is Sandeep and I'm a DevRel engineer with Google Cloud. I spend most of my time helping teams accelerate their software delivery, and we do that using the findings from our own DevOps Research and Assessment (DORA) program. You can find me most places around the web at crcsmnky, that's "circus monkey" minus the vowels. Now let's start by talking about the basics, beginning with policies and policy enforcement. I'll be using some key terms throughout, so let's define those now, starting with policies. Policies are the rules that tell us how we can configure a resource. When using something like Kubernetes, policies can specify things like what labels are allowed on a pod, or require container images to have specific tags. Policy management is the mechanism that helps us with the ins and outs of a policy. Think of this as the framework or the runtime; it helps us manage things like external data, packaging, testing, and so on. And the last component is policy enforcement.
And this refers to the actions that will be taken, along with the scope of those actions. So again, in the context of something like Kubernetes, the actions we'll take could be things like allowing or denying admission to the cluster. And the scope of those actions covers the types of objects we're evaluating and the namespaces we're monitoring those objects within. Now, policies are packaged as a set of templates and constraints, and we'll talk about why there are two objects a little bit later. Policy management is handled by Open Policy Agent, which is the umbrella project that covers a lot of these elements. And finally, policy enforcement is accomplished using Gatekeeper, which is a sub-project of Open Policy Agent. You can think of Gatekeeper as packaging up Open Policy Agent and delivering it as a custom Kubernetes admission controller. Now we'll review each of those three things in depth, but a little bit out of order. First we're going to start with policy management, and that's done with Open Policy Agent, which is part of the CNCF. It's a general-purpose policy agent, which means it can be integrated into a variety of different applications, services, and platforms, and it can be used for a whole range of use cases. Some of the use cases we see relatively often are things like enforcing authorization controls or data filtering, and, as we'll see shortly, admission control. OPA uses a high-level language called Rego to define its policies. When I think of Rego, I like to say it's pretty easy to do simple things and very possible to do complicated things. And with OPA, we get to design and implement these policies separately from applications and infrastructure, with a very loosely coupled approach.
And this is really beneficial, especially as you've got app teams that want to maintain their own velocity, infrastructure teams that want to do the same thing, and DevOps and SecOps teams that want to make sure they can iterate on policy without having to worry about the other two teams. These policies can also bring in external data. In the case of Kubernetes, this means being able to compare against existing cluster objects when a new object is coming in, and we'll talk through that as an example a little bit later. So the next component is policy enforcement using Gatekeeper. As I mentioned, Gatekeeper is a sub-project of Open Policy Agent, and it effectively wraps OPA up and deploys it as a Kubernetes operator and a custom admission controller. The admission controller reads incoming objects and decides whether to approve or reject admission. Gatekeeper also provides the necessary bits so that you can package OPA policies as Kubernetes custom resources. These resources are packaged as two parts: constraints and constraint templates. Those constraint templates can be parameterized and deployed like any standard Kubernetes object, using standard Kubernetes tools. So let's dig into the templates a little bit. After installing Gatekeeper, Kubernetes understands constraint templates. These are the templates that wrap OPA policies. If you look at the example constraint template we have here, we include the Rego policy object underneath, and the top half is just the metadata. So think of a constraint template as this entire object with two components: the actual policy Rego code at the bottom and the metadata at the top. That metadata tells us how we'll actually use the constraint template later on. The kind field highlighted shows how Gatekeeper constraint templates extend the Kubernetes infrastructure to support new custom resources.
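For readers without the slide, here's a minimal sketch of a constraint template along the lines of the NodePort-blocking example from the Gatekeeper docs; the name and message text are illustrative:

```yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sblocknodeport
spec:
  crd:
    spec:
      names:
        kind: K8sBlockNodePort   # this kind becomes the type of our constraint objects
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sblocknodeport

        # Raise a violation when the incoming Service uses type NodePort
        violation[{"msg": msg}] {
          input.review.object.spec.type == "NodePort"
          msg := "User is not allowed to create service of type NodePort"
        }
```

The metadata on top (the `crd.spec.names.kind` field in particular) is what lets us refer to this policy later, and the Rego on the bottom is the rule itself.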
So when we write a constraint, we'll use this kind as the type of our new constraint object. And that constraint will let us enforce this policy, which brings me back to my earlier point: why do we have two objects, constraints and constraint templates? Well, the reason is that it lets us separate the rule, which is captured in the template, from the enforcement of that rule, which is captured in a constraint that we'll see later on. This keeps us from repeating ourselves if we have a single template we want enforced a bunch of different ways, say across different namespaces or across different objects. So we get this kind of balance and reuse with the two-object mechanism. Now with constraints, you can see the one we've got up here on the slide. In the top half, we have the constraint that matches our previous template, K8sBlockNodePort. And on the bottom half, we have a sample resource to test against. Constraints can be scoped to objects and/or namespaces, and we can customize the enforcement as well. In this case, we've scoped this constraint only to Services, but that scoping could be expanded to any API group, or to any matching labels for that matter. Again, in this case, we've elected to deny admission. So now if that Service object below tries to enter the cluster, it will not be admitted at all. It'll get denied right at the door, because it violates our policy of disallowing any Services with a spec type of NodePort. But enforcement can also be set to simply admit objects and audit the violation, or just provide an immediate warning at runtime. It really depends on how you want to enforce the policy violation and, frankly, the severity of that violation. For some potentially optional things, it may be okay to simply audit them or provide that warning. But for really serious violations, you want to deny them completely, right at the door.
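A constraint matching that template might look like this sketch; the metadata name is illustrative, but the kind, the match scoping, and the enforcement actions follow the Gatekeeper constraint schema:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sBlockNodePort          # the kind defined by our constraint template
metadata:
  name: block-node-port
spec:
  enforcementAction: deny       # "dryrun" audits violations; "warn" surfaces a warning
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Service"]      # scoped to Services only
```

Swapping `enforcementAction` between `deny`, `dryrun`, and `warn` is how you tune the response to the severity of the violation without touching the underlying Rego.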
Now going back to our admission control example: if an object that violates a Gatekeeper policy attempts to enter the cluster, it will be denied admission, right? That's the way we set this one up. But, and this is super important, that denial only happens when the object is deployed. What's so bad about that? Well, imagine you have a Kubernetes cluster and you use GitOps to manage cluster infrastructure along with apps and services. If you have a Kubernetes object that violates any policies and it tries to sync to the cluster, whether it's via a GitOps controller or just a GitOps CD approach, that violating object won't sync to the cluster. And depending on your GitOps approach, it may keep trying and failing to sync that object. Or worse, it might do something unexpected depending on how it handles admission failures. But that's not even the worst part. The person responsible for the violation might be totally out of the loop, because this is all happening at runtime or deploy time. And herein lies the crux of the problem. Policy enforcement is great, right? It's great at preventing terrible imperative operations and runtime problems. But it needs to happen much earlier as well, or else the feedback loop is just way too long. And that feedback loop being too long may also end up leading to unknown or unexpected cluster state, which is something we all want to avoid. So now we know that we can't just do it at runtime; we've got to bring that policy enforcement action earlier. So let's talk about the workflows where we can start to do this process a little bit sooner. This really depends a lot on how your teams develop and deliver software. But in most cases, you're either pushing commits directly to a repo or you're using merge requests to review and accept changes into a repo. In either case, we can use those actions, pushes or MRs, as the step where we validate changes against existing policies. Okay, good.
We know when to validate; that's the first step. Now, how do we actually do the validation? Well, it starts with tools. There are two tools that allow us to run Gatekeeper validation against arbitrary resources, complete with policy inputs. The first is kpt. It lets you build workflows that work with configuration as data, and kpt includes Gatekeeper validation functionality out of the box. kpt is also very opinionated, very prescriptive, and is really focused on resource configuration as the primary artifact, not templates or other domain-specific languages. So if you think about that, one area where kpt really shines is working with hydrated manifests, right? Not things from before a Kustomize pipeline. I'll admit, kpt's mental model can sound a little more complex than it really is. Think of it as a simple workflow mechanism to process resource configuration: you've got a bunch of files, and kpt can operate across all of them. That's one tool. The next is Conftest. Conftest is part of the OPA project, and it lets you run policy validations in a much more imperative and CLI-oriented manner. But Conftest is actually a little more general-purpose as well, because it lets you validate against all manner of languages and config data. So you can use Conftest to do things like validate Terraform or Dockerfiles and other types of files as well. So let's do a first quick example using Conftest with GitLab CI. In this example, on a code push, we can run during the test stage on a standard GitLab repo deployment, and what we can do is have the push tested against the existing policy objects. This image is from the OPA team, so we know the Conftest image is going to be updated on a regular basis, which is great. We can also configure this to happen on merge requests only instead of all pushes; it really just depends. But we're encapsulating the approach here using this OPA Docker image and this basic Conftest approach.
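A job along these lines is a reasonable sketch of the Conftest approach; the job name, directory layout, and image tag here are assumptions, not the exact slide contents:

```yaml
# .gitlab-ci.yml (sketch) — validate pushed manifests against our policies
conftest-validate:
  stage: test
  image:
    name: openpolicyagent/conftest:latest
    entrypoint: [""]            # clear the entrypoint so GitLab CI can run the script
  script:
    # manifests/ holds the Kubernetes resources, policy/ holds the Rego policies
    - conftest test manifests/ --policy policy/
  rules:
    # restrict to merge requests; drop this rule to run on every push instead
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
```

The job fails when any resource violates a policy, which surfaces the violation to the author right in the pipeline instead of at deploy time.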
Now with kpt, things are a little bit different, because again, it's more of a general-purpose workflow tool. The first thing you'll do is take all the application manifests and concatenate them along with all the constraints and the templates to create one giant pile of resource configuration. Then you take that whole pile of stuff and you run it through the gatekeeper-validate kpt function, and that lets you find any policy violations in that pile of YAML you just passed through. So that's one approach. The other one that I'm not showing here is where you could actually use kpt to create an entirely custom workflow and then package and export that workflow. kpt makes it really easy to export those custom workflows into the GitLab CI format, so it's very easy to integrate into the GitLab CI approach. Now, remember when we talked about incorporating the validation workflow at the repo level? Well, which repo? Because many teams will have many repos. There may be infrastructure repos for base infrastructure that do things like infrastructure as code, and all the application repos may exist as separate elements. There's no right or wrong approach here; there are a bunch of different ways you can do this. But we tend to see policies treated as similar to infrastructure objects, and so they'll end up in things like infrastructure repos. In that kind of setup, there'd be a built-in validation workflow, whether it's triggered by a push or a merge request. And that validation workflow would accomplish the same steps we had in the previous slides, either using Conftest or kpt. And that infrastructure repo is typically going to be synced to your Kubernetes clusters using your favorite GitOps approach, whether it's a controller in the cluster or a CD approach directly from the repo. But now, if we've got application repos and infrastructure repos, what happens?
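The kpt flavor of that job could be sketched roughly like this; the kpt image, the gatekeeper function image tag, and the directory name are all assumptions, and the exact CLI shape varies between kpt versions:

```yaml
# .gitlab-ci.yml (sketch) — run the Gatekeeper kpt function over the whole pile
kpt-validate:
  stage: test
  image: gcr.io/kpt-dev/kpt:latest      # assumed image providing the kpt CLI
  script:
    # manifests/ holds the app resources concatenated with the
    # constraints and constraint templates, all as plain YAML
    - kpt fn eval manifests/ --image gcr.io/kpt-fn/gatekeeper:v0.2
```

The function evaluates every resource in the package against the constraints and templates it finds alongside them, and the job fails if any violation turns up.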
How do we make sure those two things stay in sync? Application repos we also see synced to clusters directly, possibly using a different GitOps approach, again, CD or controller. In this case, you want the same validation workflow to run at the app repo level. But the first step in that workflow is going to be to clone the policies from the infrastructure repo into that validation workflow, and then run through the normal validation based on the new stuff the app repo contains. Now, with the policies in the cluster, we're protected against violations from imperative kubectl activities, like we mentioned earlier. And using these two workflows with infra and app repos, we're protected against any violations getting committed to synced repos. So we've taken something that happens much later, at runtime or deploy time, and we've moved it further up into basically the testing or validation phase, so it's happening a lot sooner. We're basically doing the same thing twice, but that means we're really covering our bases. And that's what's important. Now, how do you get started with all of these tools and with policy enforcement? The Rego Playground is a great place to start. In fact, I used it quite a bit when I developed a bunch of the policies I was working on. It's free, it's easy to use, and it lets you test and debug really easily. And this is nice because it's a really low barrier to entry: there's nothing to install or set up, so you can just start working right away. But if you need complex inputs or something that looks a lot more like in-cluster testing, it's only going to get you so far. The typical next step I see, and this is what I've been doing, is integrating OPA into my favorite IDE. In my case, I'm using VS Code with the OPA extension. Depending on your IDE and the plugin you find, you may have support for syntax, queries, and coverage.
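The app-repo side of that, cloning policies first and then validating, might look like this sketch; the infra repo URL, the paths, and the CI image (assumed to bundle both git and conftest) are all hypothetical:

```yaml
# .gitlab-ci.yml (sketch) in an application repo
validate-against-policies:
  stage: test
  image: registry.example.com/ci/conftest-git:latest   # hypothetical image with git + conftest
  script:
    # step 1: pull the current policies from the infrastructure repo
    - git clone --depth 1 https://gitlab.example.com/platform/infrastructure.git
    # step 2: validate this repo's manifests against those shared policies
    - conftest test k8s/ --policy infrastructure/policies/
```

Because the policies are cloned fresh on every run, the app repo always validates against whatever the infrastructure repo currently enforces, which is what keeps the two repos in sync.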
But this approach generally gives you a little more control and flexibility. And because it's part of your normal workflow with the tools you're already using, it can be a little faster to iterate through. If you're writing policies on a regular basis, this is probably where you're going to end up anyway. But don't forget, and I can't stress this enough, don't forget to test against Gatekeeper itself. I know the Gatekeeper validation functionality from Conftest and from kpt is basically the same thing as Gatekeeper, but it's not exactly the same. And there's always something different when you test in an environment similar to what's deployed, right? When you test with Gatekeeper in a Kubernetes cluster, you really get a feel for managing the constraint templates and the constraints, and for checking status, enforcement, and scope, along with a view of which objects have been found in violation. And the nice thing about testing with Gatekeeper itself is that you can test out different combinations of namespace or label matching for catching policy-violating objects. It really is the right way to finish up your workflow. Don't sleep on testing against Gatekeeper itself. Now, one of the things that helped me the most was to learn from what the community had already built, and I find that examples helped me get started really, really quickly. The ones that I used were from the OPA folks. Specifically, they have some really simple examples in the Gatekeeper repo, and these are owned and delivered by the Gatekeeper engineering team. But beyond that, there are many more examples in the Gatekeeper library repo, which is community-owned. There's some overlap between the Gatekeeper examples and the Gatekeeper library, but there's also a number of other useful policies, and a whole host of pod security policies re-implemented as Gatekeeper constraint templates and constraints.
And I've got a few other resources you can dig into as well. Most of my work using Gatekeeper has really focused on evaluating Istio objects for policy violations. In the first repo, I've got examples that check for things like mTLS and port naming. And in the second repo, I focused on a few more enterprise-centric use cases, where I was looking at things like fine-grained authorization controls or rules around service-to-service peer authentication. So I would take a look at both of these, and we'll make sure we get these links to y'all after the sessions are over as well. Now, we covered a lot of material in a really short amount of time, but there are just a few quick tidbits I want to leave y'all with. Complexity is really a big part of this. Be aware, there is some complexity around getting this policy stuff just right. A big part of that is scoping: you always want to make sure that you're focusing on the right objects, the right namespaces, and the right labels. If you don't get that part right, you can potentially overburden the Gatekeeper controller by making it evaluate too many objects. It's also important to be able to distinguish between things like failing open versus failing closed. For example, if you fail open, that means in the face of a policy violation, it's still more important that the resource be admitted to the cluster. Depending on what your policies cover, this could be okay; you could just throw a warning or an audit violation instead of an outright denial or rejection. And as you'd imagine, failing closed is the opposite. This could be for when you have policies where any violation needs to be a total rejection, and in that case, it may be that cluster safety is more important than anything else. And you want to make sure also that the resources you need to evaluate are synchronized to the Gatekeeper namespace.
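That synchronization is configured with Gatekeeper's `Config` resource; here's a minimal sketch that replicates Service objects into Gatekeeper's cache (which kinds you actually need to sync depends on your policies):

```yaml
# Tell Gatekeeper to cache existing Service objects so policies can
# compare incoming objects against them (e.g. for name collisions)
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Service"
```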
For example, if you have a policy against a service name collision, you'll need to make sure you're replicating service metadata to Gatekeeper. That's an important step so you can compare those things. And finally, don't hesitate to set very specific and fine-grained RBAC controls for constraints and constraint templates, because you really only want the right personnel having the ability to create, update, or delete policy objects. Thanks again for joining the session. Please find me on Twitter if you have any questions, comments, or concerns. Thank you.