Hello, everyone. Welcome to Design Patterns for Extensible, Scalable Kubernetes Extensions. I'm Rita Zhang, a software engineer at Microsoft. And I'm Max Smythe, a software engineer at Google. We are both maintainers of the Gatekeeper project, and today we want to look at some of the design patterns underlying Gatekeeper and how they allowed us to create an extensible, scalable Kubernetes extension. Because there are a lot of topics to cover, this talk will mostly stay at the level of high-level concepts.

But first, what is Gatekeeper? Gatekeeper is a customizable Kubernetes admission webhook that helps enforce policies and strengthen governance. Let's look at what it does before we dig into how the pieces work. Gatekeeper supports both audit-time and admission-time checks, but today we're going to focus on admission time, specifically serving the Kubernetes admission webhook. This diagram is a high-level model of how Gatekeeper serves admission requests. When something makes a request to the API server, for example a user running a kubectl command, the API server receives that request and sends an admission request to Gatekeeper's validating webhook. It's asking Gatekeeper: should I allow this request to proceed? And Gatekeeper must answer either yes or no. To do this, Gatekeeper consults its policy configuration, which is the large gray box at the bottom of the slide. The green objects labeled "constraints" tell the webhook what checks an admin wants to perform, and constraints rely on templates, the blue documents, to tell the system exactly how to perform each check. For example, a template might say labels live in metadata.labels, so that's where you should validate labels. These notions of Gatekeeper the webhook, constraints, and templates underlie Gatekeeper's philosophy that policy is a team effort.
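To make the constraint/template split concrete, here is a minimal sketch in the shape of Gatekeeper's API, based on the well-known `K8sRequiredLabels` example from the Gatekeeper documentation; the exact schema and Rego are illustrative rather than taken from this talk:

```yaml
# The template: tells the system HOW to perform the check,
# and declares the parameters constraints may pass in.
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
---
# The constraint: tells the webhook WHAT check the admin wants,
# and where it applies.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
```

The template author writes the top document once; admins can then stamp out many constraints like the bottom one with different parameters and match criteria.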
Separating these roles into discrete areas allows for specialization, where work in each area benefits the others, increasing its scale. For example, a change to Gatekeeper's platform has the potential to increase the utility of all templates across the board with zero extra work on the template authors' part. We rely on duck typing to give us this separation of concerns, so let's take a brief look at duck typing, how it helps, and some of the patterns we found helpful in our implementation.

Duck typing has been covered in previous KubeCons, so we won't get into the details. But generally speaking, if it walks like a duck and talks like a duck, a system should treat it like a duck. This is similar to polymorphism in object-oriented programming. Our ducks are constraints. Our goal is for admins to be able to define different types of constraints that share universal behavior. For instance, all constraints can use label selectors. We do this by creating a parent class for constraints that implements the universal behaviors. Template authors create subclasses by injecting their enforcement logic and its function signature via a ConstraintTemplate resource. The constraint template is combined with this universal behavior to create a new constraint kind.

How does this look inside Gatekeeper's code? When Gatekeeper receives an admission request from the API server, the first thing it does is find the matching constraints using the match criteria defined by the admin. It then passes execution to the template logic backing each constraint, which tells Gatekeeper whether the constraint was violated. Gatekeeper then aggregates the results and uses the enforcement actions on the constraints to tell the API server how to proceed. Now, constraint templates are CRDs, and they create constraints, which are also CRDs. Really, we have created a system where CRDs can create CRDs. There are a few challenges with this paradigm.
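The match-then-evaluate-then-aggregate flow above can be sketched in Go. This is a toy model, not Gatekeeper's actual code: the `Constraint` "parent class" carries the universal label-selector behavior, while each template injects its own `Validate` function:

```go
package main

import (
	"fmt"
	"strings"
)

// ValidateFunc is the template-supplied enforcement logic: it inspects an
// object and returns a violation message ("" means no violation).
type ValidateFunc func(obj map[string]string) string

// Constraint is the "parent class": universal behavior (matching) plus the
// template-injected logic. All constraint kinds share the same match semantics.
type Constraint struct {
	Kind     string
	Match    map[string]string // toy label selector: all pairs must match
	Validate ValidateFunc
}

// Matches implements the universal label-selector behavior every constraint inherits.
func (c Constraint) Matches(labels map[string]string) bool {
	for k, v := range c.Match {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// Review runs all matching constraints against an object and aggregates
// violations, mirroring the webhook's match/evaluate/aggregate flow.
func Review(constraints []Constraint, labels, obj map[string]string) []string {
	var violations []string
	for _, c := range constraints {
		if !c.Matches(labels) {
			continue
		}
		if msg := c.Validate(obj); msg != "" {
			violations = append(violations, fmt.Sprintf("%s: %s", c.Kind, msg))
		}
	}
	return violations
}

func main() {
	// A hypothetical "RequiredOwner" constraint kind: the template logic
	// checks for a non-empty owner field.
	requiredOwner := Constraint{
		Kind:  "RequiredOwner",
		Match: map[string]string{"env": "prod"},
		Validate: func(obj map[string]string) string {
			if strings.TrimSpace(obj["owner"]) == "" {
				return "missing owner"
			}
			return ""
		},
	}
	out := Review([]Constraint{requiredOwner},
		map[string]string{"env": "prod"},
		map[string]string{"name": "my-deployment"})
	fmt.Println(out) // one violation: the object has no owner
}
```

The point of the pattern is that `Matches` is written once and every constraint kind gets it for free; only `Validate` varies per template.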
First, we need a generic controller to handle constraints. We wrote ours using unstructured resources. Wherever we needed strongly typed sub-schemas for things like status, we serialized the JSON subtree we were interested in and then deserialized it into a strongly typed Go struct. We also need a way to reliably merge partial JSON schemas, the template arguments and the universal behavior, into a fully realized constraint schema. Keeping the template's sub-schema under a special root, separate from the universal schema, avoids the possibility of collisions when there's overlap between the two.

The most difficult problem is handling dynamic watches. Originally, we did this by creating a sub-manager that would restart every time the set of watched resources changed. There were two big problems with this approach. The first was inefficient memory usage, because controller-runtime's watch cache was duplicated across the two managers. A larger problem for users was that we needed to use finalizers so we could catch delete events that might have been missed while the sub-manager was restarting. We've since been able to move on from this model, because Oran Shamrun wrote a dynamic watcher that allows us to change the set of watched resources without restarts, cache duplication, or, of course, finalizers.

Let's zoom in a bit and discuss how we interact with this dynamic watch manager. Gatekeeper has two sets of dynamic watches: one for constraints and the other to sync arbitrary data. For each of these, we have a main controller and a dynamic controller. The main controller is responsible for telling the dynamic controller what to watch. The dynamic controller is a generic controller that can understand a duck-typed resource. We have two main controllers. One watches constraint templates; it tells the watch manager which constraint kinds are available to watch.
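The serialize-then-deserialize trick for pulling a typed sub-schema out of an unstructured resource can be shown with just the standard library. This is a sketch; the field names in `ByPodStatus` are illustrative, not Gatekeeper's exact schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ByPodStatus is a strongly typed view of one entry in a constraint's status.
// (Illustrative field set, not the real Gatekeeper schema.)
type ByPodStatus struct {
	ID                 string   `json:"id"`
	ObservedGeneration int64    `json:"observedGeneration"`
	Errors             []string `json:"errors,omitempty"`
}

// typedStatus pulls the "status.byPod" subtree out of an unstructured object
// (as a generic controller sees it) and round-trips it through JSON into a
// typed Go struct.
func typedStatus(u map[string]interface{}) ([]ByPodStatus, error) {
	status, _ := u["status"].(map[string]interface{})
	raw, err := json.Marshal(status["byPod"]) // serialize just the subtree...
	if err != nil {
		return nil, err
	}
	var out []ByPodStatus
	if err := json.Unmarshal(raw, &out); err != nil { // ...then deserialize it typed
		return nil, err
	}
	return out, nil
}

func main() {
	// What an unstructured constraint might look like after deserialization.
	u := map[string]interface{}{
		"kind": "K8sRequiredLabels",
		"status": map[string]interface{}{
			"byPod": []interface{}{
				map[string]interface{}{"id": "gatekeeper-pod-1", "observedGeneration": 3},
			},
		},
	}
	pods, err := typedStatus(u)
	fmt.Println(pods, err)
}
```

The rest of the object stays unstructured, so one generic controller can handle every constraint kind while still getting compile-time safety for the parts it actually manipulates.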
The other one watches the Config resource, as that's where users tell Gatekeeper which resources they want to sync. There is a potential problem here. What if two main controllers are watching the same resource and one stops? How do we prevent one dynamic watch from interfering with another? We developed a registrar pattern to provide isolation.

Here's an example of how a main controller uses the registrar. First, it requests a new registrar from the watch manager by providing the controller name and a channel by which watch events will be sent to the dynamic controller. When the main controller wants to add or remove a group/version/kind from the dynamic controller, it simply calls AddWatch or RemoveWatch. Each registrar is namespaced to its controller and is capable of adding or removing an intent to watch a GVK, or replacing the set of watched GVKs altogether. The watch manager can then take the union of all the intents across all registrars to figure out which resources to watch. By adding a layer of indirection and namespacing intent, we have made it significantly easier to write multiple dynamic controllers that watch potentially overlapping sets of resources.

Now, we've touched on a lot of meta topics, right? Controllers that control controllers, CRDs that create CRDs, ducks that walk like constraints. We could take this process to its logical conclusion, and we probably should. We should go full meta. So let's look at policy enforcement as a phenomenon. It follows a pretty standard pattern: generally, it's just looking at an object and returning "yeah, that looks good" or "no, this is not good." Is there any reason this must be done as a Kubernetes admission controller? Do the resources even have to be Kubernetes resources? Probably not, if you walk one rung up the abstraction ladder and duck type the decision process itself. So let's see what that might look like.
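The registrar pattern just described can be sketched as follows. This is a simplified model (no event channels, and `GVK` is just a string) of how namespaced intents and the union across registrars keep overlapping dynamic watches from interfering:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// GVK identifies a watched resource type (group/version/kind), simplified
// to a string for this sketch.
type GVK string

// WatchManager tracks watch intents per registrar; what actually gets
// watched is the union across all registrars.
type WatchManager struct {
	mu      sync.Mutex
	intents map[string]map[GVK]bool // controller name -> intended GVKs
}

func NewWatchManager() *WatchManager {
	return &WatchManager{intents: map[string]map[GVK]bool{}}
}

// NewRegistrar hands a main controller its own namespaced handle.
func (m *WatchManager) NewRegistrar(name string) *Registrar {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.intents[name] = map[GVK]bool{}
	return &Registrar{name: name, mgr: m}
}

func (m *WatchManager) set(name string, gvk GVK, on bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if on {
		m.intents[name][gvk] = true
	} else {
		delete(m.intents[name], gvk)
	}
}

// Watched returns the union of all intents: a GVK stays watched as long as
// at least one registrar still wants it.
func (m *WatchManager) Watched() []GVK {
	m.mu.Lock()
	defer m.mu.Unlock()
	set := map[GVK]bool{}
	for _, intents := range m.intents {
		for g := range intents {
			set[g] = true
		}
	}
	var out []GVK
	for g := range set {
		out = append(out, g)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// Registrar is a per-controller handle: each main controller records watch
// intents here, namespaced by controller name, so controllers cannot
// clobber each other's watches.
type Registrar struct {
	name string
	mgr  *WatchManager
}

func (r *Registrar) AddWatch(gvk GVK)    { r.mgr.set(r.name, gvk, true) }
func (r *Registrar) RemoveWatch(gvk GVK) { r.mgr.set(r.name, gvk, false) }

func main() {
	mgr := NewWatchManager()
	templates := mgr.NewRegistrar("constraint-templates")
	config := mgr.NewRegistrar("config")
	templates.AddWatch("K8sRequiredLabels")
	config.AddWatch("K8sRequiredLabels")
	config.RemoveWatch("K8sRequiredLabels")
	// Still watched: the template registrar's intent survives the removal.
	fmt.Println(mgr.Watched())
}
```

Because one registrar's `RemoveWatch` only deletes its own intent, a GVK is unwatched only when every registrar has dropped it.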
So here we have the constraint framework. The constraint framework is the library that underlies Gatekeeper. It coordinates all of the duck typing logic we've covered so far and provides the execution flow Gatekeeper uses to render a decision to the API server. In addition to the abstractions that let us define constraint templates and constraints, it provides two more: enforcement points and targets. There are a few critical behaviors the constraint/template abstraction relies on: some kind of match criteria schema and logic, enforcement actions to tell the system what to do when a constraint is unhappy, and an interface on which the enforcement logic can rely.

Here we have the target. A target abstracts the notion of a platform: what do objects look like? How are policies bound to them? What request metadata do I have? Targets give us a match criteria schema and logic, such as what a label selector looks like and how to test whether a label selector matches. Targets also provide constraint template authors with the information they need to evaluate a request: what does the object I'm validating look like, and what request metadata, such as the requesting user, do I have?

An enforcement point is the system that asks for a policy check and knows what to do with any violations. Gatekeeper's webhook is one example of an enforcement point; Gatekeeper's audit process is another. Putting all of this together gives us a model for abstracting policy enforcement itself. We can use the higher-order abstractions of the constraint framework, targets, and enforcement points to take the notion of constraints and templates to other venues. For example, we have Gatekeeper, of course, which uses the constraint framework to serve the validating webhook and for audit. kpt provides a Docker image that can be used to validate Kubernetes configurations at rest or as part of a CI/CD pipeline.
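The relationship between targets, constraints, and enforcement points can be sketched as a few Go interfaces and one shared execution flow. Everything here is a toy model under assumed names (`Review`, `ReviewObject`, `k8sTarget` are invented for illustration), not the constraint framework's real API:

```go
package main

import "fmt"

// Review is what a target hands to template logic: the object under
// evaluation plus platform-specific request metadata (e.g. requesting user).
type Review struct {
	Object   map[string]interface{}
	Metadata map[string]string
}

// Target abstracts a platform: what objects look like, how match criteria
// bind policies to them, and what request metadata exists.
type Target interface {
	Name() string
	Matches(criteria map[string]string, r Review) bool
}

// Constraint pairs match criteria with template logic and an enforcement action.
type Constraint struct {
	Name   string
	Match  map[string]string
	Action string                // e.g. "deny", "warn"
	Check  func(r Review) string // template logic; "" means no violation
}

// Violation is what the framework returns; the enforcement point decides
// what to do with it (reject a request, log an audit finding, fail CI).
type Violation struct {
	Constraint string
	Action     string
	Msg        string
}

// ReviewObject is the shared execution flow: match, evaluate, aggregate.
// Every enforcement point calls this same function.
func ReviewObject(t Target, constraints []Constraint, r Review) []Violation {
	var out []Violation
	for _, c := range constraints {
		if !t.Matches(c.Match, r) {
			continue
		}
		if msg := c.Check(r); msg != "" {
			out = append(out, Violation{Constraint: c.Name, Action: c.Action, Msg: msg})
		}
	}
	return out
}

// k8sTarget is a toy Kubernetes target whose match criteria are label selectors.
type k8sTarget struct{}

func (k8sTarget) Name() string { return "admission.k8s.example" }
func (k8sTarget) Matches(criteria map[string]string, r Review) bool {
	labels, _ := r.Object["labels"].(map[string]string)
	for k, v := range criteria {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	cs := []Constraint{{
		Name:   "require-owner",
		Action: "deny",
		Check: func(r Review) string {
			if _, ok := r.Object["owner"]; !ok {
				return "no owner set"
			}
			return ""
		},
	}}
	// A webhook-style enforcement point: deny the request on any violation.
	vs := ReviewObject(k8sTarget{}, cs, Review{Object: map[string]interface{}{
		"labels": map[string]string{},
	}})
	fmt.Println(len(vs) > 0) // prints true: the object has no owner
}
```

Swapping in a different `Target` (say, one whose objects are Terraform resources) reuses the same constraints and the same flow, which is exactly what makes new enforcement points cheap to build.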
Config Validator wraps the constraint framework and a target that understands Google Cloud, creating a library that has been used for a few things. It can validate GCP resources as part of a Forseti Security deployment. It can validate GCP resource snapshots at rest using CFT Scorecard. It can also validate Terraform plans via a project called Terraform Validator. Abstracting constraints and templates into a library has made it faster to bring them to other platforms and other policy enforcement points, which lets developers give users a consistent experience and helps users execute the same policy in many places. So duck typing has allowed us to bring K8s-style policies outside of the Kubernetes cluster, expanding the potential for useful user experiences, like rejecting bad commits in a CI/CD pipeline at the pre-submit stage, which provides defense in depth.

All right. Now we're going to take a 90-degree turn and talk a bit about Gatekeeper's infrastructure. Let's talk about Gatekeeper as a web controller. What do we mean by web controller? Well, Gatekeeper is both a webhook and a controller at the same time, and these two things are usually very different beasts. A webhook's main job is to serve requests. That means webhooks need to be responsive and are therefore generally intolerant of downtime. Webhooks scale their availability and serving capacity by increasing the number of serving pods, and each of these pods is a peer: there's no leader/follower relationship, just a flat hierarchy where every pod is equally able to serve any given request. Controllers, on the other hand, observe and reconcile resources and come together to create an eventually consistent system. Because controllers are background processes, they are a little more downtime-tolerant; the system only needs to converge in a reasonable amount of time. And controllers are generally singletons.
They can use leader election, but that doesn't really add capacity. Extra pods plus leader election usually improve availability by providing hot standbys that can take over if the leader becomes unavailable. Now, because Gatekeeper is a webhook that serves results based on observed resources, it is a little bit webhook and a little bit controller. Therefore: web controller.

Because of this tension between how webhooks and controllers usually scale, it seems like they may be incompatible models. This apparent conflict can be resolved by observing that idempotent controller processes don't need to be singletons. If more than one controller is watching the same resource, and they both agree on the end state, the first controller to write the output wins the write. The other controllers will either not yet have processed that resource or will have their writes rejected. On retry, those controllers will see the correct state and will not attempt to reconcile further. This is similar to how Kubernetes leader election works. It can lead to some extra traffic, but only when controllers need to write a change.

With this observation, we can create web controllers that leverage what we call leaderless horizontal scalability. In this model, a web controller scales horizontally like a normal webhook. Controllers are idempotent, and they use the first-write-wins model we just talked about. Each pod manages its own internal cache of constraints, templates, and data, which makes every pod a peer of every other pod. There are some limitations this model imposes on us. We need to be sure that non-idempotent operations, like audit, run in a separate singleton pod. We also need to avoid scaling write contention quadratically with the number of pods, which can happen if each pod needs to write its own pod-specific state.
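The first-write-wins idea rests on the API server's optimistic concurrency (writes must name the resourceVersion they read, and stale writes are rejected). Here's a minimal sketch of that mechanism with an in-memory store standing in for the API server; this is an illustration of the principle, not Gatekeeper code:

```go
package main

import (
	"fmt"
	"sync"
)

// store mimics the API server's optimistic concurrency: a write must name
// the version it read, and conflicting writes are rejected.
type store struct {
	mu      sync.Mutex
	value   string
	version int
}

func (s *store) Get() (string, int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.value, s.version
}

// Write succeeds only if the caller read the latest version (first write wins).
func (s *store) Write(readVersion int, value string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if readVersion != s.version {
		return false // conflict: someone else wrote first
	}
	s.value = value
	s.version++
	return true
}

// reconcile is an idempotent controller step: if the observed state already
// matches the desired state, do nothing; otherwise try to write, retrying on
// conflict. Because every replica computes the same desired state, a loser's
// retry observes the winner's write and stops.
func reconcile(s *store, desired string) (wrote bool) {
	for {
		current, version := s.Get()
		if current == desired {
			return wrote
		}
		if s.Write(version, desired) {
			return true
		}
		// Conflict: re-read and re-evaluate.
	}
}

func main() {
	s := &store{}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ { // three peer pods reconciling the same resource
		wg.Add(1)
		go func() { defer wg.Done(); reconcile(s, "enforced") }()
	}
	wg.Wait()
	fmt.Println(s.Get()) // converges to "enforced" regardless of which pod won
}
```

Note that the loop only generates extra write traffic when the state actually needs to change; once one replica wins, every other replica's reconcile is a read-only no-op.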
One more thing we need to be sure of is that our controllers are side-effect free, as there's no guarantee that side effects are idempotent. That was easy: run multiple pods, and we're done, right? Can we go home? Not quite. People probably want to know the status of their policies, whether they're enforced or not. This is hard because Kubernetes is eventually consistent. Multiple pods means multiple possible enforcers, and policy is only as strong as its weakest link. If we have three webhook pods, two enforcing a new policy and one not, and an API server that chooses a webhook pod randomly, the policy really only has a 66% chance of being enforced. In order to reason about whether a constraint is enforced, therefore, we need to know whether it is recognized by all pods.

To do this, we implemented a by-pod status sub-resource. This resource tells us which pods have ingested a given resource and what roles, like audit or webhook, those pods perform. We also track the observed generation of the resource, to make sure each pod is enforcing the most current generation; the UID of the object, to detect whether we are actually seeing the status of a deleted object that hasn't been recreated; and any errors a pod may have encountered ingesting the resource.

We now have more information about the state of the system, but how do we interpret it? Pessimistically, of course. If a pod's entry is missing, we assume that pod has not yet ingested the resource. If a resource's deletion timestamp is set, we assume that pods have already processed that delete. And if the number of pods reporting status equals the number of pods serving the webhook, then we can assume all pods have ingested and are enforcing that particular constraint. To make these observations meaningful, we need to enforce some invariants in our project. One: if we expect to have n pods, we must never have n + 1 pods.
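The pessimistic interpretation of by-pod status can be written down as a small predicate. This is a sketch with an illustrative field set (the real Gatekeeper status schema differs in detail): a constraint counts as enforced only when every expected webhook pod reports the current generation of the current object with no ingestion errors:

```go
package main

import "fmt"

// PodStatus is one pod's report that it has ingested a constraint.
// Field set mirrors the talk: role, observed generation, object UID, errors.
type PodStatus struct {
	PodID              string
	Role               string // "webhook" or "audit"
	ObservedGeneration int64
	UID                string
	Errors             []string
}

// Enforced interprets by-pod status pessimistically: a missing, stale, or
// erroring entry means we must assume the policy is not (yet) enforced.
func Enforced(statuses []PodStatus, expectedWebhookPods int, generation int64, uid string) bool {
	ok := 0
	for _, s := range statuses {
		if s.Role != "webhook" {
			continue // audit pods don't serve admission requests
		}
		if s.ObservedGeneration != generation || s.UID != uid || len(s.Errors) > 0 {
			continue // stale generation, recreated object, or ingestion error
		}
		ok++
	}
	return ok == expectedWebhookPods
}

func main() {
	statuses := []PodStatus{
		{PodID: "pod-a", Role: "webhook", ObservedGeneration: 2, UID: "u1"},
		{PodID: "pod-b", Role: "webhook", ObservedGeneration: 1, UID: "u1"}, // stale
		{PodID: "pod-c", Role: "audit", ObservedGeneration: 2, UID: "u1"},
	}
	// pod-b lags a generation, so we must assume the policy is not yet enforced.
	fmt.Println(Enforced(statuses, 2, 2, "u1"))
}
```

The UID check matters because a delete-and-recreate can leave a status entry that describes an object which no longer exists; without it, stale status from the old object would look like enforcement of the new one.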
Otherwise, counting n observations leaves the possibility that there is still one pod that has not observed a new resource. Two: pods cannot serve until they have bootstrapped all resources present during startup. Otherwise, a new pod would put the assumption that n observations means n enforcing pods into question. Lastly, we must design our resources such that a missing resource has a known impact on the system. In this case, a missing constraint means that policy is enforced more loosely than it would otherwise be. We should also note here that referential constraints, which rely on cached data, potentially violate this principle. For one thing, whether data has been cached is unreported. For another, it's impossible to know the significance of missing data without knowing the specifics of a constraint's logic. The use cases for cached data are too valuable to ignore, but it's worth calling out that such templates necessarily have imperfect enforcement at the webhook level.

There are two potential problems with reporting by-pod status: write amplification and the possibility of zombie statuses. Write amplification can occur when all pods want to write status at the same time. Because of Kubernetes' optimistic concurrency, each pod may try to write the status simultaneously, and one will win, leaving the other pods to retry, which means in the worst case there will be n² total write requests, where n is the number of pods. Zombie statuses can occur if removed pods don't clean up their status: you'll keep seeing that old status, and it will just be there perpetually. Gatekeeper solves these problems with a layer of indirection. We create a separate ConstraintPodStatus resource, of which there is one per unique pod/constraint tuple, and each pod writes only to its own ConstraintPodStatus resources.
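The indirection can be sketched as a pure aggregation step. In this toy model, each pod writes its own `ConstraintPodStatus` object uncontended (n writes), and a singleton status controller folds them into one by-pod list per constraint (one more write per constraint), which is how the worst case drops from quadratic to linear:

```go
package main

import (
	"fmt"
	"sort"
)

// ConstraintPodStatus is the indirection object: one per (pod, constraint)
// tuple, written only by its owning pod, so pods never contend on writes.
type ConstraintPodStatus struct {
	Pod        string
	Constraint string
	Observed   int64
}

// Aggregate is what the singleton status controller does: fold all the
// per-pod objects into a single by-pod list per constraint. The controller
// then writes each constraint's status once.
func Aggregate(in []ConstraintPodStatus) map[string][]string {
	out := map[string][]string{}
	for _, s := range in {
		out[s.Constraint] = append(out[s.Constraint], s.Pod)
	}
	for _, pods := range out {
		sort.Strings(pods) // deterministic output for the status write
	}
	return out
}

func main() {
	statuses := []ConstraintPodStatus{
		{Pod: "pod-a", Constraint: "ns-must-have-owner", Observed: 2},
		{Pod: "pod-b", Constraint: "ns-must-have-owner", Observed: 2},
	}
	fmt.Println(Aggregate(statuses))
}
```

Ownership does the cleanup: because each per-pod object is owned (via an owner reference) by its pod, Kubernetes garbage collection deletes it when the pod goes away, and the next aggregation pass simply no longer sees the zombie entry.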
There's then a singleton controller, called the status controller, that aggregates these statuses and writes them to the constraints. This model lowers the worst-case scaling of write requests from quadratic to linear, and because each ConstraintPodStatus is owned by the corresponding pod, we can rely on Kubernetes garbage collection to clean up any zombie data left by old pods. And that's how we implement a web controller, all the way from scaling webhook pods horizontally to managing how they report status.

Let's take a look at how the reliability and performance of the system scale with the number of pods. If we assume one serving pod is sufficient to serve all inbound traffic and that each pod fails independently of every other pod, the probability of downtime decreases exponentially with the number of running pods. On the other hand, because we only consider a constraint as being enforced when all pods have observed it, we lengthen the mean time to enforcement as reported by the system as a whole. Unfortunately, without knowing the exact distribution of ingestion times, it's hard to say by how much.

To sum up, we have covered the design patterns that helped us create the web controller model, including duck typing and the various patterns we found helpful there, along with infrastructure and interface development that helps us both serve and reason about our system. Hopefully some of these patterns will be useful to you in your own projects, and if they do solve an issue for you, we would definitely love to hear about it. And with that, thank you for attending this session. We want to thank the Gatekeeper community, all the users and contributors, for your feedback and all the feature requests that have made Gatekeeper the project it is today. We also want to extend our thanks to the Kubebuilder and controller-runtime community for all the awesome work that has boosted the Gatekeeper project.
And last but not least, our wonderful audience for attending this session. Thank you guys very much.