Thank you for coming to our talk today. My name is Jay Pipes. For the purposes of this particular session, I'll be playing the role of Bob the Builder. And this is Amin Hilali, who's going to be playing the role of Spud the Scarecrow. And we're going to talk about going beyond KubeBuilder and generating Kubernetes custom controller implementations. Go ahead. So, because of the needs of a set of Kubernetes controller projects called ACK, we've had to build our own factory for producing Kubernetes controllers. We've been doing that over the last couple of years, we've learned a few lessons along the way, and we wanted to share some of the tools we've built to help generate controller implementations, and maybe give you some inspiration for how you can implement your own custom Kubernetes controllers. So come on in, there are lots of seats over here. So yeah, go ahead, Amin. So just like in any other factory, you will need hard hats. And I'm not going to put this on; I already look like too much of a dork. But anyway, we'll be giving these away at the end of this session, with lots of signatures from Kubernetes community members and all the stickers and stuff like that. So yeah, again, I will not be wearing my Bob the Builder hat. Like I said, we had to build a factory that produces Kubernetes controllers. Most of you are probably familiar with KubeBuilder, the upstream project that wraps a bunch of code generation tools, templates, and cookie-cutter scaffolding for creating Kubernetes custom controllers. We had to go beyond KubeBuilder and build a whole lot of automation and additional code generation tools to deal with the specific problem we had in ACK land. But first, Amin is going to give a little background on Kubernetes controllers. Yeah, so before talking about building a Kubernetes controller factory, let's talk about what a controller is. As almost all of you know by now, controllers are processes or programs that constantly, actively try to reconcile Kubernetes objects. It's an infinite loop: it queries the desired state, queries the current state, and takes actions to move the current state toward the desired state. We have a lot of controllers in Kubernetes; almost everything in Kubernetes is controllers. It's essentially a big collection of controllers, each one reconciling a different kind of resource. Good examples are Deployments and Endpoints. Whenever you create a Deployment, the deployment controller creates the ReplicaSet objects, and then behind the scenes a lot of other controllers go and create your objects, including the Pods. Jobs, Nodes, and even Namespaces are also created or managed by Kubernetes native controllers. And then you can build your own custom controllers. For example, you can control your smart bulbs at home: you can have a CRD whose spec says whether a lamp is turned on or off, and you can manage that through your Kubernetes cluster. You can also order pizza. Shout out to Michael Hausenblas for creating the pizza controller; right now there is a controller that lets you order pizza using CRDs. And you can also manage cloud resources, for example AWS S3, Google Cloud Storage, or Azure Blob Storage. You can manage those through Kubernetes CRDs.
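To make that loop concrete, here is a toy sketch using the smart-bulb example; all of the names and states here are made up for illustration, not taken from any real controller.

```go
package main

import (
	"fmt"
	"time"
)

// LampState is a stand-in for a CRD's desired/observed state.
type LampState struct {
	On bool
}

// These stubs stand in for "read the CR spec" and "query the device".
func getDesiredState() LampState { return LampState{On: true} }
func getCurrentState() LampState { return LampState{On: false} }
func setLamp(s LampState)        { fmt.Println("setting lamp on =", s.On) }

func main() {
	// The core of any controller: an endless loop that observes the
	// current state and nudges it toward the desired state.
	for {
		desired := getDesiredState()
		current := getCurrentState()
		if current != desired {
			setLamp(desired)
		}
		time.Sleep(10 * time.Second)
	}
}
```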
And you can even manage KubeCon CFPs, because why use UIs when you have kubectl? It's always better. So if you're interested in learning how to manage CFPs through Kubernetes, please go see the tutorial on how to build controllers later this afternoon. And yeah, next up is your toolbox. If you ever want to build your own controllers, you're definitely going to run into some of the well-known tools in the Kubernetes world. One of them is KubeBuilder. KubeBuilder is a tool that helps you scaffold your controller project, and behind the scenes it uses tools like controller-gen from controller-tools. Other examples are the Operator Framework and controller-runtime. Controller-runtime, for example, contains a set of tools and libraries that will help you manage resources in your Kubernetes cluster. And for the bravest, the old-fashioned ones, you can handcraft your own controllers using the lower-level building blocks from client-go: informers, shared informers, workqueues. Good luck with that. Fun fact: the deployment controller and the replica set controller, and I think the stateful set one as well, are all written using these basic components. They are not built using controller-runtime. And in the latest news, I asked one of the people from SIG Apps whether they want to move to controller-runtime or not. They don't; they will stay with these libraries. But if you want to build your own, it's better to use controller-runtime, because it does a lot of work for you. Okay, so now we know the tools. Let's define the limits, the boundaries, between what developers need to do and what the libraries and tools do for you. The tools and libraries will help you generate the Go types and the CRDs. They'll also help you generate the Go clients if you want to talk to the API server to query or modify those objects. They give you good libraries to handle logging, leader election, rate limiting, webhooks, and things like that. On the other hand, as a developer, you still have to write your own reconciliation logic. You still have to write the validation and mutation webhooks. You also have to write unit and end-to-end tests. And from time to time you will have to fight with your own Prow bots, because Prow bots are not always the best. You also have to maintain controller images, Helm charts, and documentation, and then come back, fight with your Prow bots again, and do a lot of /retest and /lgtm at the end. So as a developer, there's a lot of logic to write and implement. This is just a global overview of the boundary between what the tools do for you and what developers still have to do.
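To show where that boundary sits, here is a minimal controller-runtime reconciler stub. The Reconcile signature is controller-runtime's real interface, but the BucketReconciler type and its fields are hypothetical.

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// BucketReconciler reconciles a hypothetical Bucket custom resource.
type BucketReconciler struct {
	client.Client
}

// Reconcile is the part the tooling does NOT write for you. KubeBuilder
// scaffolds everything around it: the Go types, the CRD YAML, manager
// wiring, leader election, rate limiting, and so on.
func (r *BucketReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fetch the object, compare desired vs. observed state, and act on
	// the difference; all of that logic is yours to implement.
	return ctrl.Result{}, nil
}
```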
Over to you. So you just saw the list of things developers still need to do, even with all this great tooling: KubeBuilder, controller-runtime, controller-gen, and so on. In the AWS Controllers for Kubernetes project, we had a big problem: we had to create a Kubernetes custom controller for each of the AWS services. The thought of hand-building, manually maintaining, developing, and publishing artifacts for 200-plus AWS services was not something we were keen on. So about two years ago we set out to build this controller factory, a bunch of code generation tools and automation that we've put out in open source. One of the bigger problems we hit when we started to generate controllers that interface with individual AWS service APIs was that APIs change; they evolve over time. They're not static, right? So we needed a way to integrate changes to the upstream service APIs in a smooth, consistent, and reliable fashion. We realized we could generate a full controller implementation for something like S3 or RDS, but next month the RDS team or the S3 team could come out with a new field in a particular resource in their API, or a new resource entirely. They could also change the behavior of how things happen on the service side, and that behavior change would roll out and break all of the controllers generated against a past schema or model for that API. So we had to build a bunch of automation and code generation tools to continually keep the generated controllers up to date with those evolving API schemas. So Amin is going to give a tour of our little controller factory. The inputs to this controller factory are the AWS API models, the definitions of the APIs. They contain descriptions of the operations each service API has, for S3 things like CreateBucket, et cetera, and descriptions of the shapes. These are the Coral API models in AWS; you can think of them as an earlier, AWS-specific version of OpenAPI schemas. These API models, plus a generator.yaml configuration file, give our code generation framework instructions on how to generate the controller, how to identify which resources are in the API, and that kind of thing. The end output of our factory is a full controller implementation. If you're used to KubeBuilder, what you get when you generate a controller is basically just a stub: the reconciler logic and everything else you still have to go and implement yourself. We didn't want to do that for 200 controllers, so we had to build something that could do it for us. So the first step in this controller factory, after being given the API schema and the generator.yaml configuration file, is finding the resources we should be managing. For example, on the right here you can see a small snippet from the API schema for DynamoDB. There is an operation called CreateTable. So the first thing we do to find the resources is find every operation that starts with Create, strip the Create, and keep the rest. That's our resource. Whenever we see a CreateX or CreateY, we know that X and Y will be resources. It's as simple as that. If an API has, I don't know, 20 create operations, we know we have 20 resources to manage. I'll just interrupt there on the "it's as simple as that". I wish it were as simple as that. Any of you who are familiar with AWS APIs know that they suffer from glorious, idiosyncratic inconsistencies across the service APIs, especially in the behavior of the update code paths. All AWS services are like little startups inside of AWS, so they all kind of do their own little thing. So it may look simple, but it's actually not.
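Here is a small sketch of that inference step. The heuristic is exactly what was just described, strip the Create prefix and treat the remainder as a resource name; the API model is reduced to a plain list of operation names for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// inferResources finds every operation named "CreateX" and treats X as
// a resource. Real API models carry operations and shapes; a string
// slice stands in for them here.
func inferResources(operations []string) []string {
	var resources []string
	for _, op := range operations {
		if rest, found := strings.CutPrefix(op, "Create"); found {
			resources = append(resources, rest)
		}
	}
	return resources
}

func main() {
	ops := []string{"CreateTable", "DeleteTable", "DescribeTable", "CreateBackup"}
	fmt.Println(inferResources(ops)) // prints [Table Backup]
}
```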
So after inferring that Table is a resource, we now need to find the fields of this Table resource: which fields go into the spec and which go into the status. For that, the default heuristic is to look at the CreateTable input and the CreateTable output. Whenever we see a field appearing in both operations, we know it's spec; whenever we see a field that only appears in the response, we know it's status. For example, TableStatus goes into the status, because we never supply that field in the input; we only receive it in the output. TableName, however, goes into the spec, because we see it in the input and the output at the same time. That's the default way of inferring what should go into the spec and what should go into the status. So after determining the resources and their fields, we write all of this information into apis/v1alpha1, and you can see here that we have, at line 24, TableSpec, at line 161, TableStatus, and of course the objects that Kubernetes, or KubeBuilder, needs in order to know these are Kubernetes objects: Table and TableList. You can see that we have the object metadata, a spec, and a status. Okay, so now we have the objects and their definitions. Next we do a quick round of controller-gen. As you saw before, we have a few kubebuilder markers, like object and subresource. Those comments are used by controller-gen to know, for example, how to emit the printer columns, the validation patterns, and the role scoping. We also leverage controller-gen to generate files like zz_generated.deepcopy.go. If you've ever written controllers, you know that many times at runtime we need to deep-copy objects and pass the copies to other functions. So at this stage, all of this only gets us the controller skeleton. There is no reconciliation loop; there is no logic at all. It's just the definitions of the objects, and this is almost the same level where KubeBuilder leaves you: it helps you manage the CRDs, the types, the spec and the status.
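For a sense of what that generated API package looks like, here is a heavily trimmed sketch of the DynamoDB Table types with their kubebuilder markers. The field lists are cut down to one field each, so treat this as illustrative rather than the real generated file.

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// TableSpec holds fields seen in both the CreateTable input and output.
type TableSpec struct {
	TableName *string `json:"tableName,omitempty"`
}

// TableStatus holds fields seen only in the CreateTable output.
type TableStatus struct {
	TableStatus *string `json:"tableStatus,omitempty"`
}

// The markers below are what controller-gen reads to emit the CRD YAML
// and the zz_generated.deepcopy.go file (which adds the DeepCopyObject
// methods these types need to satisfy runtime.Object).
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
type Table struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   TableSpec   `json:"spec,omitempty"`
	Status TableStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true
type TableList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []Table `json:"items"`
}
```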
So this is where the real fun starts. Earlier I showed you some snippets of the API model definitions we have for the AWS service APIs, and how we take those model definitions and determine which custom resource definitions a particular controller is going to expose and what those custom resource definitions look like: which fields are in the spec and the status, and all that kind of stuff. We then use controller-gen, deepcopy-gen, and some of those other tools to generate some of the API objects and infrastructure. But now, like Amin said, we're at the stage where KubeBuilder has basically left you to implement your own controller. So that's where the fun starts for us. We need to develop a translation layer between the Kubernetes representation of a resource, the custom resource, and the aws-sdk-go representation of that same resource. Much of our controllers is just translating between the Kubernetes representation of something and the AWS representation of that same something. As you develop your own custom controllers, a lot of the time what you'll be dealing with is some backend integration API, and you'll need to map between the representation of resources in that backend API and in the Kubernetes API. That's a lot of what our controllers are: a translation shim between the Kubernetes world and the aws-sdk-go world. We write our controllers in Go, so aws-sdk-go is the library we use to communicate with the backend AWS services. What I want to show here, and it's a little difficult to see, I understand, you can download the slides afterwards, is that all of the code you see is entirely generated. We have a method in our resource manager interface called sdkFind, and it essentially maps between the Kubernetes resource and the aws-sdk-go resource. All the code you see on the right is incredibly tedious to write by hand. It's also extremely error-prone when humans write this stuff manually, and it's super annoying to read. That's why we code-generate everything we can. What you see on the right is essentially processing a response from aws-sdk-go, finding which field in that response matches which field in the spec or status of the CR, and then setting that value. It's super annoying to hand-write all of this; trust me, we tried that first, and yeah, we said no way. So this is all the kind of code we generate, and it's a large percentage of the controllers. I call it the SDK binding code: you're just binding between SDK resources and custom resources. Another area where we rely heavily on code generation is determining when a resource has changed. So much of the reconciliation logic inside any controller is just: here's the desired state of a resource; you do a fetch to get the latest observed state of that resource; and then you determine what has changed between the latest observed state and the desired state coming from the Kubernetes API server. Generating the code that determines which fields have changed within a resource is also very tedious and time-consuming, which is why we generate all of that too. So yes, you can see all this gobbledygook code that's terrible and hard to read, and yes, we generate all of it. All right, do you want to take over this one? So one of the functions we also generate is called ManagerFor. The Delta function and the sdkFind function you saw before are just methods on an object we call the resource manager, and the resource manager is just a Go struct that helps the controller manage resources in the cloud. This AWSResourceManager object has eight or nine methods. It knows how to read a resource, create one, update one, delete one, plus a bunch of other stuff we use to manage the resources. At this stage you're wondering: but where is the reconciliation? The logic must be somewhere. Well, it's in a different package, where we wrote a generic reconciliation function that takes a resource manager and reconciles with it, all the time. It doesn't really care which resource you're going to manage in the cloud; all it needs is an object implementing that interface, and it uses it to reconcile the resources. So every resource we generate a controller for has a resource manager, and they all use the same reconcile function across all the controllers. And that common runtime, the ACK runtime, is what makes the reconciliation logic consistent across all resources and all of the controllers.
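Here is a sketch of what such a resource manager interface might look like. The method set paraphrases what was just described; the real AWSResourceManager interface lives in the ACK runtime repository and differs in its exact names and signatures.

```go
package ackish

import "context"

// Resource and Delta stand in for the runtime's abstractions: an
// opaque handle to a custom resource, and the set of field-level
// differences between two copies of it.
type Resource interface{}

type Delta struct {
	Differences []string
}

// ResourceManager is one object per resource kind that knows how to
// manipulate that kind in the cloud. The generic reconciler only ever
// talks to this interface, never to a concrete AWS API.
type ResourceManager interface {
	// ReadOne fetches the latest observed state of the resource.
	ReadOne(ctx context.Context, desired Resource) (Resource, error)
	// Create makes the resource exist in the cloud.
	Create(ctx context.Context, desired Resource) (Resource, error)
	// Update pushes changed fields from desired out to the cloud.
	Update(ctx context.Context, desired, latest Resource, delta *Delta) (Resource, error)
	// Delete removes the resource from the cloud.
	Delete(ctx context.Context, latest Resource) (Resource, error)
}
```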
And that logic, to simplify it a little bit, is: you get the resource from the Kubernetes API server; you determine whether it's a new resource, or whether it's being modified or deleted, and all that kind of stuff; then you make a call to get the latest observed state for that resource; if you don't find one, you call create; if you do find it, you call update if there are any changes; and so on. So we've made all of that reconciliation logic the same for all of the resources. Along with that, we have a common metrics library that reports metrics to whatever tool you use to collect them. We also have a common error-handling library, a common utility for conditions, and some common CRDs that we use across all the controllers. For example, whenever you want to adopt a resource, you use the same CRD in every controller.
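Condensing that flow into code, a sketch of the generic create-or-update decision might look like the following. It reuses the ResourceManager shapes from the sketch above (redeclared here so the snippet stands alone), and the not-found signaling and compare helper are assumptions, not the ACK runtime's real API.

```go
package ackish

import (
	"context"
	"errors"
)

type Resource interface{}

type Delta struct{ Differences []string }

type ResourceManager interface {
	ReadOne(ctx context.Context, desired Resource) (Resource, error)
	Create(ctx context.Context, desired Resource) (Resource, error)
	Update(ctx context.Context, desired, latest Resource, delta *Delta) (Resource, error)
}

// errNotFound signals that the cloud resource does not exist yet.
var errNotFound = errors.New("resource not found")

// compare stands in for the per-resource generated Delta function.
func compare(desired, latest Resource) *Delta { return &Delta{} }

// reconcile is the shared flow: read the latest observed state, create
// the resource if it is absent, update it if anything differs.
func reconcile(ctx context.Context, rm ResourceManager, desired Resource) (Resource, error) {
	latest, err := rm.ReadOne(ctx, desired)
	if errors.Is(err, errNotFound) {
		return rm.Create(ctx, desired)
	}
	if err != nil {
		return nil, err
	}
	if delta := compare(desired, latest); len(delta.Differences) > 0 {
		return rm.Update(ctx, desired, latest, delta)
	}
	return latest, nil
}
```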
So yeah, I'll let Jay Pipes tell you about some numbers. Sure. Right now we've generated 21 controllers that are actually published on our ECR Public repository, 11, or maybe it's 12, of which are at GA stage. We have quite a few more controllers up in our GitHub repository that are in the process of being built out by AWS service teams. A huge percentage of the code in those controllers is generated, roughly 98%, and there are well over a million lines of generated code in just the controllers we've put together so far. I'd like to add that, for those of you who are familiar with the Crossplane project, the code generator inside ACK is actually used to generate the native AWS provider inside Crossplane. And we made a decision to try, as best we can, to make these code generation tools and utilities we created for ACK useful, importable, and applicable to other projects in the controller ecosystem out there. Our goal is for the resource managers we showed you some code for earlier to be Go packages that can be imported by Crossplane or Terraform, packages whose only job is to handle, in a very declarative, Kubernetes-like way, the interfacing between the Kubernetes universe and the AWS universe and the translation between them. It is not our goal to only do AWS stuff; we want to be a friendly, helpful, useful toolkit for all of the other controller-based systems out there. And yeah, we've got lots of image and chart downloads too. We have fully automated the process of building both the container images and the Helm charts. ECR Public is a registry that can house any OCI artifact, so we automatically construct the Helm chart for a particular controller, with all of the YAML manifests that go along with it, and publish that to ECR Public, alongside repositories for all of the controller container images themselves. We've also automated all the OLM (Operator Lifecycle Manager) stuff and the publishing to OperatorHub. We have a test infrastructure repository that contains virtually all of our automation, so if you're interested in getting some inspiration for automating your own controller builds, or just want to grab some code that you think will be useful, I have links at the end of these slides where you can go and grab the inspiration or grab the code. And for those of you wondering about the remaining 2%, the non-generated code that we wrote by hand: the reason for it is that writing code that generates code handling every resource across all the AWS services is almost impossible. Sometimes we have to manually write custom bits of code that get injected into specific places in the sdkFind or Delta functions. For example, say you want to compare a map of structs of arrays of maps. You don't want to use reflection for that, and it's very hard to generate code that correctly compares arrays of structs or other complex structures. So we sometimes inject those hand-written bits of code; that's the remaining 2%. Our goal is to maybe reach 100% one day. Yeah, that's the goal: generate everything. Build things that build things that build things. All right, so we've reached the end of our little presentation. I want to open the floor up to any questions you might have. Sir? Yeah, one thing I noticed was missing from these slides is any finalizers or child resources. For example, if I wanted to construct a factory like you did. If you want to what? If I want to construct a factory like you did, right, but for my own internal APIs, where I want to have one resource that owns other resources. Could you also do that? Do you want to open up the runtime and show where the finalizer is added? So in that common ACK runtime, as part of the reconciliation loop that's common to all the controllers, the very first thing we do to put the resource under management is add a finalizer to the custom resource, which essentially protects it from being deleted by anything else until we remove that finalizer. What was the other part of your question? Child resources, right. So we have a system called resource references, where you can reference one custom resource from another custom resource, and we have resolve-references functionality that will wait to delete or update a parent object until its child resources are ready. One of our conditions is called ResourceSynced, and we can control how the resource manager sets or determines whether a resource is synced by instructing the code generator. Okay, for DynamoDB tables, we know the resource is synced when the table status in the resource's status is ACTIVE or AVAILABLE. Like I said, the services are inconsistent, but we can control how the conditions are handled from one controller to the next with this generator.yaml file. And for the finalizer question: we have an interface called AWSResourceDescriptor with IsManaged, MarkManaged, and MarkUnmanaged methods, and behind the scenes those three functions do nothing but manipulate the finalizers on your resource. Yeah.
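For anyone wanting to replicate that mark-managed and mark-unmanaged behavior in their own controller, here is a sketch using controller-runtime's controllerutil finalizer helpers. The finalizer name is made up, and the function names only echo, rather than reproduce, the ACK descriptor methods.

```go
package controllers

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// finalizerName is illustrative; pick one namespaced to your project.
const finalizerName = "example.com/finalizer"

// markManaged adds the finalizer, which blocks deletion of the object
// until the controller removes it again.
func markManaged(ctx context.Context, c client.Client, obj client.Object) error {
	if !controllerutil.ContainsFinalizer(obj, finalizerName) {
		controllerutil.AddFinalizer(obj, finalizerName)
		return c.Update(ctx, obj)
	}
	return nil
}

// markUnmanaged removes the finalizer, letting deletion proceed.
func markUnmanaged(ctx context.Context, c client.Client, obj client.Object) error {
	if controllerutil.ContainsFinalizer(obj, finalizerName) {
		controllerutil.RemoveFinalizer(obj, finalizerName)
		return c.Update(ctx, obj)
	}
	return nil
}
```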
Yeah, so a declarative API usually translates to multiple imperative calls on the Go SDK side. Yes. So how do you enforce that ordering? Will the controller generator take care of it? I wish. So yeah, it's a great question. The AWS APIs are not declarative for the most part; they're a set of imperative API calls: create bucket, put lifecycle policy, that kind of stuff. It's not the patch-and-apply behavior we've come to expect from Kubernetes. For many of the update code paths inside these controllers, we can generate most of the code that goes out and calls those individual imperative APIs. But for some of the resources, I'm thinking of the S3 bucket, there are 22 separate API calls for updating bucket attributes, so we need to order them. Say again? What is the solution there? Do you need to order them? I mean, if you're asking what the solution is specifically for AWS service APIs: you kind of have to take them one at a time and figure out the behavior of the backend service. For some of the service APIs, you need to call the various update operations in a specific order, and if you don't, things break. For others, you can call multiple attribute update APIs in parallel. So it depends on the service API. We work with the individual AWS service teams, who know their APIs best, and get advice from their engineering teams on how to handle the behavior of their particular service API.
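As a sketch of what that per-service ordering knowledge might boil down to, here is a tiny sequencer for imperative update calls; the step names and structure are invented for illustration and are not how ACK actually encodes ordering.

```go
package main

import (
	"context"
	"fmt"
)

// updateStep pairs a human-readable name with one imperative API call.
type updateStep struct {
	name string
	call func(ctx context.Context) error
}

// applyInOrder runs the steps strictly in sequence, stopping at the
// first failure, for services that demand a fixed update order.
func applyInOrder(ctx context.Context, steps []updateStep) error {
	for _, s := range steps {
		if err := s.call(ctx); err != nil {
			return fmt.Errorf("update step %q: %w", s.name, err)
		}
	}
	return nil
}

func main() {
	// Stubbed calls standing in for real SDK operations.
	steps := []updateStep{
		{"put-versioning", func(ctx context.Context) error { return nil }},
		{"put-lifecycle", func(ctx context.Context) error { return nil }},
	}
	fmt.Println(applyInOrder(context.Background(), steps))
}
```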
Oh, hi, Rob. Hey, Jay, I have a question that may be dear to your heart. With all the code you're generating, do you generate tests? How do you maintain tests? Great question. And how do you ensure the generator doesn't change the behavior? Great question. So right now we have a sort of half-and-half system. We have this thing called controller-bootstrap. All of our end-to-end tests are written in Python; there's a Python testing framework, and our controller-bootstrap code generation project generates the basics of it. But then it's up to the service team and us to go and write the specific end-to-end tests for those resources. I would love to get to a point where we can look at the API model definitions, like we do for the controller, and generate end-to-end tests. We're quite a ways away from that. Yes, do you ever plan to remove your generated code? Do you ever plan to remove your generated code and have it all in memory? Should we plan to remove the generated code? Right, if it's all generated, can you do it without having to save code files? Since 98% of your code is generated, can you do it all dynamically? Can we do it all by hand? Dynamically. Dynamically. Maybe; it's a non-goal, possibly. I mean, if you're asking whether I see a future where we're completely code generated, where we can take an API model definition and a generator configuration file and just let it go? Yeah, I can see that future. Maybe not for the tests right now, but certainly for the controller implementation. What we do is look for patterns. How does one AWS service function in comparison to another? How do RDS and ElastiCache and Amazon MQ behave? They have certain patterns of API usage, and we look at that. Once we identify a pattern, we can write it into the code generator to just generate an implementation for that pattern. Some of the APIs are a little harder to do that for. We do have one 100% generated controller. It's the Step Functions one? Yeah. There is literally zero custom code in it, because the API is just very simple and straightforward. We probably have time for half a question, and then we can wrap up. First, thank you; it's really cool for you guys to share this so we can see what you're doing. The question is: since you're centralizing a lot of your code, and it looked like about 20 controllers so far, and bringing it into a library and everything, have you seen a noticeable impact from centralizing that code? Have you noticed any reduction in bugs across all the different controllers and whatnot? Yes, we absolutely have. Some of our controllers actually started out life as hand-built. The SageMaker operator, for example, started out life as the Kubernetes SageMaker operator, something like that. And a whole class of bugs and user pain and inconsistencies, we just wiped that away. When you code-generate everything from a model definition instead of handwriting it, you remove a whole classification of common failures and bugs. So yeah, we've definitely seen an improvement there. That's really cool, thank you. Sorry, we do want to give away our Bob the Builder hats. Well, I have a trivia question. The first one to raise their hand gets it, or at least the first one to raise their hand that I can see. Okay, so here's the question: what is the name of the code generation tool that generates the runtime object DeepCopy and DeepCopyInto functions for CRDs? Close. The name of the tool: deepcopy-gen, is that what you said? Yes, that's it. You can gift it then. All right, and Amin has his question as well. Okay, next question: from where can you import the SharedInformer interface? Yes, very good, client-go. All right, thank you very much. We appreciate it.