Hello, and welcome to my session, Cluster API as Code. My name is David McKay, and I am a CNCF ambassador and senior developer advocate for Equinix Metal. And I am from Glasgow in sunny, sunny Scotland. I have a menagerie of animals in my office: chinchillas, degus, a ferret, and a dog. I'm also a prolific live streamer, streaming multiple times per week, trying to cover all of the cloud native landscape technologies, hopefully providing learning materials that we can all learn from together. And of course, I'm also sporting my wonderful COVID hair. You're welcome. And I am definitely a member of the Pineapple Pizza Club.

So let's get started. What is the Cluster API project? Well, it is a sub-project of Kubernetes that aims to commoditize the creation and provisioning of new Kubernetes clusters, provided you have a Kubernetes cluster. The Cluster API provides a declarative API, just like we're used to with Kubernetes, for managing our Kubernetes clusters. Its responsibilities are to create and provision Kubernetes clusters, and to handle upgrades and day-to-day operations, including remediation when required. It is currently in an alpha state, which means it's volatile and changing quickly, but that doesn't mean you can't use it in production today. And it is an extremely collaborative project, with members from AWS, DigitalOcean, Google Cloud, Equinix Metal, and more. Almost every cloud provider has a Cluster API provider for spinning up that next Kubernetes cluster.

OK, so let's cover some basic vocabulary that you need to get started with the Cluster API. Firstly, the documentation will make mention of a management cluster. This can be an existing Kubernetes cluster that you have available within your organization, or something as simple as Docker for Mac, minikube, or kind. This cluster will run the controllers for creating subsequent, or target, workload clusters. The documentation will also mention infrastructure providers. The Cluster API aims to work across all major cloud providers. The provider could be Google, Amazon, or Equinix Metal. Hmm, perfect. And then there's the workload cluster: the cluster that we're going to have the Cluster API create and provision for us. Then we have machines. These could be virtual machines, or VMs, on AWS or GCP, or bare metal instances on Equinix Metal. We then have the concept of a machine deployment. This is a high-level request for the Cluster API to create one or more machines. The machine deployment is responsible for creating and maintaining machine sets; as we make changes to our machine deployments, new machine sets are created. The machine sets are then responsible for creating the machines. We also have machine health checks, to make sure that the clusters and the nodes that we provide are always healthy. This should feel extremely familiar: it's very similar to the resources that we work with day in and day out on Kubernetes, namely deployments, replica sets, pods, and probes.

OK, now that that stuff's out of the way, let's take a very brief look at how you get started with the Cluster API today. I'm going to quickly run through the Getting Started guide from the Cluster API book. If you haven't checked out the book, I recommend doing so as soon as possible; it has everything you need to know about getting started with the Cluster API. So what we have here is a very simple justfile (justfiles are like Makefiles, but better). It has two different targets.
The first is init. This will prepare our management cluster for creating new Kubernetes clusters. Next, there's a create-cluster target. This one's a little bit more verbose, but it will generate the YAML required for us to apply to our management cluster in order to allow the controllers to create a new workload cluster. We specify the Kubernetes version that we want, how many control plane nodes we want, and how many worker nodes we want. You don't have to worry about wiring up a single control plane node versus a highly available control plane; Cluster API will take care of that for you. And in order for these justfile targets to work, we just need a little bit of environment information to configure things along the way. Of course, we need to provide an API key for the cloud provider of choice. Because I'm using Equinix Metal, I also have to provide which project to create the devices in, and I also have to tell it which operating system to use for each of the nodes, as well as the facility to launch them in. We can configure the pod CIDR and service CIDR for our cluster, and we can tell it which instance types to use, individually for the control plane nodes and for any worker node pools that we have.

To get started, I can run just init. This will speak to my management cluster, which is Docker for Mac. It will install all of the custom resources to that cluster, get the controllers running, and pass in the Packet API key I need for deploying new clusters on Equinix Metal. (It's called Packet just because Packet was renamed to Equinix Metal last year, and the provider naming is a little bit behind.) It's not the quickest, as it's pulling a number of images into the management cluster. With that done, however, I can generate my target cluster. clusterctl uses templates that are baked into each provider's repository to try and guess what you need for your target cluster. And in fact, we can pop this open in VS Code and take a look.

Now, inside of our cluster YAML, you will see a KubeadmControlPlane. This is a custom resource that tells Cluster API how to get a control plane node running on your provider of choice. Here we see a whole bunch of shell commands that are required to configure the host and install all of the Kubernetes components. We then have a machine template that describes what our control plane nodes look like. We then have a Cluster resource that contains the pod CIDR, service CIDR, and the other infrastructure parts that we need, followed by a provider-specific cluster resource that just supplies the facility and project IDs. We then have machine deployments, more machine templates, more kubeadm configs. And there we are: 206 lines of wonderful, beautiful YAML.

OK, let's get into it. What's the problem? Seems simple enough, right? Unfortunately, YAML is not the best programming language for complex logic, and we seem to have gotten ourselves into a bit of a pickle in the cloud-native ecosystem when it comes to YAML. In fact, there are memes all over the internet that to be a DevOps engineer, to be a platform engineer, to be a Kubernetes developer is to be a YAML developer. And while they're funny, sometimes incorrect, and a whole bunch of other things, they do identify real problems that we've gotten ourselves into. We've got Helm, we've got Carvel, we've got Kapitan, we've got Kustomize. These all try to wrangle this YAML into something that is much more fluent to work with, something that allows us to handle complex logic, loops, and templating.
And I think it's safe to say, if you maintain any Helm charts, and I certainly do, it can be overwhelming, because those Helm charts, which try to set out a predefined or best-practice way of deploying a piece of software, don't fit everyone's use case. People want to do their own thing. And we layer on more loops, more conditionals, more loops and more conditionals, trying to provide enough flexibility for the end user while saving them time, but at the expense of everyone's time. There has to be a better way.

So I reached out to my colleague Jason DeTiberus, who happens to be a contributor and maintainer of various Cluster API projects. I won't read his words verbatim; feel free to pause and do that on your own accord. But I think the message from Jason is clear. The Cluster API's responsibility is the reconciliation, creation, provisioning, and operability of the target clusters. The Cluster API's responsibility is not to provide ergonomic tooling to generate the YAML that describes those clusters. And as we've seen through the quickstart, while it does offer some simple mechanics to generate that getting-started YAML, the minute you need to make your own tweaks, things get a little bit complicated. You have to start copying and pasting your own YAML to get the kind of cluster topology that you would want or expect. Yeah, fine, it gives you a highly available control plane, but it only generates one single worker pool. And for most production clusters, you're probably going to want more, potentially spread across different availability zones within a single region, with a variety of instance types, some disk-heavy, some memory-heavy, some CPU-heavy, really tailored to your workload. We're seeing the same confusion and complexity that we see across the broader Kubernetes ecosystem for deployments and templating now starting to come to the Cluster API. The truth is that not all clusters are the same, and as an end user, I want to describe a cluster that works for me. And for that, we need to look beyond the Cluster API to other tools.

OK, let's talk about solutions. If we have to look beyond the Cluster API for tooling that allows us to define the clusters we need and want, tooling that makes our day-to-day lives better, then we have to pick the right tooling. And I am a firm believer in infrastructure as code, not YAML as code: real, high-level languages that allow us to provide the abstractions we need to make our jobs almost enjoyable. That means I want something that is flexible. I want to be able to define a cluster that is the size I want, that uses the node pools I want, that uses the instance types I want, and that has the operating system I want to use, because maybe there are other things on those hosts I want to take advantage of. More importantly, it also has to be composable. I think we are at a stage of Kubernetes awareness and adoption where we know that having one giant monolithic Kubernetes cluster is not the answer. We actually want lots of smaller Kubernetes clusters, and our tooling has to evolve to help us provide and describe that. So when I say composable, I want to be able to define what a node pool looks like for machine learning workloads and reuse that over and over again. I want to be able to define what a node pool looks like for databases and stateful workloads, where disk IOPS are critical, and for security across the board. Something like the sketch below.
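To make that concrete, here is a minimal sketch of the kind of composability I mean. Everything in it is hypothetical: the interface, the helper names, and the instance types are illustrative only, not part of any published SDK.

```typescript
// Hypothetical sketch of composable node pools. None of these names
// come from a published SDK; they illustrate defining a pool shape
// once and reusing it across many clusters.
interface NodePool {
  name: string;
  machineType: string;
  replicas: number;
  labels: Record<string, string>;
}

// A GPU-heavy pool for machine learning workloads.
const mlNodePool = (name: string): NodePool => ({
  name,
  machineType: "g2.large.x86", // hypothetical GPU instance type
  replicas: 3,
  labels: { workload: "machine-learning" },
});

// A disk-heavy pool for databases and stateful workloads.
const databaseNodePool = (name: string): NodePool => ({
  name,
  machineType: "s3.xlarge.x86", // hypothetical storage instance type
  replicas: 5,
  labels: { workload: "stateful" },
});

// A cluster definition then becomes a composition of reusable pools.
const productionPools: NodePool[] = [
  mlNodePool("ml-a"),
  databaseNodePool("db-a"),
  databaseNodePool("db-b"),
];
```

The specific shapes don't matter; the point is that a real language lets you name these definitions, share them as libraries, and reuse them everywhere.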
And it has to be ergonomic. I have to enjoy using it. Coding doesn't have to be a chore. We provide abstractions and libraries to make our lives easier, and I want to make future me's life easier. And there is a way.

So the solution I'm going to show today is based on a project called Pulumi, which hopefully you're aware of. It is an infrastructure-as-code tool. It allows you, as a developer, to make your own choices: you want to write your infrastructure as code in Python, Go, JavaScript, TypeScript, .NET? These are all choices you get to make. Pulumi provides an SDK that lets you work with the language you're most comfortable with. And what we have here is an example of using the TypeScript SDK for Azure to spin up an app service, with the handler as an anonymous function. My business logic can go there, and then just four lines of code describe how to deploy that to the Azure ecosystem. These are the ergonomics that we're talking about: fewer than 10 lines of code, easy to read, easy to reason about, easy to change. It's very explicit; it does only what I need it to.

There's another project from the Pulumi Corporation, crd2pulumi, that allows us to generate these types from the custom resource definitions that are widely available in the Kubernetes ecosystem. So I'm not confined to v1 or apps/v1, the APIs that are already broadly known and understood, but can use any custom resource out there that has an OpenAPI specification as part of its definition. Pulumi can generate the types and allow us to consume them using the same ergonomics that we're becoming accustomed to. As I said, Pulumi supports the languages that you're already familiar with. I'm a big fan of using TypeScript; I think it works really well for infrastructure as code, with its strongly typed but dynamic nature. However, this is cloud native, so Go is an option, as long as you don't mind checking for errors every 13 seconds. Python and .NET are also available, and more languages, I've heard, are coming soon. And there's one more thing that launched very recently from the Pulumi Corporation: cross-language SDK runtime capability. I like TypeScript, so I publish libraries in TypeScript; you like Go, and you want to write your infrastructure as code in Go. You can now consume TypeScript libraries from Go, just as I can consume a Go library from TypeScript. Wow.

So for the rest of this talk, we're going to be moving into the live demo, where I will show you the library that I've built and how it allows you to define Kubernetes clusters with great ergonomics, composability, and flexibility. OK, let's take a look. What I have here is a brand new Pulumi project inside of my KubeCon directory. It's configured for Kubernetes and TypeScript, which means that it comes with the Pulumi SDK, the Kubernetes SDK, and some Kubernetes helpers. In my own namespace, I have access to all of the Cluster API resources that I've already generated and published. The first thing I'm going to do is just pull in the generic Cluster API package. From here, we can open our index.ts, and we want to import * as capi from our new library.
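At this point the file is little more than an import. A sketch of the scaffold, with a placeholder package name since the exact published name isn't spelled out here:

```typescript
// index.ts: assumed scaffold for the demo. "@rawkode/cluster-api" is
// a placeholder for the generated Cluster API package; the published
// name may differ.
import * as capi from "@rawkode/cluster-api";

// With the language server running, typing `capi.` is enough to
// discover the API surface: init(), cluster helpers, and so on.
```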
Now, one of the more difficult things to automate with the Cluster API is that initial infrastructure provisioning, because the only way to do it is through clusterctl init --infrastructure packet, which then requires a secret to exist in the environment that tells the controllers how to speak to the cloud provider. clusterctl does not provide a way to generate that YAML and store it in a GitOps fashion. However, the library I've provided does ship with helpers to initialize your provider in your management cluster.

So the first thing we want to do is initialize the Cluster API, and we can do that through a call to init. Now, one of the really nice things about working with TypeScript is that you don't need to know the SDKs up front, because the language server support is all really, really good. And in fact, I can just type capi. and already I can see the API surface that is available for me to use. Now, I'm not quite ready to create a cluster; I'm just going to do init. And I don't know what parameters this function takes, but I open my parameter hints and I can see: oh, the init function takes a cluster API config. Hmm, do we know what a cluster API config is? Well, not yet, but we can expand that and use our autocomplete. And we can see: oh, it wants to know whether we should install cert-manager, and if so, which version. It also wants to know whether we want to enable any of the feature gates. And finally, it needs a Kubernetes provider.

So let's just quickly run through this one by one. When we did the clusterctl init earlier, we did see that it installed cert-manager; in fact, it's required. So I'm just going to say true. Yeah, why not install cert-manager? What version? Well, I already know that 1.1.0 is available, so I'm just going to drop that straight in, but you could find that from the releases on the cert-manager GitHub repository. Next: oh, feature gates. Well, we can hover over, and we know that it expects a list of something called a FeatureGate. So let's just pop open our list and click through to FeatureGate, and we can see there are two features available for us to enable. We don't need to know what a machine pool is yet, but I am going to enable ClusterResourceSet. Cluster resource sets are a way for you to define manifests to deploy to your new target, or workload, clusters when they become available, in a GitOps fashion. And finally, we need a Kubernetes provider. I'm going to leave this blank for just a moment. And that's it: that will enable the Cluster API controllers on my management cluster.

However, I have not provided the infrastructure provider yet. Let's do that next. So we're going to pull in one more import: we have to go to our package.json and add one more package, this time the Packet provider. We can run a yarn install, and that'll just take a moment. OK, now we can come back and complete this. We're going to pull in the Packet provider. Again, we don't know the API up front, but we can start typing and let autocomplete show us that, just like before, we have the ability to create a control plane or initialize the management cluster, and just like before, init needs a config.

OK, so let's take care of that API key first. One of the really great things about using Pulumi for infrastructure as code is its built-in secret management. That means I can just leave a blank string for now, jump up to the top of my file, and import the Pulumi library. From here, I can create a new config object, which we'll call configMetal: a new Pulumi config with a prefix of equinixMetal. I can then say configMetal.requireSecret("authToken"). That's it. What's actually happening here is that our Pulumi stack is configured with a secrets provider. That can be a cloud KMS from AWS, GCP, or others, or it can be a passphrase-protected local provider.
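Putting this section together, a sketch of the init program so far might look like the following. The shapes of the two init configs are approximated from my narration rather than taken from a published API reference, the package names are placeholders, and the Kubernetes provider shown here is the one we fill in next.

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";
// Placeholder names for the generated Cluster API packages:
import * as capi from "@rawkode/cluster-api";
import * as packet from "@rawkode/cluster-api-provider-packet";

// Pulumi's built-in secret management: the token lives encrypted in
// the stack configuration, never in source.
const configMetal = new pulumi.Config("equinixMetal");

// A provider pointing at the management cluster (whatever the local
// kubeconfig context resolves to; Docker for Mac in this demo).
const provider = new k8s.Provider("local");

// Install the core Cluster API controllers and cert-manager, and
// enable the ClusterResourceSet feature gate.
const clusterApi = capi.init({
  certManager: { install: true, version: "1.1.0" },
  featureGates: ["ClusterResourceSet"],
  kubernetesProvider: provider,
});

// Initialize the Packet (Equinix Metal) infrastructure provider with
// the encrypted auth token and the manifests returned by init.
const packetCapi = packet.init({
  authToken: configMetal.requireSecret("authToken"),
  clusterApi: clusterApi,
  kubernetesProvider: provider,
});
```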
I can open the Pulumi production stack configuration, and we see our equinixMetal prefix with a key of authToken, and here we have an encrypted version of our API token, which is all that is being consumed here. Next, it needs the cluster API manifests. What are those? Well, we can hover over and see that it's just looking for the resources that come from our capi init; our capi init actually returns a set of manifests. So from here, we can say clusterApi: clusterApi. And finally, there's the Kubernetes provider. We ignored that above, but we're going to take care of it now. All we need to do is create a new Kubernetes provider. For that, we import the Kubernetes package, and we can create a new k8s.Provider. We can see that it needs a name; we'll just call this local. Now we can add our Kubernetes provider to both of our init calls. And that is it. Now we can run this Pulumi program to provision the Cluster API controllers to a Kubernetes cluster.

So of course, I won't make you take my word for it; let's actually run this. We'll pop over here, jump to the terminal, and run pulumi up. It's going to work out, from those functions that we've written, which resources have to be created within the management cluster. You can see there are a fair few. We can hit yes and give it just a few moments, and it will begin applying these to the cluster. Now, one of the things that Pulumi does is try to ensure that what we apply will actually become healthy. That means it's probably going to sit there for quite a while, waiting to make sure that the deployments become available, the pods are passing all their probes, the services have endpoints: all of those little bits and pieces that normally we just allow the Kubernetes reconciliation loop to take care of. So I'm just going to hit Ctrl-C a couple of times and run a kubectl get pods across all namespaces. There we go. We'll see we have cert-manager, we have our Cluster API controllers, and we even have the Packet controller manager, not quite ready yet, but getting there. I hit Ctrl-C because I wasn't fussed about letting it finish; I want to show you an alternative approach that may benefit your workflow.

OK, so let's jump back into our code. Where we created our Kubernetes provider, we're going to make a few small tweaks. We open up the arguments and see that there are a bunch of things we can tweak on the Kubernetes provider. For one, it doesn't have to use whichever kubeconfig context is available in the environment: I can provide my own kubeconfig, I can specify the namespace that I want to deploy to, or I can render YAML, using renderYamlToDirectory. Now we can come to the command line and run pulumi up, and instead of reaching out to that cluster, checking which resources exist, and waiting for health checks and service endpoints and liveness probes to pass, we can just hit yes and allow Pulumi to write a bunch of YAML to a directory that we can apply whenever we want. Just that simple. And that's it. We can pop open VS Code, and we'll see this brand new rendered-YAML directory, with the CRDs that need to be applied to the cluster, as well as all of the manifests for all of the resources that have to exist.
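For reference, that render-mode tweak is just an extra argument on the provider. renderYamlToDirectory is a real option on Pulumi's Kubernetes provider; the directory name is arbitrary.

```typescript
import * as k8s from "@pulumi/kubernetes";

// Instead of applying resources to a live cluster, render everything
// as YAML into a local directory that can be committed or applied
// later. The directory name here is arbitrary.
const renderProvider = new k8s.Provider("local", {
  renderYamlToDirectory: "rendered-yaml",
});
```

One caveat, which comes up again in a moment: rendered YAML bypasses Pulumi's secret handling, so any secret values end up on disk in plain text.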
And that's just the beginning. As well as installing the Cluster API and the providers to our cluster, we can add and layer on more abstractions to make our lives easier. You can see that there's already the ability to create a control plane on an Equinix Metal cluster, with the configuration options that an Equinix Metal customer would come to expect: the project ID, the number of replicas, the machine types, the facility, and the image. And of course, it's not Equinix Metal specific. In fact, we can take a look at CAPDO, the DigitalOcean provider: again, using just a simple init call and a config, we can provide the access token, the cluster API manifests, and the Kubernetes provider, just like we did earlier, to make DigitalOcean Kubernetes clusters available to us from our management cluster. There's no reason we can't support AWS, GCP, and all the other major cloud providers; it just requires a little bit of effort to put together these abstractions.

OK, let's head back to the slides. So what's coming next? Well, cdk8s is a really, really cool project. It's very similar to Pulumi in that it exposes a multi-language runtime for defining and creating Kubernetes resources. Generating the types is actually a lot easier with cdk8s than it is with Pulumi, as it doesn't require an external command, but there are trade-offs to both approaches. As we've seen with Pulumi, we can use Pulumi's runtime to speak to the Kubernetes cluster and do the apply. This allows us to take advantage of Pulumi's built-in secrets management, which protects our access tokens and API keys from being rendered to YAML and available, even ephemerally, in some location. cdk8s doesn't have that, just as Pulumi doesn't have it when we render to YAML. So it could be a good idea to install the Cluster API and the providers via Pulumi, but then use cdk8s and Pulumi's abstractions to generate the YAML that defines each unique cluster. That's up to you.

And I'd like to see more use of Open Policy Agent for testing the clusters that we are creating. The Pulumi SDK that I've provided applies some basic tests to ensure that all of the feature gates are interpolated before they hit your management cluster, but there's a lot more that we can do there too. And I'd like to see more providers: I'd like types generated for AWS and for GCP, and maybe even Oracle Cloud. Who knows? And I think what we really need is more abstraction: more helper functions to create unique cluster configurations for machine learning, for compute-intensive workloads, for stateful workloads; more abstractions and higher-level functions that can make all of our lives easier.

One of the really good things about the Cluster API is cluster resource sets: the ability, in the management cluster, to say that when we have a new healthy cluster created, we should go and apply a set of resources to it. These are a little finicky to work with, because you have to actually encapsulate the deployments, the services, the config maps, and the secrets for those individual workloads into a ConfigMap itself. And this is exactly where abstractions like these can make our lives easier.

So that's my talk. I hope I've given you some food for thought. Not only does the tooling that I've shown today for the Cluster API exist and can better our lives, it's not confined to the Cluster API either. There's no reason we can't use cdk8s to simplify across our entire deployment surface for Kubernetes. But there's still a lot to be done: a lot of challenges to solve and a lot of things we can make simpler. I can't wait to see what you do with it. Best of luck. Have a great day.