All right. I just want to welcome everybody back. I honestly haven't seen this group for a long time, and it has been great to see so many familiar faces. This community has been growing so much, so we're really proud to be here. We actually had to switch rooms because so many of you signed up. So welcome. We have a lot to cover. Yuvaraj, do you want to introduce yourself?

Does this work? OK. Hi, everyone. I'm Yuvaraj. I've been working with the Cluster API community for about a year. I work at VMware.

And I'm Vince. I also work at VMware, and I've been with Cluster API since the POC. This project has seen a lot of iterations. So let's start with: what is Cluster API? Does anyone here already know what Cluster API is? I see some maintainers over here raising their hands.

All right. So Kubernetes is hard, but managing Kubernetes is even harder. Changing and understanding the Kubernetes APIs is a steep learning curve, and you run into a lot of these reactions — it's practically meme material. We're trying to solve the cluster lifecycle part of it. So this is the definition that we usually give: Cluster API runs on Kubernetes, using Kubernetes itself to run other Kubernetes clusters. Take that in for a second. We focus on the creation, configuration, and management. At a higher level, we want to make this whole operation boring. You know how you just open your browser, type in a website, and it just loads? I would like to have the same experience, or at least close to it. But there's a lot that runs under the hood when you do that in your browser, and Cluster API is really complex as well.

We have a few key principles. Extensibility is one of them. We want to make sure that whenever we provide defaults, you can actually swap them out. So when we design our APIs, we always adhere to this, and Yuvaraj is going to go into a lot of it later through the demo. The other key principle is to focus on the 80%. What that means is we try to collect as many requirements and use cases as possible to really understand what problem we're solving, because without knowing that, we don't know what to do. So whenever we create a new field or a new CRD — a custom resource definition — we make sure to collect use cases and spread the word on the mailing list and Slack channels, wherever we can. So if you haven't done so already, please reach out.

At the end of last year, in October, Cluster API became production ready. This was a major milestone for this team; it took five years to get here. We published a blog post with the CNCF, and I'll go into some of the quotes. But I want to stop here for a second: our best feature is actually the community behind it. There is a huge group of folks that are so kind and so welcoming that I'm really proud to be here and to be serving this community. This is the blog post — please check it out. Just search for CNCF, Cluster API, 1.0, and you'll see a bunch of quotes. The reason I'm focusing on this is that we have so many end users running Cluster API in production today. For example, Twilio has a whole team building on Cluster API. I won't go into all the quotes — they're on the website if you want to read them later — but they run 100-plus production clusters.
Giant Swarm went all in on Cluster API a few years back, and they give back to the community as well. Spectro Cloud is a heavy user of Cluster API. Talos Systems has a custom bootstrap provider for Cluster API — that's batteries included, but swappable: we ship kubeadm out of the box, but they built their own provider, so our APIs really are swappable in practice. New Relic has a whole team dedicated to Cluster API; they're using it and building on top of it internally. Red Hat, same thing — they use a bunch of the APIs we expose today. Deutsche Telekom, a huge company, is using Cluster API; they did a demo a year or so ago on how their deployment works. Really complex. It's on our YouTube channel — please go check it out, it's really cool. The US Army Software Factory is using Cluster API in production today. That's something — the US Army is using Cluster API. I didn't know that until we actually published the 1.0 release. Azure is deeply involved: we have CAPZ, the Cluster API Provider Azure, which focuses on providing a great Cluster API experience for all their customers. Amazon EKS Anywhere is also built on top of Cluster API. D2iQ and Samsung SDS use Cluster API in their production clusters. SK Telecom uses Cluster API as well — they're a hybrid cloud company with a massive Cluster API deployment. Mercedes-Benz — you've probably seen the talk about the 700 clusters that were moved; we have a few people here from Mercedes, and you've probably seen them in the keynote. They're also using Cluster API, and it has been a great experience. Finally, VMware is using Cluster API too; as you might know, we have a great team focused on building the best we can. There are a few links here to all the talks about Cluster API. The slides will be published online later, so please check them out — they're really great talks. I think the other session was actually overflowing, so definitely check out the video later. Yuvaraj?

Hi. So we've seen how good Cluster API is and how the community is growing. Now let's take a closer look: how does Cluster API work? While it looks like Cluster API is doing magic, let's go under the hood and see how that magic happens.

In Cluster API, we have this concept of a management cluster, which is responsible for creating and operating a fleet of workload clusters. Workload clusters are where we deploy our Kubernetes workloads. The management cluster can be any Kubernetes cluster, as long as it is conformant. It could be your local kind cluster, a cluster on any of the managed services like AKS or EKS, or another cluster that Cluster API itself created. The important thing to note here is that the same management cluster can create and operate clusters across clouds and across various providers at the same time. You can have an AWS cluster, a GCP cluster, and an Azure cluster all managed by a single point, the management cluster. Users only have to interact with the management cluster — through YAML, of course, because Kubernetes — and you get a fleet of clusters managed from a single point. So let's take a closer look at what is inside the management cluster, what makes it up. You've heard the word providers a few times now.
In Cluster API, we have four kinds of providers: the core provider, the infrastructure provider, the control plane provider, and the bootstrap provider. These providers extend the Kubernetes API using custom resource definitions, commonly referred to as CRDs, and they come with their own CRD definitions and controllers. The reconcilers, as you probably already know, are responsible for converging what's actually present with the desired state.

As part of the core provider, we ship definitions like the Cluster definition, the MachineDeployment definition, the MachineSet, and Machines. The infrastructure provider is the connector in between that is responsible for interacting with the cloud. For example, the AWS provider knows how to talk to the AWS APIs so we can do Cluster API things. Then we have the control plane provider, which is responsible for initializing our control plane, bringing it up, bootstrapping it, and all of those things. And then we have the bootstrap provider, which is responsible for bootstrapping the worker nodes into our Kubernetes cluster.

So let's take a closer look at how all of these CRDs work in relation to each other, so that we get a better idea of how Cluster API operates on these new types. This is a rough tree of the dependencies between the objects — the CRDs — that we just saw. At the top left, you can see the Cluster definition. That's kind of the root of the whole idea. The Cluster definition generally holds specifications that are environment agnostic, and it is responsible for the cluster's lifecycle. Environment-agnostic configurations — like the pod and service CIDRs and the DNS domain — go into the Cluster definition. Some of you might have noticed the topology section, with a comment saying it's an alpha feature. We'll go more into that in a bit — it's really interesting, so keep it in mind.

Next is the infrastructure cluster. Examples of infrastructure clusters are the AWSCluster, the AzureCluster, the GCPCluster; you can have bare metal cluster instances and so on. These contain the specification for the specific cloud provider. For example, the AWSCluster will have configuration for which region you want your cluster to spin up in, what SSH key it should use, and so on.

Then comes the control plane. The most common control plane — at least the one we ship with CAPI — is the kubeadm-based control plane. In its spec, you will see that the kubeadm configuration is part of it: the init configuration, the cluster configuration, the join configuration, and so on. In the control plane spec, you will also see a reference to a machine spec, which defines how the underlying machines for the control plane are supposed to look. It also has a replicas field, so we can scale the control plane up and down, and the Kubernetes version that we care about.
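To make that tree concrete, here is a minimal sketch of a Cluster that wires the providers together. The object names are illustrative and the API versions may differ between releases; this assumes the AWS infrastructure provider and the kubeadm control plane as examples.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster              # illustrative name
  namespace: default
spec:
  clusterNetwork:               # environment-agnostic settings live here
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.128.0.0/12"]
  controlPlaneRef:              # delegates to the control plane provider
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: my-cluster-control-plane
  infrastructureRef:            # delegates cloud-specific details to the infrastructure provider
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: my-cluster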
Next is the MachineDeployment. Now that we have our control plane defined, we need something to define the machines that make up our worker nodes. A MachineDeployment is to Machines what a Kubernetes Deployment is to Pods: a Deployment represents a collection of Pods that are all similar, and a MachineDeployment represents a collection of machines that are similar, that you can scale up and down, and for which you can specify the Kubernetes version — which becomes the Kubernetes version of the eventual target workload cluster.

With that, let's check out what's new in the project. As Vince mentioned, we just hit the v1.0 milestone release, and with it we shipped our v1beta1 APIs, which is huge — special thanks to all the contributors who helped us get to this big milestone. Next, we have a huge UX improvement called ClusterClass and managed topologies. We'll talk about that in a bit. It is designed to improve how end users interact with Cluster API. As you just saw, today you have to deal with a tree of objects to interact with Cluster API and manage your target cluster. With ClusterClass and managed topologies, that improves: you don't have to deal with as big a surface as before. We'll take a closer look at that. We also introduced Ignition support for bootstrapping, and with it Flatcar OS. clusterctl, our CLI for interacting with and bootstrapping Cluster API, is now available on Windows, and we have ARM support for it too. And our provider community is growing a lot. These are just some of the new providers that joined recently; I think we have around 30-plus providers for Cluster API now. That's a huge community.

ClusterClass and managed topologies. Eventually, our goal is still to create the tree that you see on the right. But ClusterClass starts from the idea that we define the structure of a cluster — its topology — once, and reuse it across multiple clusters, so we just have one Cluster object with a topology section, which can then be used to stamp out similar-looking clusters multiple times. That looks something like this: we have Cluster objects that all reference the same ClusterClass, and by changing the values in the managed topology, you can spin up multiple clusters which look mostly the same but differ in their own ways.

Let's take a look at the ClusterClass and managed topologies demo to understand it better. So I have a kind cluster running locally; I'm going to use it as my management cluster. It has already been initialized so that it becomes the management cluster, and as you can see, the different kinds of providers are already installed on it: we have the core provider, a control plane provider, a bootstrap provider, and two infrastructure providers. I also have two ClusterClasses installed: one is called the Docker demo class and one is called the AWS demo class — I know, very interesting names. Let's take a look at the Docker demo class. It has a section for how the control plane is supposed to look, which references control plane templates like the KubeadmControlPlaneTemplate. It has a section for how the underlying infrastructure cluster is supposed to look. And it has sections for how the worker deployments should look. All the templates referenced in the ClusterClass are defined within the same YAML file. So a ClusterClass is just a collection of templates — it doesn't do anything by itself.
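As a rough sketch of what such a class can look like — all names here are hypothetical, and the exact schema depends on your Cluster API release — a ClusterClass is essentially a bundle of references to templates:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-demo-class
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: demo-control-plane-template
    machineInfrastructure:                 # what the control plane machines run on
      ref:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerMachineTemplate
        name: demo-control-plane-machines
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DockerClusterTemplate
      name: demo-cluster-template
  workers:
    machineDeployments:
      - class: default-worker              # referenced from Cluster.spec.topology
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: demo-worker-bootstrap
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
              kind: DockerMachineTemplate
              name: demo-worker-machines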
And now let's take a look at a Cluster object that uses this ClusterClass. I have a Docker cluster object here. You can see it has the topology section that we saw before, in which it specifies: I want to create a cluster based on the Docker demo ClusterClass. The ClusterClass already has the information on how the underlying infrastructure is supposed to look — which templates to use while creating it — so in the topology we just add the configuration on top that most users care about: how many replicas of the control plane do I want, what labels do I want on my control plane machines, and similarly, how many replicas of my worker machines do I want and what labels should they carry. And most importantly, there is just one place to define the Kubernetes version. Define it there, and Cluster API guarantees the version is propagated across the cluster — you can relax and not worry about upgrades. In fact, we'll actually do an upgrade and see how that works (see the sketch after this walkthrough).

For the purposes of this demo, I'll be creating target clusters based on Docker, since they're quicker. So let's create one. I just created a Docker cluster. If you look at it, we have the Cluster object created, the DockerCluster created, the control plane object created, and the machine deployments created. I have three watches here: the first one is watching the control plane, so you can see it's actually rolling out; the second one is a watch on the machine deployments — I also included a column for the Kubernetes version so we can use that when we do the upgrade; and the third one is just a watch on the machines. And, perfect timing, the first machine is coming up, which will be part of our control plane. As soon as the control plane machines come up, the control plane is considered ready. Once that is done, the machine deployment's machines come up and join the control plane, and then the cluster should be ready. The machine deployment's machines are coming up now. I'm cheating a little bit here: there is something called a ClusterResourceSet in Cluster API — you can look it up later — and I'm using it to install the CNI so I don't have to do it by hand right now.

While this is getting created, I can show you that we also have a second cluster definition, called cluster 2 — fancy name — which uses the same class we saw before, but changes the topology of the target cluster a little. It has a control plane at the same version, but it adds a second machine deployment section with additional labels that you can use on your target clusters. That can be important — for example, a GPU marker to identify the machines that have GPU support, and so on. If we go back — yes, our KCP is ready, our machine deployments are ready, so everything is ready. Now let's actually upgrade the cluster. We are on version 1.23.3; let's upgrade it to 1.23.4.
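A minimal sketch of what such a Cluster with a managed topology can look like (names and versions are illustrative, matching the demo's class above). Bumping spec.topology.version here is the single edit that drives the upgrade shown next:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: docker-cluster-1
spec:
  topology:
    class: docker-demo-class    # the ClusterClass sketched earlier
    version: v1.23.3            # the one place to set the Kubernetes version
    controlPlane:
      replicas: 3
      metadata:
        labels:
          tier: control-plane   # labels applied to control plane machines
    workers:
      machineDeployments:
        - class: default-worker # must match a workers class in the ClusterClass
          name: md-0
          replicas: 2
          metadata:
            labels:
              gpu: "false"      # e.g. a GPU marker, as mentioned in the demo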
As we mentioned before, the goal of the SIG is to make cluster lifecycle management as boring as possible — and what could be more boring than editing a YAML file? This is the cluster definition for the Docker cluster we just created. You just change the version here, and that is it: we just triggered an upgrade for our cluster. You can see that the control plane has picked up the new version, 1.23.4, and started reconciling so that the control plane machines switch over to it. The first control plane machine at version 1.23.4 is being created, and once it is provisioned, a machine that is still at 1.23.3 will go down. Then the process continues with the machine deployments. This upgrade is phased: only after the control plane is successfully upgraded does the upgrade cascade down to the machine deployments, and if you have multiple machine deployments, it does them one at a time. And that's the demo for now.

What's next? Wasn't that awesome, though? If you compare the 2019 deep dive with this one, they're completely different. The user experience is what we're focusing on today, and we will keep doing that in the future. One of our goals is to make Cluster API as accessible as possible, so you can use it in a number of places. Right now, we edited a bunch of YAML — but what if I want to do that in a UI? What if I want to do it easily through a CI deployment? We need, for example, better GitOps integration, and that's what we'll be working on over the next year or so. The next level we want to integrate with is operating systems and bootstrapping. We've shipped batteries included — Cluster API comes with kubeadm — but we want to go further and make sure kubeadm is fully exposed at the machine level. In the fullness of time, we'll have the whole configuration in there so you can change it, and potentially also roll those changes out on the fly — I'll get to that in a second. On the OS level, we want to understand how to better integrate Cluster API with the operating system. Something you cannot do today, unless you build your own images, is upgrade the operating system.

This is a hot topic: we have heard so much feedback about bare metal and edge. Folks want to run Kubernetes on both, and they share a lot of commonalities — at the end of the day, you can see edge as a shrunk-down, more constrained version of bare metal. We're not there yet. We have some bare metal providers; we need to do better. And edge is a tough story. So we want to do better in these areas, and I urge you to look at the proposals that are coming up. Some are already up in the Cluster API ecosystem: the runtime SDK and add-on management. Add-on management is needed soon, now that the CPI and CSI integrations are moving out of the Kubernetes core: Cluster API is going to do the bulk of the work to make sure that, one, there is a smooth migration, and two, those components keep getting upgraded when we upgrade Kubernetes itself. Another area is extended integration with managed Kubernetes — EKS, AKS, GKE, and so on. The autoscaling folks from Red Hat have been working on the scale-from-and-to-zero proposal, so that you can autoscale your node pools automatically. And then certificate rotation.
Certificate rotation is something that has been on the roadmap for a while, but it's 100% needed to make sure that your worker nodes — your kubelets — and the API server keep working together. So there's a huge roadmap in front of us. I mentioned the user experience; something we want to improve, for example, is the GitOps story and the bootstrapping story. You saw that Yuvaraj already had a cluster running — he cheated a little, and he admitted it too. If you look at our quick start, it's still hard to spin up Kubernetes clusters today. We want to go further than that: with one command, create a Kubernetes cluster — go from zero to Kubernetes. And then the upgrade story. That was simple, but we need to do better there too. For example, when an upgrade doesn't go as smoothly as what you saw, you might need to back up and restore your cluster. What if there is a disaster recovery scenario? We're thinking, for example, about whether we should snapshot workload clusters, and how we would go about restoring those clusters after a disaster has happened.

As I said before, the best feature we have is the community, and I urge you to get involved. Whatever skills you have, it's OK — we'll have something for you to do. If you have writing skills: our book, the quick start, architectural diagrams — we need that for sure. Product skills: we need to get better at understanding the user experience across many components. Coding skills: we have a lot of pull requests open right now that need review, and a lot of issues marked as either good first issue or help wanted. There are links here, so you can check them out, and reach out to us in any of the Kubernetes Slack channels — we're very active and we respond to questions in a timely manner. Last thing: we have weekly community meetings. Also take a look at the mailing list; if you join it, you'll receive the invite for the Cluster API meeting. I think we have a few minutes for questions. This is Cluster API. Thank you. Do we have any questions?

Can I have one cluster using two different providers, like VMs and bare metal?

Can you repeat the question? Yeah — one cluster using two different providers. No. I'd say it's untested: if you try it and it works, that's great. Technically, if you have network connectivity between the machines, it could be possible, but it's definitely challenging.

OK, I understand. Thank you. And one more thing: if, for example, I have Metal3 as a bare metal provider, can I use KubeVirt to create another layer of provider — like VMs, like stacked providers?

I would say yes, you could. KubeVirt is still at an early stage as a provider — I think some folks from Apple are actually pushing it — but they have a Slack channel as well, so I would reach out to them. Thank you.

Hi. I have a question about multiple accounts. You support multiple providers, for example GCP — but is it a problem to have clusters managed from one management cluster in different AWS accounts? So, for example, ten AWS accounts and one workload cluster in every AWS account?

Multiple AWS accounts, is that the question? It's 100% possible. You can push multiple credentials to the CAPA controller, the AWS provider, or you can also make it assume roles. As long as the controllers have the ability to do so, you can 100% do that.
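As a sketch of the assume-role approach just described: CAPA exposes identity resources that an AWSCluster can reference, so one management cluster can create clusters in different accounts. The names and ARN below are made up; check the CAPA documentation for the exact schema in your release.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSClusterRoleIdentity
metadata:
  name: account-a               # one identity per target AWS account
spec:
  roleARN: arn:aws:iam::111111111111:role/capa-manager  # hypothetical role
  sourceIdentityRef:
    kind: AWSClusterControllerIdentity
    name: default               # the controller's own credentials assume the role
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: cluster-in-account-a
spec:
  region: us-east-1
  identityRef:                  # this workload cluster lands in account A
    kind: AWSClusterRoleIdentity
    name: account-a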
Hi, I have a question. We are heavy users of spot instances at our company. What's the current support for configuring worker nodes to run on spot instances?

Is anybody from AWS here? Yes — we do have support for it. I remember seeing PRs a while back; in the AWS machine template, you should be able to set the spot price, if I remember correctly. Yes. Go to Richard for more AWS questions. (There's a sketch of this a little further down.)

Hello. I have two questions. One: how do you handle node upgrades — the version? Do you have some way to drain the nodes before rotating them? And the second question: how do you handle drift? If I make a change manually in an EKS configuration, how do you reconcile that?

For node drains, yes, we have configuration you can set, like node drain timeouts and so on. And sorry, what was the second question? How do you handle drift — if you make a change manually, for example in an EKS cluster, how do you import that change into Cluster API? It depends. For EKS specifically, if you make a change, Cluster API thinks it's the ultimate authority, so it will override your change.

A few more questions over here — one in the back there. Mics are in the back.

I was wondering if you were thinking of having a provider for Terraform. The reason I'm asking is that I feel it would be easier for many people to migrate towards Cluster API, since using Terraform to provision infrastructure is a common starting point today. Secondly, as you mentioned, there are still some providers you don't support yet — infrastructure that may already have an integration in Terraform. So it could automatically grow the set of providers, and also ease migration.

Do you want to take that? Beautiful. So for Cluster API, as we mentioned, most of the components — everything except the core provider — are swappable. So it should be possible for you to write a provider for any scenario you want. We just had a talk before this one on how you can write your own provider that works with Cluster API. And the community of providers we have right now is growing, so if we don't have a provider for some infrastructure today, it might be added by someone later — or you're more than welcome to contribute and create a new provider that everyone would benefit from. One thing I'd add for Terraform specifically, or other infrastructure-as-code tools: it has come up before — "hey, I want to manage my infrastructure with this other tool, but then I want Cluster API to integrate with it." There are some talks about it right now. If you do have use cases, please reach out, because those are really valuable — we need more of them.

I think there were a few hands over here.
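For the spot instance question above, a hedged sketch of what this can look like with CAPA, plus the node drain timeout mentioned earlier — all names are illustrative, and you should verify field support in your provider version:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: spot-workers
spec:
  template:
    spec:
      instanceType: m5.large
      spotMarketOptions:
        maxPrice: "0.10"        # omit to cap at the on-demand price
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: md-0
spec:
  clusterName: my-cluster
  # selector/labels omitted for brevity (filled in by Cluster API's defaulting webhook)
  template:
    spec:
      clusterName: my-cluster
      version: v1.23.3
      nodeDrainTimeout: 5m      # cap how long draining may block machine deletion
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: md-0-bootstrap
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: spot-workers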
I came in a little bit late, so I don't know if you covered this, but I see you already have rolling upgrade support, and you can define the max surge to have additional nodes available that way. Are there any plans for in-place upgrades?

Yes. It was in the plan to actually talk about it a little, but it's very early. We are exploring some ideas for in-place upgrades. The problem is that right now we just delete the machine and recreate it — but for things like edge, you can't delete an edge node, because that's the only node you have. So we are exploring a way to roll those upgrades out in place using some sort of node agent. But how that would work is still being explored. OK, thanks. I think that answers the question.

Hey, thank you for the talk. I have just two questions. Is downgrade possible?

No. No downgrades. Right now it's blocked by our webhook, so even if you try to downgrade, we reject it with a message saying downgrades are prohibited.

And how are rollbacks handled? If you have done an upgrade and there is workload on the cluster, but for some reason the upgrade failed, how does Cluster API handle the rollback?

I can give you the background. For a number of reasons, we don't do rollbacks. One of them: let's say you upgrade the first control plane node and it fails, but some of the upgrade went through. For example, if there is a new etcd version, etcd gets updated; if you then try to roll back, the older API server might expect something that etcd has since changed. For these reasons, we prefer to understand why the upgrade was blocked. Usually it's something like network connectivity was lost, or you just have to retry because a machine has gone bad, or there is a misconfiguration. In our experience, we try to make upgrades as safe as possible: we don't proceed even with the first machine unless the whole set of health checks is OK. We check etcd — are all the members healthy — and we check the API server and controller manager, all of those control plane components, before taking any action. I know it's not a great answer for rollbacks, but rollbacks get into too many use cases that we're just not ready to support.

Are there any other questions? A couple more hands?

Is there a way to specify versions for the cluster classes? Like, if I want to test some upgrade, is it possible to do that?

To rephrase: the question is whether you can version your cluster classes, or specify the version you want to use when upgrading. — I mean, if you want to roll out or just test a change in your cluster class. Because if you change the cluster class, that will actually change all the clusters using it. So: is it possible to version it?

I think it was designed on purpose this way: changing one cluster class has, as I said, ripple effects on a lot of clusters, so whoever does that has to really make sure it doesn't ripple out and create problems across those clusters. ClusterClass right now doesn't let you define the Kubernetes version within it — the version needs to be specified in the Cluster object. But if you are talking about other kinds of configuration — say, the underlying machine type or something else that needs to change across clusters — those can be defined in the ClusterClass; again, you have to be careful when rolling out those kinds of changes. And to help with that, there is something called inline patches in ClusterClass, so you can still make small changes per cluster while they all share the same class — one cluster could use, say, an AWS small machine and another an AWS large machine, even though both are based on the same ClusterClass (see the sketch below). So we don't have versions for the ClusterClass itself, but you could copy the cluster class and roll the changed copy out to a small set of clusters — you could build on top of that to achieve versioning.
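A sketch of the inline patch mechanism just described, assuming the AWS provider: the ClusterClass declares a variable and a JSON patch over a template, and each Cluster sets its own value in spec.topology.variables. Names are illustrative; the schema depends on your Cluster API release.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: aws-demo-class
spec:
  # controlPlane / infrastructure / workers sections as sketched earlier...
  variables:
    - name: workerInstanceType
      required: false
      schema:
        openAPIV3Schema:
          type: string
          default: m5.large     # used when a Cluster doesn't override it
  patches:
    - name: workerInstanceType
      definitions:
        - selector:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
            kind: AWSMachineTemplate
            matchResources:
              machineDeploymentClass:
                names: ["default-worker"]
          jsonPatches:
            - op: replace
              path: /spec/template/spec/instanceType
              valueFrom:
                variable: workerInstanceType
# In each Cluster, override per cluster:
#   spec.topology.variables:
#     - name: workerInstanceType
#       value: m5.2xlarge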
Hello. Thank you, first. I would like to know about upgrades — do they go one minor version at a time? Yeah. Yes.

Any other questions? Last question, go ahead.

Hi. Thank you for the presentation. I have a question. We currently have our clusters deployed with Terraform and also some bash scripts, and we've reached a point where we just write a Terraform configuration with a URL to a Git repository for Flux, and we just run Terraform. It deploys Flux for us, and then Flux deploys everything else. Is something like that possible with Cluster API?

So the question was: you have Terraform and Flux, and Terraform deploys Flux, which then deploys the cluster. Exactly. The short answer is I wouldn't suggest it today, because we need to improve the GitOps experience — we have a lot of machinery inside our CLI today that we need to move server side. But if you stick to one version, it's technically possible to achieve what you just said. OK, thank you.

We're good? All right. Perfect. Thank you so much, folks.