Hi, everybody. Welcome to the Cluster API tutorial. You're all very welcome. I'm Kylian. I'm Yuraj. Hi, I'm Stefan. I'm Jack. And Shivani was supposed to be presenting this, but she couldn't be here due to travel restrictions; we're going to have a video from her in a few minutes.

So, a couple of announcements. The prerequisites for the tutorial are at the link. Can I just get a sense of the room: can you put up your hand if you have done the prerequisites? Okay, good showing. We're going to take a few minutes now to let people run through them. A couple of important things. If you're running Fedora on your laptop, go to our troubleshooting guide; there's a specific setting you've got to set, because the tutorial can be unstable on Fedora due to system limits. So please check that out. Also for Windows: if you're using Docker for Windows to run this tutorial, you want to be on Docker for Windows 4.10.1. Newer versions are very unstable, for reasons we haven't figured out. Also, if you have a Docker Hub account, running docker login will help prevent you from getting rate limited. Yeah, just to repeat what Jack said there: if you're pulling the images, which is part of the prerequisites, please do log into Docker, because otherwise this many people pulling from the same IP is probably going to get rate limited.

So the prerequisites are up there on screen. They're also in the repo; this is the link to the repo, in case any of you need to type it in. It's also in the schedule entry description. So it's there: the CPU, RAM and 32 gigabytes of disk space requirements, and you each need to install Docker, kubectl, kind, clusterctl and helm. All of the instructions for installing those things are in the guide, and there'll be people circling the room for the next few minutes to see if you have any trouble with that. If you do, just put up your hand and somebody will come to you. Do you want to say something about five minutes? So we'll do this for the next five to ten minutes, time-boxed, and then we'll get into the full tutorial. Check, check, testing the mic. Oh, yeah, I'm sorry, I'm just going to scroll back to the top. For anybody just joining us, we're going through the prerequisites for the next five minutes or so, if you haven't already managed to run through them. Can everybody see the font, or is it too small? Okay. There'll be people circulating through the room throughout this tutorial; put your hand up if you want help with any of the steps. And if you still haven't passed the prerequisites, please ask for help later on if you need it.

So now we're going to hear from Shivani. Shivani is going to give us a quick overview of Cluster API: the project, and what the project is doing. Let's hear from Shivani. No worries. Is that a little louder? Yes. [Shivani, on video:] Managing Kubernetes clusters without any tooling is very difficult. There are several tools that each solve a part of the problem, and that is intentional: the SIG that focuses on building these tools wants to solve specific pieces of the problem, and doesn't try to build one solution that fits all problems. That's because we don't believe such a solution would actually have traction in the community, as everybody's needs are so different. Cluster management is really difficult.
And one of the pieces that's difficult is cluster creation. We have kubeadm for that piece. But another part of the problem is managing infrastructure. Even if you have bootstrapping tools like kubeadm, they assume you already have physical or virtual machines to run the kubeadm commands on top of. But to get to that environment, you need infrastructure-specific knowledge. Also, until Cluster API, we didn't have any common interface that ties the infrastructure together with tools like kubeadm to provide a holistic way to set up and manage Kubernetes clusters.

So far we have discussed a bunch of problems; now we'll see how to solve them using Cluster API. First and foremost, Cluster API is a declarative API. Prior to this, the Kubernetes ecosystem didn't really have a way to represent clusters and machines inside a Kubernetes cluster itself. It not only helps in creating and managing Kubernetes objects, but also provides declarative APIs to orchestrate the underlying infrastructure components. We really think it's important to have a pluggable architecture. While we want to provide common logic to solve these use cases for people, it's not going to be one size that fits all problems. One place where it's really important to have a pluggable architecture is the level where we interact with an infrastructure provider for infrastructure provisioning or management: we should have a provider abstraction where we can plug in support for any new cloud provider or bare metal provider relatively easily. Finally, we really want to enable common tooling to manage clusters across many different environments.

So now we understand how Cluster API helps, but let's look at the official definition. We also call it CAPI, and it's a project of SIG Cluster Lifecycle. It uses Kubernetes to manage Kubernetes, which is also what the turtle in the CAPI logo refers to: turtles all the way down. By the official definition, Cluster API is a Kubernetes project that brings declarative, Kubernetes-style APIs not only to managing Kubernetes objects like pods and deployments, but also to cluster creation, configuration and management. So we're done with the basic Cluster API introduction; let's take a look at the next item on our agenda.

To establish a common language that you and I can use throughout the remainder of this tutorial, I would like to provide some definitions for frequently used terms. The first is management cluster: a Kubernetes cluster on which, and to which, the Cluster API components have been installed. This enables the management cluster to manage the lifecycle of other Kubernetes clusters, which are known as workload clusters, and those are where we deploy our workloads and applications. We also have the term self-hosted management cluster: a type of management cluster that manages itself. To accomplish this lifecycle management, Cluster API also leverages the concept of a provider. Providers have names like Cluster API Provider for AWS, Cluster API Provider for Azure, and similarly for vSphere and many more. These providers are also known by their acronyms, like CAPA, CAPZ, CAPV and so on. They provide the support for, and integration with, a particular infrastructure platform. And for the last part of this glossary section, let's discuss what comes under cluster lifecycle management.
It includes creation and deletion of your Kubernetes clusters, including the underlying infrastructure; managing that underlying infrastructure; scaling the number of nodes in the cluster up and down; and upgrading clusters to another Kubernetes version. So now let's see how Cluster API works. But before jumping directly into the Cluster API functionality, I want to give a brief overview of some Kubernetes concepts that are heavily used in Cluster API.

At the core of Kubernetes is a control loop, also called a reconciliation loop, that is responsible for reconciling the desired state and the actual state. Reconciling desired state and actual state simply means changing the actual state to look like the desired one. The desired state is the intended state of the system, specified by the user, and the actual state refers to the state the system is actually in. Controllers are the components that implement those control loops, modifying the actual state based on the desired one. The Kubernetes way to specify desired state is through objects, defined here by custom resource definitions, or CRDs. These objects have a spec representing the desired state and a status representing the actual state. Along with the CRDs, we implement their controllers, and Cluster API uses these CRDs and controllers to extend Kubernetes to manage the lifecycle of clusters. Basically, users specify the configuration of clusters, and based on this configuration the controllers implemented by Cluster API create and manage the clusters. This way, the building blocks provided by Kubernetes, in the form of CRDs and controllers, are used to create and manage your new Kubernetes clusters.
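To make that spec/status split concrete, here is a minimal sketch of what such an object looks like; the fields are illustrative, not the exact CAPI schema:

```sh
# Minimal sketch of the declarative pattern (illustrative fields,
# not the exact Cluster API schema):
cat <<'EOF'
kind: Machine                 # a CAPI-style custom resource
spec:                         # desired state, written by the user
  version: v1.24.6
status:                       # actual state, written by the controller
  phase: Provisioning         # the control loop drives this to match the spec
EOF
```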
Let's see this again, for more understanding, with this diagram. Here we have a management cluster on which we are running our Cluster API components, and we have one client. The user submits the declarative cluster configuration, in the form of those custom resources, to the management cluster. The Cluster API controllers in the management cluster then read that configuration and create a new Kubernetes cluster based on it. This way, CAPI makes the actual state of the system the same as the desired state.

Now let's see which Cluster API components run inside the management cluster. The controllers are divided into four types of provider, based on their responsibility, and each provider has a manager which runs its respective controllers. So in the Cluster API components we have four types of manager: the core controller manager, the bootstrap controller manager, the infrastructure controller manager, and the control plane controller manager. This diagram shows in brief what these managers contain: the core provider manager, infrastructure provider manager, control plane provider manager and bootstrap provider manager each own certain CRDs and their controllers. The core one has four CRDs: Cluster, MachineDeployment, MachineSet and Machine. The infrastructure provider contains infrastructure-specific controllers that are responsible for connecting to the infrastructure, be it a cloud or bare metal servers. In the middle you can see the control plane provider, which is responsible for initializing your control plane. And the bootstrap provider, finally, is responsible for bootstrapping the worker nodes into your Kubernetes cluster. That's a bit of a high-level view, but now let's understand how these CRDs work with each other.

In the next few slides, I'll explain each of the CRDs mentioned on the previous slide and their interactions. First, we'll take a look at the Cluster CRD. It's the root of the whole thing and is responsible for maintaining the cluster lifecycle. Configuration like the pod and service CIDRs and the DNS domain goes into the Cluster specification. Next, we have the infrastructure cluster. The infrastructure can be AWS, Azure, vSphere or even bring-your-own-host, and the specifications under the infra cluster are based on the underlying infrastructure, with the details required for that particular environment. Next, we have configuration for initializing your control plane. By default, CAPI supports kubeadm, and its related specifications, like the init and join configuration, are given in the control plane CRD. Once the control plane is initialized, we need machine deployments, which define the machines for the worker nodes. So that's how all these CRDs interact with each other: their flow is streamlined, and that is how they depend on one another.

Next, let's discuss ClusterClass and managed topologies. It's a huge UX improvement in how end users interact with Cluster API, and it basically reduces the surface area of interaction. The sole idea behind ClusterClass is that we want to define the structure, or topology, of a cluster once and reuse it across multiple clusters, so that we just have one object, the Cluster object, with a topology section, which can then be used to stamp out clusters that look alike. That would look something like what's on the screen. On the left-hand side we define a ClusterClass; it's a collection of templates. Then we provide different managed topologies, like cluster A's managed topology and cluster B's managed topology, and each references the ClusterClass to create two different Cluster objects that look alike but can still differ in their own ways. So that's how we can leverage the concept of ClusterClass to create different clusters that look alike.
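As a rough sketch of that pattern — abridged, with illustrative names rather than the exact manifests from the tutorial repo — the two objects relate like this:

```sh
# Abridged sketch of the ClusterClass pattern (illustrative; not applyable as-is):
cat <<'EOF'
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: quick-start           # defined once: a collection of templates
# spec: control plane / infrastructure / worker templates elided here
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: cluster-a             # cluster-b would be a second object just like this
spec:
  topology:
    class: quick-start        # both clusters stamp from the same class
    version: v1.24.6
    controlPlane:
      replicas: 1
EOF
```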
So that's basically all for the Cluster API fundamentals, and now my fellow speakers will start with the hands-on part. Thank you, everyone.

Okay, I'd just like to really thank Shivani for that video; it's really nice to have her here in some way. So, at this point, can we get hands for who's done the prereqs? Have a lot of people managed to do them during the talk? Anybody having serious problems they'd like help with at this point? Again, hand up anytime and somebody will come to you. So, for those of you that have Docker, kind and everything set up, it's time to get your first cluster running. Let's see what that looks like. At the bottom of the prereqs, if you've got the tutorial open, there's the link to the next section, which is creating your first cluster with Cluster API. Let me make sure I've got my command line here. Okay. The prerequisites mentioned this a bit, but this is going to be a cluster you set up using Docker infrastructure: you're going to set up Kubernetes clusters where each node runs in a Docker container on your local machine. We have a Cluster API provider called CAPD, the Cluster API Provider for Docker, that manages this for us.

You can also use Cluster API to set up clusters across any number of clouds and bare metal environments; there are loads of different providers, and we have a quick start guide for a lot of them. The actual flow of the quick start might be a bit different, which I'll explain in a few minutes. But yes, you can set up clusters on AWS, Azure, GCP, DigitalOcean, and we've got a list of providers in the Cluster API book online. For this part of the tutorial, like the prerequisites, the guide is split into Linux, Mac and Windows, so select the version that's closest for you. The Windows version is PowerShell with Docker for Desktop; if you're running something more like a Linux environment on Windows, a VM or WSL or something, you might want to follow the Linux guide instead. So click on whichever of those is closest to your system. I'll use macOS.

The first thing we're going to do is set up a single Kubernetes cluster. We should have already pre-pulled the images during the prerequisites; those are all the Docker images we use during the tutorial. So I'm going to make sure I'm in the right directory, and I'm just going to run the script to create a kind cluster. This is a normal enough kind cluster, with just a couple of slight changes to help it work with our infrastructure provider for Docker. This cluster is going to form the basis of our management cluster. Once it's up and running, the next step will be to install the management components on it, and those components are what will manage new clusters that we create using Cluster API: manage their lifecycle, manage their infrastructure. All of the providers that Shivani showed us — the core controller manager, the infrastructure controller manager, the control plane controller manager and the bootstrap controller manager — run on the management cluster, which then creates other clusters, which we call workload clusters.

Okay. Once we've got our kind cluster up, we can just check our nodes. We get one row; it's a single-node kind cluster. Now we're going to install the management components — the controllers I mentioned earlier. Cluster API is pluggable: we've got the idea of a control plane provider, for example, and there are multiple implementations of it. The specific providers we're using today are the Docker provider for infrastructure (CAPZ, the Azure infrastructure provider, would be an alternative to that); the core Cluster API provider, which every setup uses and which manages the fundamental CRDs; a bootstrap provider using kubeadm, which lives in the core repo; and similarly a kubeadm control plane provider. Everything except the core manager is pluggable. To run this, we just need to run this command — I'll copy and paste, which I should use more. This sets up a couple of environment variables. One of them is the repository we use locally; I'll let that run while I explain. The next two are feature flags that we'll be using over the course of the tutorial. These are recent features, added to Cluster API over the last year to eighteen months. The first one is cluster topology, which is ClusterClass: this lets us create many clusters from a single template that we stamp them from, and that template is called a ClusterClass.
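For reference, a minimal sketch of that setup step, assuming the upstream flag names (the tutorial's script sets these for you, including the local repository variable):

```sh
# Minimal sketch, assuming the upstream clusterctl flag names:
export CLUSTER_TOPOLOGY=true   # enables ClusterClass / managed topologies
export EXP_RUNTIME_SDK=true    # enables the Runtime SDK lifecycle hooks used later
clusterctl init --infrastructure docker
```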
This is a difference you may see from the quick start as it stands at the moment in the Cluster API book: many of the other providers aren't using ClusterClass as their primary method right now, but the quick start for the Docker provider should be very similar to this guide. The Runtime SDK is another feature that we're using; we'll see that a little bit later. So this will install all of our controllers, and it installs cert-manager as well, which we need to manage communication between components. clusterctl init does all of that for us. Now that it's installed, we can just see what pods are running on the cluster. We can see CAPD, our infrastructure provider — that's the Docker controller manager; the kubeadm bootstrap controller, which handles the bootstrapping of the nodes; the kubeadm control plane controller, which handles our control plane; and the CAPI controller manager, which is our core manager; and then cert-manager and the normal Kubernetes control plane components are there as well.

So now we've got a functioning management cluster, and the next step is actually to create a cluster. The first thing we're going to do is create the ClusterClass, which I mentioned earlier — that's this command; all of these are in the repo. We just apply it like any other Kubernetes resource, and that will create the ClusterClass in our API server. So that was successful. We can take a look at what the cluster looks like just before we create it. Because the detailed spec of the cluster is defined in our ClusterClass, which hides a lot of the complexity, this Cluster is a very simple object. It has a name, a namespace, a little bit of networking information, and then, under the topology, which is how we stamp the shape of it, the ClusterClass that we've just created, just called quick-start. We've got a Kubernetes version, we've got the number of replicas we want in the control plane, and then under workers we say we want a single machine deployment — a MachineDeployment is analogous to a Deployment in Kubernetes — and we want one machine in that machine deployment.

So I'm just going to create that cluster, and we're going to use clusterctl. clusterctl is the CLI that comes packaged with Cluster API; it's part of the core repo, and you should have been able to download it during the prereqs. I'm going to use watch here; if you have it installed, I encourage you to use it for a minute too. With clusterctl describe we can see all the different parts of the cluster come up. So after creating that simple YAML, ten or twenty lines, Cluster API goes away and links it all together. It creates machines in the infrastructure — in this case, Docker containers — and bootstraps those machines into nodes: the first machine into a control plane node, and then a second machine from the machine deployment, which becomes a worker node. And this is all managed centrally from our management cluster.

Next step, let's just have a look at the clusters. Again, it's a Kubernetes resource; we can look at it any way we want. We can see the cluster there; it's in the Provisioned state. And the next thing, in order to install stuff on that cluster: because this is using Docker infrastructure we can treat it as a kind cluster as well, so we're going to use kind to get the kubeconfig, and with that kubeconfig we're going to get the nodes in the cluster to see how they're doing.
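A hedged sketch of that flow end to end — the file and cluster names here are assumptions, not the authoritative names from the tutorial repo:

```sh
# Hedged sketch; file and cluster names assumed, not taken from the repo:
kubectl apply -f clusterclass.yaml               # the quick-start ClusterClass
kubectl apply -f cluster.yaml                    # the ~20-line Cluster object
watch clusterctl describe cluster docker-cluster-one
kind get kubeconfig --name docker-cluster-one > /tmp/workload.kubeconfig
kubectl --kubeconfig /tmp/workload.kubeconfig get nodes
```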
So right now we only have a control plane node, and it's not ready yet; these nodes will become ready in a minute, once we install the CNI. We can do kubectl get machines to see how the machines are doing. We've got two machines but only one node, and that's because the second machine, our machine deployment worker, is still waiting to bootstrap; it will come up in a second. To get our nodes into the Ready state, we're going to install a CNI, in this case Calico, and then check the CNI-ready condition. Using the kubeconfig for our cluster we can get the pods and see the Calico pods coming up, and we can look at the nodes and wait for them to become ready. So the first command here is get nodes on the workload cluster: we can see both of them have become ready, and one is 90 seconds old — that's the worker. In clusterctl describe, everything is true, which means all of our machines have come up and all of our nodes have come up. And just to have a look at our clusters again — okay. So that's your first cluster. You'll have a couple of minutes to finish that up, but if you've been successful in creating your cluster — can we get hands up for people who've actually got one up and running? People happy? Yeah. Cool.

In the next section we're going to use the CAPI visualizer. This is an open source project by Jonathan Tong — Jonathan made this really cool project that we're going to deploy on our management cluster now. It just lets us have a look at what the cluster looks like. I'm going to open a new terminal to run this. We're going to use Helm to install it; you should have all the charts and everything locally, so run that Helm command, and we can see it's come up here. The next step is to run a port forward. You might want to do this in a different terminal window and leave it running in the background, because this visualizer is a really interesting thing to look at throughout the tutorial; it's really good for getting an overview of what your cluster looks like. So run the port forward — like that command — either in the background in your terminal or in another terminal, and just make sure it stays up. It takes a couple of minutes to come up, and it does tend to die if you contact it before it's ready, so let's just try it after a couple of seconds. Yeah, so we can see this is our management cluster. We're going to click into docker-cluster-one, and this is the overall topology of our cluster. Once you've got your cluster up and running on your machine, run this; like I said, if you keep it running in the background you'll be able to check it throughout the tutorial as we go through different operations, like scaling, and see what happens.
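For reference, the install looks roughly like this — the release, chart and service names here are assumptions, since the actual chart ships with the tutorial repo:

```sh
# Rough sketch; release/chart/service names assumed, not from the repo:
helm install capi-visualizer ./capi-visualizer-chart
kubectl port-forward service/capi-visualizer 8081:8081 &   # keep this running
# then open http://localhost:8081 once the pod is ready
```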
So for the next few minutes we're just going to walk around and help everybody get their first cluster up before moving on to the next section. [Helping an attendee:] He has an error: failed to pre-load images into the Docker machine. Could you — it seems he's missing some images that we're expecting. Did you run the pre-pull script? Yeah, and it worked? It seemed to work fine. Let's check here, because I don't remember which one; I mean, all of these came from the pre-pull. Which pod is this log from? This is the CAPD controller. Okay, got it, cool — the CAPD controller. We can look at the CAPD image to see which one it is. Should I try to pre-pull the image? What's the tag again? Let's see. I don't know, maybe the image tags are different; I don't see the exact one.

Okay, how many folks built their first Cluster API cluster in the last 20 minutes? Wow, that's great — I see like ten hands. How many people totally failed to do the same thing? Nope — oh, one. A couple of people. Okay. Yeah, raise your hand and we'll have folks come and help you out. No, it's not — that's separate. Well, I think that means we wait a few more minutes. Two minutes.

So, hopefully my voice is loud enough — some folks heard that anyway. All right, the next section we're going to go through is cluster topology. When we say cluster topology in Cluster API, you'll hear terms like shape and size. Typically it means things like the number of control plane nodes running in your cluster, the number of worker machines running in your cluster, and the number of pools of worker machines running in your cluster. And what we can demonstrate, now that you've built a cluster with Cluster API, is how the real power of things like cluster fleet management comes into focus with these cluster topology gestures. The first thing we're going to do is add a new pool of worker nodes. If you have your clusters all ready and set up, what we're essentially going to do is define a new declarative spec that declares a new worker node pool, apply that spec to the management cluster, and we'll see an entirely new pool of nodes show up in our workload cluster.

So here, for reference, if you folks can see this, this is what the original declarative spec for our worker nodes looked like: we've got this workers section, then a machineDeployments array, and we've got one entry in that array, with a class of default-worker. What we're going to do — I'm going to copy this command and show you the difference between what we originally installed and this additive YAML spec. We can see that there's a new entry called md-1 with a replica count of one. So if I apply that real quickly — let's see if I can find a command to get the current nodes. Here's the current set of nodes running on the system: one control plane node, the second one in the list, and then one worker node — if you look right in the middle, even though the text is really small, you can see that little md-0 prefix. So again, we're going to apply a spec that installs a new pool, identified by md-1. I'm going to run this kubectl apply command, and now I can do this get nodes and watch it, and before too long we should see another node appear. I'll wait a few seconds for you folks to do that same kubectl apply, and while that's happening, hopefully this get nodes watch will give us what we're looking for. There we go — you can see it, identified there in the middle of that long gnarly string with an md-1, and now it's gone to Ready. So let's sort of revisit what we just did: that was an example of a really simple kubectl gesture against the management cluster to add a new worker pool, and on our workload cluster we've got a new node coming online.
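The additive change, roughly — the real YAML is in the tutorial repo, and the file name below is assumed:

```sh
# Rough sketch of the additive change; the real YAML lives in the tutorial repo.
# The cluster's topology gains a second machineDeployments entry:
#   workers:
#     machineDeployments:
#     - {class: default-worker, name: md-0, replicas: 1}
#     - {class: default-worker, name: md-1, replicas: 1}   # the new pool
kubectl apply -f cluster-md1.yaml                             # file name assumed
kubectl --kubeconfig /tmp/workload.kubeconfig get nodes -w    # watch md-1 join
```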
And another fun thing we can do: if we go over here to our visualizer, you saw it refresh itself for a second, and now we can see, even though it's a little tight in there, that we've got another machine deployment. That's a Cluster API CRD — in this visualizer we're looking at the Cluster API abstraction layer, the view of the cluster from Cluster API's point of view. And you can see, if we expand this, we've got md-1, which joined the pre-existing md-0. So this visualizer is a really nice tool; it's similar in spirit to the watch command on the command line, and as we modify the topology, the visualizer shows some in-progress visualizations to indicate what's going on.

All right. Now we're going to do the same thing in reverse. I am going to idempotently apply the original cluster spec, and Cluster API is going to determine that there's a delta between the new spec and this spec right here, where there's no longer an md-1 in the configuration. I'm actually going to use the visualizer this time; we should see that md-1 machine set disappear. The way you can infer this works is that the set of pools in that array is understood to be the authoritative, definitive configuration. So — there we go, it just got rid of md-1 — if we've got, say, five pools in our array and we send a kubectl apply with a declarative spec that has two, then the three that aren't there are going to be deleted. And I should point out that these are deleted gracefully: there's a cordon and drain as part of it. So depending on your workload scenario, if you're doing this in production, it might take a long time for a machine pool to disappear, because the cordon and drain can block on success. But because we're doing this in a demo environment with no workloads running on the system, that cordon and drain was really fast.

Okay, there's a section here that I'm going to skip, because it takes about 10 to 15 minutes, and in the interest of time we don't have to wait for it. But in your own time, definitely go through it — it's a super powerful gesture for Cluster API: scaling out your control plane nodes. As you've probably noticed, you wouldn't run a production cluster like we've been doing in this demo, with only a single control plane node. So to get to three or five, or some appropriately HA-redundant number, you simply update the replica count in the existing spec. I'll show you the diff here without actually running the command: you can see that if we change that replicas value from one to three, Cluster API will receive that spec and then reconcile toward it with eventual consistency. The way it does this for control plane nodes is one at a time, which is why it takes a little while, and it's probably best to skip it for the demo. But a super, super important gesture there.
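In other words, the scale-out is the same kind of one-field topology change — a sketch, with the file name assumed:

```sh
# Sketch of the control plane scale-out: one field changes, and CAPI rolls
# new control plane machines one at a time.
#   spec:
#     topology:
#       controlPlane:
#         replicas: 3        # was 1
kubectl apply -f cluster-cp3.yaml   # file name assumed
kubectl get machines -w             # new control plane machines appear one by one
```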
Okay, the final thing we're going to do here in topology: I'll do a really quick demonstration of maybe a more idiomatic gesture for folks used to editing live Kubernetes resources. This should be familiar if you've ever edited a deployment to increase its replicas, or to change an image reference that points to v1 so it points to v2. We can do something similar for all of these Cluster API resources. As we see here in the Cluster resource, we've got a replicas count for the control plane; if I change it to three, that's going to initiate an eventual reconciliation. And if I go down here, you can see that in this cluster spec I'm back to having a single worker pool called md-0. If I were to set that to 10, it would scale out to 10 nodes from one. So that's a more real-time view of how you might apply that configuration. The final thing I'm going to do here — the doc walks through scaling up from one to three, so I'll actually go through that and make that change. What the documentation is describing is: find the replicas configuration in your md-0 machine deployment. I'm going to change that to three, then :wq, because vi is my configured editor here, and I've got the output from that command saying the docker-cluster-one resource was edited. Now I can go over to the visualizer — I could also do a kubectl get with -w — and as we see right now, under here, if you can follow my mouse on the screen, we've got one machine under this machine set. What we should see in 30 seconds or so is that expand to three. These scale-out events happen concurrently for workers, and there's Cluster API configuration, which you can look through in the Cluster API book, that describes how to set those rolling-upgrade-type behaviors — how many at a time is configurable. All right, now we've gone to orange. You probably can't see it on the screen in much detail — actually, maybe I can zoom in a little. So if we go over here — maybe that's big enough — you can see it's like a Mr. Coffee, churning and bubbling; stuff is happening there. Also, is anybody out there making successful topology changes? Anyone scaled their cluster? Added a pool? Deleted a pool? Cool. Did any of those folks do this for the very first time — never done this before? Exciting.

All right, in the interest of time I'm going to skip ahead so we can get our next speaker up here. The next page in the docs is about machine health checks, so I'll quickly give an overview of what this is. A machine health check is really a way of declaring certain vectors that inform the health of a particular machine; when one of those vectors goes unhealthy, Cluster API will cordon and drain and replace that machine. Here's an example — hopefully that's big enough — of what are, at least for me, the canonical health vectors to check. At the top, under the machine health check, we've got the node startup timeout: you're able to declare a maximum timeout value beyond which you're no longer going to wait for that node, and you're going to re-initiate the provisioning process — clean up that machine and replace it with a new one. That's one way of configuring a machine health check. Then there are two other conditions under the unhealthyConditions array, and one is defined as type Ready, status "False". Essentially, that means a machine is deemed unhealthy if the Ready condition is not met — when you're doing a kubectl get nodes, the thing you'll see in the status column is NotReady. And we've got a timeout defined here.
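Put together, the check being described looks roughly like this — an abridged sketch using upstream field names; the exact manifest ships with the tutorial's ClusterClass:

```sh
# Abridged sketch of the machine health check (clusterName/selector omitted):
cat <<'EOF'
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
spec:
  nodeStartupTimeout: 10m     # illustrative value
  unhealthyConditions:
  - type: Ready
    status: "False"
    timeout: 300s             # NotReady for 5 minutes -> remediate
  - type: Ready
    status: Unknown
    timeout: 300s
EOF
```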
What that tells us is that if Cluster API observes that a node is in a NotReady condition for five minutes, it's going to go ahead and recycle that node — it considers that an unhealthy event. And similarly for this Ready Unknown example: if you've got a node in an unknown state, it's going to recycle that machine. If I go down further, here's an example of a novel machine health check that we're defining, called demo-node-healthy: similar to the examples above, if the status of that condition is False for 60 seconds or more, we're going to consider the node unhealthy. The reason we're defining this novel condition is so that we can easily reproduce it. Let me make sure — we ship these machine health checks in that original cluster class spec in the example, so I should see them running here. Okay, great, we've got machine health check output; there's not that much detail there, but we could get more if we wanted to drill down.

What I'm going to do now — let's look at the machines, as the doc suggests: we've got four machines, running, running, running. I'm going to manually patch — I think the way this command works is that it's going to manually patch all the nodes. In the doc I believe we scale back down to one before we proceed, but this will be fun anyway; let's do it, live demo. This is going to manually patch these nodes with the condition that we're looking for in our demo-node-healthy machine health check — you can see here, in this sort of gnarly escaped JSON data, that we're doing that. Hmm, I wonder if it has to do with the replicas count; I'm going to set this back down to one, and this should reconcile fairly quickly, because again, we don't have production workloads running on this, so the cordon and drain will be quick. Let's watch what happens here. Looks like I've only got one machine now. Right, okay, let's retry that command — let me make sure I'm copy-pasting the right thing. Oh, that one was for Windows, I see. Beware, everyone: Windows-escaped commands on macOS are not going to work. Okay, so you can see it was patched — we've added that condition — and now, well, I was already not quick enough to even catch that machine being recycled. So I'm going to look at the visualizer here and see what it tells me. All right, we can see the little orange spinny thing again, which you probably can't see in any detail up there on the screen, indicating that the machine is in the process of reconciling. So we've deleted that machine — after patching it with the condition our machine health check looks for — and in the visualizer, and over here in our watcher, we should — there we go — we see the new machine. So Cluster API has cordoned and drained the prior machine, the machine health check condition having been fulfilled, and then, according to the rolling update configuration, recycled that machine, brought a new one online, and now we've got a new node.

And I think we're ready for Kubernetes upgrades. I know we're kind of cruising right along here — do you want to give it a few minutes? How many folks are keeping up in real time with the presentation on stage? How many folks are way behind? Okay, we'll wait three minutes. There's a fix you can do — just delete that machine. Feel free to raise a hand if you have any questions.
And nothing happens? Give it four more minutes — just re-apply the ClusterClass. And while people are working on this, feel free to continue this lab at home, or whenever you find time, and if you have any problems, reach out to us on the upstream Kubernetes Slack. The channel is called #cluster-api, and we should be able to address any problems there. If you have any issues, feel free to open a discussion thread and we'll take a look.

Yeah. So let's take a look at what we have right now. We have one workload cluster, with one control plane node and one worker node, which are both at version 1.24.6. Let's try to upgrade this to a 1.25 version. Since Cluster API is declarative, following the same pattern as the other examples, we just have another YAML file that dictates the change in version of the target cluster. If you take a look at that change, you'll see that the only thing we changed is the version field in the YAML: we changed it from 1.24.6 to 1.25.2, and that's it. That's the only thing you have to change to bump up the Kubernetes version — it's as simple as that: change one value in the YAML and your Kubernetes cluster upgrade is triggered. And let's do that right now.
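The diff, roughly — a single field in the cluster's topology, with the file name below assumed:

```sh
# Rough sketch of the upgrade: one field changes in the cluster's topology.
#   spec:
#     topology:
#       version: v1.25.2     # was v1.24.6
kubectl apply -f cluster-upgrade.yaml                    # file name assumed
kubectl get kubeadmcontrolplane,machinedeployments -w    # control plane rolls first
```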
So I'm applying the change. Yeah, my change is applied, and let's just watch the control plane to see that it's changing. You can see that the way Cluster API performs an upgrade on these workload clusters is that it first upgrades the control plane nodes, all of them, to the target version, and only after they're successfully upgraded will it move on to your machine deployments. If your workload cluster has more than one machine deployment, it will go and upgrade them in order. It's completely orchestrated, and you just have to monitor it; you don't have to perform any intermediate actions to let it go through. As we can see, the control plane spins up a new machine at the new version, and once that new control plane machine is available, it deletes the machine at the older version; once the control plane is completely upgraded, it moves on to upgrading the machine deployments. It can take a few minutes for this to go through, but we'll just let it run, and we'll see the control plane completely upgraded. As you can see, the control plane is now targeting version 1.25.2, and as soon as the other machine — the 1.24.6 machine — is scaled down, we should have a control plane that's completely at the target version. Yeah, it's started deleting the old control plane machine. How many of you were able to trigger the upgrade? A few? Okay, yeah. So the control plane is completely upgraded to the target version, and now it's moved on to upgrading the machine deployment. You can see a 1.25.2 machine for the md-0 machine deployment is now being provisioned, and as soon as that's ready, the 1.24.6 machine for md-0 will be scaled down and deleted, and then our cluster is considered completely upgraded. So yeah — the machine deployment is now completely upgraded, and the control plane is completely upgraded, so we just have one control plane node and one machine deployment node, the same topology as the cluster we had before, except both of them have been upgraded to the newer version — both of them are at 1.25.2.

Now let's move on to the next section. Before that, we can clean up this cluster, since we don't need it anymore. Let's delete the docker-cluster-one cluster that we've been using, and after that's successfully deleted, let's move on to the next section, which is cluster lifecycle hooks — it's pretty interesting. Yeah. The cluster successfully deleted; there are no more workload clusters in our setup right now. So, on to cluster lifecycle hooks.

A little bit about cluster lifecycle hooks. Cluster API now has the ability to produce certain events, called lifecycle events, and they can be hooked into by external systems to perform certain actions. In this demo we'll just take a look at a simple extension that receives these events and logs them. As you can see in the topology we have here, the management cluster sends certain events to a target test extension server, which could be running anywhere; in this demo we'll run it as a deployment within our management cluster itself. Once the extension server receives these events, it can send back a particular response, depending on the event it received, and that can affect the cluster lifecycle in certain ways. A simple example: when a new workload cluster is requested to be created, the management cluster will send a BeforeClusterCreate hook to the extension server, and the extension server can send back either an allow response or a block response. If a block response is sent, the workload cluster will not be created, and Cluster API will just keep asking the test extension server — based on the parameters it sent back — okay, am I allowed to create the workload cluster now? If not, just let me know when I need to check back, and so on. So you can set up an environment like this, where you have not just one but multiple extension servers, each of them dictating whether you're allowed to go ahead and create the workload cluster, or whether you need to wait before it's created. The BeforeClusterCreate hook is just one example; we have six lifecycle hooks right now within Cluster API, and in this demo we'll take a look at a few of them.
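To recap the allow/block semantics just described, here's a rough sketch of the response a hook handler sends back — the field names follow my reading of the upstream Runtime SDK, so treat them as illustrative:

```sh
# Illustrative response semantics (field names hedged):
#   status: Success, retryAfterSeconds: 0    -> allow; proceed with the operation
#   status: Success, retryAfterSeconds: 30   -> block; CAPI re-calls the hook in ~30s
kubectl get extensionconfigs    # the registered handlers behind these answer the hooks
```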
As I mentioned, in this demo we'll be running the test extension server within the management cluster, because it's easier for us, but you could imagine running it as an external server, or in any other cluster — you're not required to run it in the management cluster, as long as it's reachable. So for this demo, let's run a test extension server deployment within our management cluster and make sure the deployment is running. Our extension server is up, we have the one replica available, so we can move ahead. Once the test extension server is running, we need a way to register it with Cluster API, so that Cluster API knows where it is, how to reach it, and the additional information like certificates and so on.

So here is an example of an ExtensionConfig, which is used to declare where an extension server is — this is how we register an extension server with Cluster API. Each ExtensionConfig needs a unique name, and I have a CA injection mechanism set up right here; this is just to inject the corresponding CA and certificates. The clientConfig refers to where the extension server lives: since my extension server lives within my management cluster, I just have a reference to the service that's exposed as part of the deployment, and the Cluster API controller should be able to reach the extension server from within the management cluster. I also have a match expression; this is to make sure this extension server doesn't operate on all events from all workload clusters — you can filter it down to only act on events coming from workload clusters that match certain labels. Here I just have a namespace label, so only events triggered by workload clusters in this target namespace will be handled by this particular extension server. So let's register our extension server with Cluster API and make sure it's reachable and all good. Yeah — the extension server has been successfully discovered: Cluster API was able to reach it and do the necessary things to register it. There are a few optional sections for you to explore on how an extension server communicates with Cluster API to inform it which kinds of events it supports and expects; I'll leave those for you to explore.

Now that we have an extension server registered and running, let's actually look at some of these events. As I mentioned, Cluster API currently supports six lifecycle hooks; we'll take a look at two of them. We'll create a workload cluster, and we'll see that the extension server receives an event called BeforeClusterCreate and then just logs it — it doesn't block the cluster creation operation. You can always change the extension server to block it, and then maybe rely on some other input to decide when to unblock the creation and allow the process to go ahead. Right — I have a YAML file here that creates a workload cluster, so let me apply that. Once the workload cluster creation is triggered, an event should have been sent to the test extension server, so let's take a look at the test extension server's logs to see that it actually received the event and logged it. And yes: the extension server received the BeforeClusterCreate event and logged it, and you can see it sent back a Success response with retry zero. Retry zero just means: do not block the creation, let it go through — it's basically equivalent to an allow response. There is a section at the end of this tutorial, completely optional, in which you can change the extension server's logic so that it blocks; try that when you have some time for this tutorial. We'll just continue with the allow responses, but I would highly, highly encourage you to try it out, to get a feel for how powerful runtime extensions are and how you can use them.
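For reference, the ExtensionConfig just walked through looks roughly like this — a sketch following the upstream shape, with the service and secret names assumed:

```sh
# Sketch of an ExtensionConfig (upstream shape; names assumed):
cat <<'EOF'
apiVersion: runtime.cluster.x-k8s.io/v1alpha1
kind: ExtensionConfig
metadata:
  name: test-extension
  annotations:
    runtime.cluster.x-k8s.io/inject-ca-from-secret: default/test-extension-cert
spec:
  clientConfig:
    service:                          # where the extension server lives
      name: test-extension-service
      namespace: default
      port: 443
  namespaceSelector:                  # only handle events from matching namespaces
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: In
      values: [default]
EOF
```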
Now let's also try to delete the cluster. Similarly, we should see a BeforeClusterDelete hook, and again, this extension server defaults to just allowing the operation to go ahead; it doesn't block anything, and it should allow the workload cluster to get deleted. But, as I mentioned, there is an optional section in which you can block it, and see that the workload cluster deletion is blocked until the extension server responds with a Success, allow response. So let's delete the workload cluster. Yeah, the delete has been issued; let's look at our extension server's logs to see that it received the event. Yeah, it did receive the event, as you can see here. Let's just clean up the extension server so that we don't affect anything else here.

Okay, so we'll wrap it up here. Thanks, everyone, for coming. We have a QR code here for feedback — if you can see it, please do leave feedback. We would have had one last section, which is about self-hosted clusters: how you use Cluster API to manage Cluster API itself. Feel free to follow up on that at home. If you have any problems or questions, we're in the #cluster-api channel on the upstream Kubernetes Slack — feel free to ask whenever you need anything. Yeah, that's it. Feel free to stick around and ask some questions, but that's the official part done. Thanks, everyone!