Hello everyone, and welcome again to the dev conference. For the next session we have with us Michael and Joel, who will be speaking on the topic of declarative Kubernetes clusters with Cluster API. If you have any questions during the session, you can pop them into the chat box on the right side of the screen. So let's start our session.

Hello, and welcome to our presentation, Declarative Kubernetes Clusters with Cluster API. My name is Michael McEwen, and I'll be joined by Joel Speed today. We're both engineers at Red Hat, where we work on cloud infrastructure tooling, and one of the projects we work on is the Cluster API project.

Let's do a little overview of what we'll talk about today. We'll start with a tech review to make sure that everyone's on the same page. Then we'll talk about what the Cluster API project is. Then I'll hand it over to Joel, and he'll talk about why you might want to use Cluster API and what the internal anatomy of the project looks like, and he'll give a small demonstration of how you can use it. Then it'll come back to me, and I'll talk about how you can get involved with the project.

Let's get things started. First, a little review of Kubernetes, not necessarily from the perspective of a user, but from the perspective of someone who wants to understand the topology of Kubernetes and how it's architected. When I think about Kubernetes, I think about it in three different boxes. There are the infrastructure providers at the base layer, who give us the surface upon which we'll do all of our computing work. Then there's the control plane, which manages the cluster and the resources that come into it. And lastly, we have the workers, which actually run the workloads that users load into the cluster.

Let's dive a little deeper into each of these components to tease apart what they do. The control plane is responsible for making decisions about the cluster itself. The first point of entry is the kube-apiserver. This is a process that takes requests from the user about API updates that need to happen (these are the APIs we know, resources such as pods and nodes and those types of things) and enters them into the system so that they can be acted upon. etcd is the next component behind that, and it provides a consistent, highly available store for all of this information. Next we have the kube-scheduler, which handles taking requests for pods and placing them onto nodes so that everything gets scheduled with the resources it needs. Another component is the kube-controller-manager, which runs some of the core controllers that watch the base objects inside of Kubernetes, things such as nodes and replication controllers. And lastly, we have the cloud-controller-manager, which coordinates communication from Kubernetes to the underlying infrastructure, so that when information such as machine sizes or network topology needs to be learned, this can be the place to get those details. All of these processes work together in a horizontally scalable manner, so your control plane could be one machine or it could be multiple machines.

Next we have the workers, and the workers are where all the workloads run. The main process here is the kubelet, which is responsible for ensuring that every pod spec it is aware of matches a container on that machine. It will not watch containers that don't have pods associated with them, and in this respect it tries to keep the pods in order for Kubernetes.
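To make that concrete, here is a minimal sketch of the kind of pod spec the API server accepts, the scheduler places, and the kubelet then reconciles on a node (the name and image are just placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod      # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.19  # any container image
      resources:
        requests:        # the kube-scheduler uses these to pick a suitable node
          cpu: 100m
          memory: 128Mi
```

Once the scheduler assigns this pod to a node, the kubelet on that node makes sure a matching container is actually running.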
The next component on each worker is the kube-proxy. This is a networking component that helps to implement the Service networking we know inside of Kubernetes, and it helps to make the communication between pods work in the way that we expect. And then finally, there's a container runtime. This runs the pods and their containers, and there are several different runtimes that can be plugged in.

Lastly, we have the infrastructure providers. Whether it's bare metal or virtual machines, networking, and storage, the provider gives you all the components that Kubernetes needs to run. Now, there are two entries at the bottom here that I put question marks by, because some providers give radically different ways to look at a cluster. One of them is the Docker provider, which is used by a tool such as kind (Kubernetes in Docker). There you can see how a container can be made, abstractly, to look like a node, so Kubernetes can actually be deployed to a series of containers on a single machine without needing a large infrastructure. This is great for testing and experimenting. Another really interesting test project is Kubemark, which allows you to model a Kubernetes cluster in memory. You can't actually run heavy workloads there, but you can use it to test the core scheduling and other mechanics of a Kubernetes cluster. I add these only because I think they're really interesting projects, and they show just how flexible Kubernetes can be.
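As a quick aside, standing up one of those all-in-Docker clusters takes only a small config file. Here is a hedged sketch for a recent version of kind; the node layout is arbitrary:

```yaml
# Save as kind-config.yaml, then run: kind create cluster --config kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane  # a container acting as the control plane "machine"
  - role: worker         # two more containers acting as worker "machines"
  - role: worker
```

Each "node" here is really just a container, which is exactly the trick Michael describes.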
So let's start to dive into what the Cluster API project is. If we go from the description in the project itself, we see that Cluster API is a Kubernetes sub-project focused on providing declarative APIs and tooling to simplify provisioning, upgrading, and operating multiple Kubernetes clusters. Now, that's kind of a mouthful, but let's break it down to see what it really means.

Declarative APIs: if you've used Kubernetes, you're most likely familiar with these. When I approach the Kubernetes API, I give it a declaration of what I would like the world to match. I say: this is the pod I would like, these are the networks I would like it to use, this is the storage it needs. Kubernetes will either accept that or reject it, and hopefully it will give me back a pod that matches what I've declared.

Cluster API wants to take that methodology and apply it to entire Kubernetes clusters. So in essence: could I ask my Kubernetes cluster for "one cluster, please" and have it return me an entire cluster? Now, you can see I've labeled these "management" and "workload", and these are concepts that come from the Cluster API project. The management cluster houses all the resources that I need to speak about workload clusters, resources such as Cluster, Machine, and MachineDeployment. These are things I can use as resources in my management cluster to talk about my workload clusters.

Let's take a look at how that works a little more broadly. Here's a situation where I have one management cluster and three workload clusters, each deployed on a different cloud. This is one way I could use the Cluster API project to help manage multiple clusters, with many deployment options for how I want to use them. But even more than just creating and destroying clusters, I can go a little deeper with my usage of Cluster API. Here's an example where I'm talking to the management cluster and I want it to do work in a workload cluster based on specific actions. Perhaps I want to delete a specific machine from the workload cluster, or maybe I want to add more machines to that cluster. Cluster API makes all of these things very easy by giving us tools to address those machines, to address those clusters, and to address the groupings of machines within those clusters. You can imagine a heterogeneous cluster where some machines are one type and another batch are a different type; Cluster API gives us a way to manage those machines through a very convenient API.

But beyond the basic management of machines, Cluster API adds a bunch of extra tools that are really cool. There's a control plane management resource that allows you to control the Kubernetes version that's deployed and the topology of the control plane. There's a machine health checking component that allows us to automatically remediate unhealthy machines. Cluster API is supported by the Kubernetes cluster autoscaler, so you can automatically grow and shrink your clusters based on the workloads that are on them. There's also a bootstrapping tool and resource that allow you to control how a cluster is created and bootstrapped, and how machines are created and bootstrapped into that cluster. And then there's also support for spot instance usage. Many providers offer something they call spot instances (it might be called something else, depending on the provider); these are instances that can be used at a lower price based on availability. When the provider has extra machines available, they might make them usable at a reduced rate, and Cluster API can automatically watch for that and use them when possible. There are also eight different cloud providers that currently exist in Cluster API. You saw some of the big names earlier, and there are many more; I encourage you to come check them out.
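To make the spot instance support concrete before we move on, here is a hedged sketch using the AWS provider. The spotMarketOptions field is AWS-provider-specific, the template name is made up, and the exact schema may differ between provider versions:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: workers-spot         # hypothetical name
spec:
  template:
    spec:
      instanceType: m5.large
      spotMarketOptions:     # request spot capacity instead of on-demand
        maxPrice: "0.05"     # optional ceiling in USD/hour; omit to cap at the on-demand price
```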
So with that said, I'm going to hand it over to Joel, and he's going to talk about why we need Cluster API. Take it away, Joel.

Thanks, Mike. So now that we know what CAPI is, why do we need it? To explain this, I'm going to take you through an example of a small software startup that I'm going to call the Awesome Software Company. They've got a product team who've built an awesome product that's going to be hosted as a software-as-a-service offering, on top of Kubernetes, for their customers to use. For that, they need a production Kubernetes cluster to run the product. So that they don't break production when they're making changes, they want to make sure that all of their software changes go through a staging process first; for this, they're going to use a separate staging environment on a separate Kubernetes cluster. And so that they don't break their QE process, they're also going to have a separate development cluster where they can make rapid, iterative changes, play around, and explore ideas.

As the company grows, they realize that a bare Kubernetes cluster actually isn't all that useful for their product team. They're going to need some supporting services, and for that they're going to hire a platform team, who will build those supporting services: things like monitoring, logging, CI systems, alerting systems. Now, because the platform team are also going to be developing software, they want their own development environment so that they don't break things for their customer, the product team. They're also going to want a staging environment too, and perhaps a CI cluster so that they can run builds for the product team.

As you can see, we're starting to get more and more Kubernetes clusters in this company, and that's just two teams. Now imagine that as the software takes off, we've got customers coming in who want to run the product in a different region; say that for data protection laws, they need to run it within Europe or within China. Or perhaps the clusters are getting too big: you've got thousands of customers running on a single cluster, so behind the scenes the Awesome Software Company decide they're going to split that cluster in two, migrate some customers onto a different cluster, and mitigate some of that risk. As you can see, the number of Kubernetes clusters here can grow rapidly and soon become very unmanageable.

So how can CAPI help with this? Well, first, CAPI centralizes the management of Kubernetes clusters. By making a single management cluster, we can manage any number of Kubernetes clusters across multiple cloud providers from a single place, just using kubectl. No more logging into Amazon, and then GCP, and then Azure to check the status of your clusters; all of your information is centralized. Secondly, CAPI automates the provisioning of Kubernetes clusters. This is something that used to be a very long and manual process with lots of tedious steps. Now, using CAPI, you can write a few YAML files, apply them to your cluster, and just wait; 10 or 15 minutes later, you will have a Kubernetes cluster. Thirdly, automatic remediation. As an SRE, you will often be paged in the middle of the night for small tasks where something has gone wrong: you've got an alert because your node has gone down.
It is very common for software and hardware to have bugs, and so there are a number of reasons why a Kubernetes node may stop working. CAPI has something called a MachineHealthCheck that monitors nodes and, when they go unhealthy, replaces them automatically. That keeps SREs happy, because they're going to get fewer notifications and are less likely to have to wake up at 3 a.m. to replace a node. Finally, while CAPI can't automatically replace the control plane yet, what it can do is automatically upgrade your worker machines. Say you've realized that you deployed them at one instance size and you actually haven't got enough memory: update the configuration to use a bigger instance type, and, using a MachineDeployment, CAPI can automatically replace all of those worker nodes for you.

Now we know why we need CAPI; let's take a closer look at how it actually works. CAPI is implemented via a number of custom resource definitions on Kubernetes. Custom resource definitions allow you to extend the Kubernetes API and add your own custom types. In CAPI there are four core custom resources. Clusters act as a parent for all of the other resources; every CAPI resource must belong to a Cluster. Then there are Machines, MachineSets, and MachineDeployments, which are all very closely related. Machines are responsible for creating the virtual machines that will join the cluster. MachineSets are responsible for creating a number of identical Machines, just like a ReplicaSet is responsible for creating a number of identical pods. And a MachineDeployment is responsible for rolling out updates to the Machines should there be any changes to configuration, much like a Deployment rolls out configuration changes to its pods.

When managing CAPI, you're going to need something called a management cluster. This management cluster is where you will create the CAPI resources, such as the Clusters and Machines. It is a normal Kubernetes cluster, just like any other: it has its own control plane, and it has its own worker nodes.
When you create a Cluster on your CAPI management cluster, this is going to create the shared resources that are required for the workload cluster: things like a VPC, a load balancer for the API server, and DNS records. You're also going to need a Kubernetes control plane; this is going to be responsible for managing the control plane of the workload cluster. However, without any worker nodes, you can't really run any workloads on your cluster. This is where the MachineDeployment (which creates a MachineSet, which creates Machines) will then create your worker nodes, which allow your workload pods to run on your workload cluster.

If you take a closer look at the management cluster for CAPI, you'll notice that there are actually four sets of controllers. You have the core CAPI controllers, which are responsible for things like the Cluster, the MachineDeployments, and the MachineSets. You have the provider controllers, which are responsible for the interactions between CAPI and the cloud provider; for instance, creating an EC2 virtual machine on AWS. There's the bootstrap provider, which is responsible for configuring those virtual machines to ensure that they join the cluster. And there's the control plane controller, which is responsible for ensuring that the complex Kubernetes control plane of your workload cluster is healthy at all times.

So let's take a look at those resources a little more closely and see how we can create one of these clusters. The first resource you'll need is a Cluster. There's not too much on this, apart from some information about the shared networking and references to a KubeadmControlPlane and an AWSCluster. The AWSCluster that we're referring to is what's known as an infrastructure cluster. This contains the information unique to AWS about how to configure the cluster; for instance, which region it's going to run in. When you create your AWSCluster, the AWS infrastructure provider's cluster controller will, behind the scenes, create the VPC, the DNS records, the load balancers, and so on that are required for the cluster. Next we have the control plane, and this is responsible for configuring how kubeadm will start the control plane of that workload cluster. For example, it tells us which Kubernetes version the control plane is going to be running. You'll notice here that it refers to something called an AWSMachineTemplate. This machine template tells the control plane controller how to create the virtual machines that will run the control plane. It's AWS-specific again, and it tells the AWS infrastructure provider's machine controller how to create the machine: in this example, on a t3.large instance with the given instance profile and SSH key. These four resources define a basic cluster.
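Pieced together, those four resources look roughly like this. This is a hedged sketch based on the v1alpha3 APIs that were current at the time of this talk; the names, region, and SSH key are hypothetical, and exact fields vary between CAPI and provider versions:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: Cluster
metadata:
  name: example
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]  # the shared pod networking
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: KubeadmControlPlane
    name: example-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSCluster
    name: example
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: example
spec:
  region: eu-west-1                   # hypothetical; the AWS-specific details live here
  sshKeyName: default
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: example-control-plane
spec:
  replicas: 1
  version: v1.18.2                    # the Kubernetes version the control plane will run
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSMachineTemplate
    name: example-control-plane
  kubeadmConfigSpec: {}               # kubeadm init/join customizations would go here
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: example-control-plane
spec:
  template:
    spec:
      instanceType: t3.large          # the instance type Joel mentions
      iamInstanceProfile: control-plane.cluster-api-provider-aws.sigs.k8s.io
      sshKeyName: default
```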
With this, you'll have a cluster, and it will be functional: you will have a control plane, but you won't have any worker nodes, as I suggested earlier. So to create those, we're going to need a MachineDeployment, which again references a machine template, but also a bootstrap config. It tells us how many replicas of our worker nodes we want (in this case, at the moment, it says zero), and it also says what version of Kubernetes we're going to be running, in this case version 1.18.2. The final part that we're going to need is that bootstrap config template. This is going to tell kubeadm how to join the nodes that we bring up to the cluster.
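Again as a hedged v1alpha3-era sketch, the worker-side resources Joel describes might look something like this; the names are made up:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: example-md-0
spec:
  clusterName: example
  replicas: 0                # deliberately zero here; it gets scaled up later in the demo
  selector:
    matchLabels: {}          # left empty; the controller manages the machine labels
  template:
    spec:
      clusterName: example
      version: v1.18.2       # the Kubernetes version the workers will run
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: example-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AWSMachineTemplate
        name: example-md-0   # same kind of template as the control plane used
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
metadata:
  name: example-md-0
spec:
  template:
    spec:
      joinConfiguration:     # how kubeadm joins each new node to the cluster
        nodeRegistration:
          name: '{{ ds.meta_data.local_hostname }}'  # node name from instance metadata
          kubeletExtraArgs:
            cloud-provider: aws
```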
So with all that in mind, let's take those YAML resources and create a cluster. I've bootstrapped my management cluster on a kind cluster (that is, Kubernetes in Docker), which is the recommended way to set up your first management cluster. If we get the pods that are running on this cluster, we can see that we have a number of CAPI-related pods running. I also have all of the YAML that I've previously shown in a single file here, so I'm going to apply that to create the cluster. You can see various resources have been created there. If we now look at the right-hand pane, you can see it says that there is a cluster provisioning. If we get that cluster and take a look at it, we can see down here that it's waiting for the control plane, which was mentioned earlier, and for the infrastructure provider to be ready. Now, this process is probably going to take about 20 or 30 minutes, so I'm going to cut here and fast forward.

So the cluster is now provisioned, and we can see we have a single control plane machine started. This is the default for CAPI: whenever you install a cluster, it will come up with a single control plane machine and no worker nodes. While it is a functional cluster, we probably want to add some worker nodes, so let's do that now. We have a MachineDeployment already applied; however, the MachineDeployment has no replicas. But that's fine, because we can scale it up really easily. Now that the MachineDeployment is scaled up, we see on the right-hand side three new machines that are going to be joining the cluster.

Because we only have one machine running in our control plane at the moment, our cluster isn't very fault tolerant; if we lose this control plane machine, we've pretty much lost the cluster. So let's scale that up as well. Behind the scenes, what the control plane controller is going to do is scale up new control plane machines one by one, joining each to the cluster before scaling up the next. This is done safely, so that etcd does not lose quorum at any point during this process.

Now, if we want to access the cluster, we're going to need to get the kubeconfig. I have a small script that lets me do that; let's run it now. It just downloads a secret from the management cluster and converts it into a kubeconfig file. Now what I should be able to do is get the nodes from that cluster using that newly downloaded kubeconfig. We can see that at the moment only our master is in the cluster, and it's actually not ready; let's set up a watch over here for that. The reason the node is not ready is that it doesn't have any networking at the moment. The final step when setting up a cluster with CAPI is to install the CNI driver. For this, we're going to use Calico, which will set up the networking for pod-to-pod communication within the cluster. Once that is done, our master over here should go into the Ready status, and you can see that one of the worker nodes has just joined the cluster; the other two should follow suit.

The next thing I'm going to demonstrate is machine health checking. I've already prepared a MachineHealthCheck example here. This machine health check targets the condition of type Ready on the nodes: if the condition is in either the Unknown or the False status for 30 seconds, the machine will be considered unhealthy, and it will be taken out of service and replaced by a new machine. We can also see here that we have a maxUnhealthy value of 40 percent. This means that if more than 40 percent of the nodes targeted by this machine health check are unhealthy, then the machine health check won't replace any nodes. This is to prevent broken clusters from getting further broken by machine health check actions.
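A MachineHealthCheck along the lines of the one in the demo might look like this. Again a hedged v1alpha3 sketch: the name is hypothetical, and the selector assumes the workers carry the standard deployment-name label:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: example-unhealthy-nodes
spec:
  clusterName: example
  maxUnhealthy: 40%          # stop remediating if too much of the cluster looks broken
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: example-md-0
  unhealthyConditions:       # Ready stuck in Unknown or False for 30s means "replace me"
    - type: Ready
      status: Unknown
      timeout: 30s
    - type: Ready
      status: "False"
      timeout: 30s
```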
Let's apply that now, and if we get that machine health check back, we can see it's expecting three machines and currently sees three healthy machines. Let's simulate a failure. To simulate the failure, I need to jump through a bastion host onto the machine and shut down the kubelet. The AWSCluster has an option to set up a bastion host for us, which has been done, under this IP, so I can use that to jump through to get onto one of the nodes. I'll shut down the kubelet on this node, and what we should see shortly is that the health checks will start failing and this node will go into an unready state. Let's exit from here and take a look at the machine health check logs. You can see now that the node has gone unready, and what we should see shortly is that the machine gets replaced by a new one, because the machine health check has deleted the broken machine. We can see here that a fourth machine has joined the cluster. Our not-ready node should now go away, as the machine that it was running on will have been deleted. You can see here it's gone to scheduling disabled, and the machine has gone into a Deleting phase. The machine health check has now replaced that machine.

The final thing I would like to demo is that the machine deployment controller will actually update machines when we make a change. If we look at the MachineDeployment, we can see that it references a machine template here. If we look at the machine templates, we can see that I actually have two for the machine deployment: jspeed-capi-1-md-0, and the same with an xlarge tag on the end. All I have done between this one and the previous one is change the instance type from a large to an xlarge instance. So we can edit our MachineDeployment to reference the new machine template, which should then cause a rollout of new machines. What we should see now on the right is that we start seeing new machines being provisioned. You can see this one has a different ID compared to the previous ones, because it is using a new MachineSet. And you can see that there are now two MachineSets for this MachineDeployment, one of which has three replicas and the other has one; the MachineDeployment is performing that rolling upgrade for us. This concludes the demo.

Thanks, Joel. That was a great demo, and I hope everyone watching got a lot out of it. So, if you're excited about Cluster API like I know Joel and I are, let's talk about how you can get involved with the project. The first place I would say to go is to read the book. The book has information about how you can get started with the project; it describes each of the components and tells you where you can go for more information. I would especially like to highlight the quick start section, as that will allow you to start up your own cluster using Cluster API, and you can do that with or without a cloud provider. So if you only have access to Docker or to a virtual machine host, you can use that to create your own cluster locally, and if you have access to something like Google Cloud or vSphere, you can use those to spawn clusters as well.

After you've played around a little bit, you might have some questions, and I would say come to the Cluster API channel on the Kubernetes Slack. It's a great place: there are always lots of people hanging out there, good discussions happening, and if you have a question about the project, more likely than not someone is there who can answer it. And if chat isn't quite your thing, look in the Cluster API book, because there is a mailing list you can join, and you'll get the same level of interaction. If you'd like to get a little closer to the group, come attend one of our meetings. They're on Wednesdays at 17:00 UTC, at this Zoom link. It's a great place to come introduce yourself, meet other members of the community, and learn about what's happening with the project, and if you have questions or concerns about how to use it, this is also a great place to bring those issues. If you'd like to look at some of the past recordings and notes, look at the Cluster API book, because there's a great link where you can go and see all the historical information we've created.

And if you're really excited and want to take the next step, I would say come check out the project, and maybe propose a change if you've got an idea. On GitHub, kubernetes-sigs/cluster-api is where you'll find the main project, and the sub-projects are prefixed with "cluster-api-", so that's where you'll find all the providers and things like the machine health checker and other components.

Now let's talk about what's coming up in the future for Cluster API. Cluster API is currently at an alpha API version, but we are rapidly moving towards a beta API; this will hopefully come out perhaps later this year or early next year. We're also always working on increasing the testing that takes place around the Cluster API components. As you can imagine, there are many different providers to test and lots of different scenarios that we'd like to examine, so there's always room to add more testing. We're also making some improvements to the CLI tooling: if you're familiar with kubectl, you might also be familiar with clusterctl, which is our CLI tool that allows you to interact directly with the Cluster API components. There's a little bit of internal refactoring going on to bring the internal APIs more in line with the Kubernetes APIs. There's also an upcoming feature on bootstrap failure detection, where we'll be looking into how we can provide earlier warnings when a bootstrap fails. And there's a big feature request out for pluggable load balancers, which would be a dynamic way to change the load balancers in front of your cluster. So I hope you can see that there's a lot of interesting work going on, and there's lots of room for more people to come on board and help us out, and maybe bring your own ideas.
So with that, I'd like to say thank you. Here are a couple of links that you can use. The Cluster API book is going to be immensely helpful; I would start there, and then come check out the Cluster API project and examine the code. And if you'd like to stay in touch, you have our Twitter and Mastodon handles here. Please reach out to Joel or myself; we'd love to talk more about this.

Hey everybody, does anyone have any questions about what they've just seen? That's an interesting question: how does Hive compare to CAPI? Mike, how familiar are you with Hive?

Nearly as familiar as you are, I'm sure.

Yes, because I'm very familiar with Hive. As far as I'm aware, they're quite different projects; I'm not an expert on Hive at all. My understanding is that Hive is kind of for punching out clusters in the same way, but you've got the machine pool thing in Hive that creates multiple MachineSets in the child cluster. It's a similar thing, but Hive, I think, is just the management cluster part, whereas CAPI itself covers the whole lot. We actually have some elements of CAPI within OpenShift: Machines and MachineSets, the Machine API, which is what the team that Mike and I work on looks after. We have those elements of CAPI already in OpenShift, and Hive kind of does the management bit. So if you take Hive and the Machine API, you kind of have CAPI, if that makes sense. I don't think that Hive has as many features for multi-cluster management as CAPI does, but I could be way off on that; that was just kind of my impression from what I've studied.

I can't say I've ever played with Hive; I just hear stuff on the rumor mill about what it does. I think our main interaction with Hive is seeing, from the OpenShift side, how Hive is used, I think through Terraform, to deploy these things. But what I've seen from CAPI is that they're really trying to target end users who want this kind of centralized management cluster and then to be able to control several different workload clusters from it. And I'm not familiar enough with Hive to know if that's really a central goal of Hive. I hope that helps answer the question a little bit.

Yeah, I can appreciate why they might sound the same. I've been working on the Machine API stuff for like nine months now, and I'm still not really sure what Hive is. We've been working with them recently, because we're going to take over some of the stuff that Hive does, I think, and bring it into OpenShift core. That doesn't mean I actually know what it is.

It's a good point; it's something for us to look into and learn a little more about, for sure.

Yeah, to Mike's point a minute ago about who uses CAPI: there are a lot of people who will use it. I'm not sure if I'm allowed to say their name, but VMware are using it to run their hosted Kubernetes offering, where you ask them for a cluster and they use it under the hood as well.
So there's definitely potential, I think, for a lot of people to take this on, whether they want to sell Kubernetes clusters or just manage them in their own company. Like, I used to work for a startup, and we ran 12 or 13 clusters towards the end, and that was a lot of work. Managing them via something like CAPI would have made things a lot easier.

To the point that Joel's making, there are some big cloud providers who participate in the Cluster API project, but there are also other shops. The one that comes to mind immediately for me is New Relic. They're contributing to the project, but they're also a heavy user of it: they use it to manage their internal infrastructure. I don't even think they're really selling a Kubernetes offering per se, but they use it for their own development purposes. So there are a lot of different users coming to the table. Any more questions? Yeah, thanks.

Awesome, thank you so much, Michael and Joel. Are there any other questions in the chat? Very quickly; it's been a long day for the room mods here. All right, looks like there aren't any more. Thank you all so much.