Well, if that countdown didn't get your morning, afternoon, or evening started, then maybe another cup of coffee might do. Thank y'all for joining us today. Welcome to Cloud Native Live, where we dive into the code behind cloud native. I'm Taylor Dolezal, a senior developer advocate at HashiCorp focused on all things infrastructure, application delivery, and developer experience. Every week we bring on a new set of presenters to showcase how to work with cloud native technologies. They will build things, they will break things, and they will answer your questions. Join us every Wednesday at 11 a.m. Eastern time. This week we have Jason DeTiberus here with us to talk about Cluster API with a bit of pixie dust. I gotta say, I'm looking forward to this magical presentation today.

Also, join us for KubeCon + CloudNativeCon North America Virtual, from October 11th to the 15th, to hear the latest from the cloud native community. This is an official live stream of the CNCF and as such is subject to the CNCF code of conduct. Please do not add anything to the chat or questions that would be in violation of that code of conduct. Basically, please be excellent to one another. Really looking forward to having fun today. With that, I'd like to hand it over to Jason to kick off today's presentation.

Yeah, so as Taylor said, my name is Jason DeTiberus. I am a principal software engineer at Equinix Metal, the cloud provider formerly known as Packet; I know some folks in the community may know that name a little bit more. I've been working on infrastructure, specifically Kubernetes infrastructure, since about 2015, and for the last year I've been working primarily on how we enable that infrastructure in a data center in a more cloud native way: how can we take these experiences for managing Kubernetes clusters in the cloud and actually apply them in a data center on bare metal, whether that's a bare metal cloud provider like we have at Equinix Metal or your own data center.

So let me take you into my data center here. I refer to this as US-DeTiber-1, and what we have is five small-form-factor AMD NUC-like machines, and on the right a little mini-ITX box that's running Tinkerbell — we'll get into what Tinkerbell is in a moment. These five small-form-factor machines, hopefully by the end of the demo today, will become part of an actual Kubernetes cluster. There's nothing on them to start with, and we should be able to bootstrap them from zero to Kubernetes relatively quickly today, assuming everything works. If not, we can dive into how to troubleshoot and more of what's going on along the way. But that's kind of the overview.
I do have a slide deck with me today to go over the basics, because I'm going to assume that not everybody watching today knows all of the background technologies that I'm talking about. So if we can just get those slides up, we can go ahead and skip past a couple of these.

The first technology to talk about today, the one that underlies how we're going to manage these physical machines, is Cluster API. If we distill it down to the basics, Cluster API is a project sponsored by Kubernetes SIG Cluster Lifecycle, the special interest group in Kubernetes dedicated to improving the lifecycle management of Kubernetes clusters in general. The goal of the project is to provide a declarative set of APIs, similar to what Kubernetes provides for applications, but applied to the infrastructure management that you need for running Kubernetes clusters themselves, including installation, upgrade, and anything else you need to do to tweak the configuration of a running cluster from the infrastructure side.

Basically, the way it works is we use Kubernetes to manage Kubernetes. For Cluster API, there is a Kubernetes cluster running somewhere, you deploy the Cluster API components to it, you define the cluster like you would any other Kubernetes resource — basically just a big ol' ball of YAML — you throw it at the API server, and out the other end you get a Kubernetes cluster. That cluster could be running in any of the various supported infrastructure providers, anything from AWS to vSphere to Equinix Metal, and now even Tinkerbell for running bare metal. That's basically all I want to cover for Cluster API, but if anybody has questions as we go along we can easily dive deeper into it.
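To make that declarative model concrete, here is a minimal sketch of defining a cluster on the management cluster. The apiVersions below are from a recent Cluster API release, and the names are placeholders, so treat this as illustrative rather than the exact YAML used in the demo:

```bash
# Sketch only: a Cluster object that points at a control plane provider and an
# infrastructure provider, applied to the management cluster like any other resource.
kubectl apply -f - <<'EOF'
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: demo
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["172.25.0.0/16"]        # placeholder pod CIDR
  controlPlaneRef:                          # manages the control plane machines
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: demo-control-plane
  infrastructureRef:                        # the provider that realizes the infrastructure
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: TinkerbellCluster
    name: demo
EOF
```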
The other project I've already mentioned is Tinkerbell, and Tinkerbell is basically trying to take what Kubernetes did and apply that API-centric management to actual physical infrastructure in a data center. A lot of the existing tools in the infrastructure management space do things like provision OSes on machines and boot machines over the network, but many of those systems were designed before cloud native was really a thing. Tinkerbell is trying to take those cloud native approaches and apply them to that infrastructure space. And because we're building on top of Tinkerbell, I'll go into it a little bit more.

There are several microservices that make up what we call Tinkerbell. The main one is the Tinkerbell workflow engine itself. This is what runs underneath everything: it connects to the data store and provides the basic API for interacting with the hardware. There are three basic resources in Tinkerbell: a hardware, a template, and a workflow.

The hardware is the description of the infrastructure that you're going to manage. You point it towards the actual machine, give it the MAC addresses for the machine, the IP addresses you expect that machine to have, and some other basic definitions — we can dive into that when we get into the demo, too. But it's basically a description of the hardware that Tinkerbell is expected to manage.

You then define a set of actions that you want to perform on that hardware through what's called a template, and that template basically just says: do this step, then that step, until you get to the end of whatever it is you want to do. In the general case we talk about provisioning OSes on machines; in this case we're talking about provisioning Kubernetes on a machine. But it doesn't even need to be that — you can define other tasks you might need to do on infrastructure, like updating the firmware on all the machines in your data center.

The other primitive, the workflow, takes those two primitives and ties them together so that Tinkerbell knows there's an actionable thing it needs to do. The workflow just says: take this template, apply this hardware to it, and go do the thing.

That's where the other microservices start to get involved. I mentioned the Tinkerbell workflow engine; there's also a boot service, Boots, which provides the basic DHCP and PXE booting services needed by Tinkerbell. It provides IP addressing to the hosts based on what you've defined in Tinkerbell, and it also provides a way to network boot those hosts into a minimal OS environment that we can then run the Tinkerbell workflows from.

There's also the Hegel metadata service, which we leverage quite heavily in the Cluster API integration, because in general when you're bootstrapping a Cluster API cluster it expects that you can shove user data somewhere, then take that user data and run it as a script on the host when it comes up. This is what actually lets the Tinkerbell environment act like a real cloud provider, instead of just the basic PXE boot environments a lot of people are familiar with.

The other component is the worker, which is basically a client that connects to Tinkerbell and asks: do I have any workflows assigned, and what are they? Then it goes off and executes the actions that make up those steps. That's all run in what we've generally called a minimal operating system installation environment — a slimmed-down OS that does nothing but boot up, run the Tinkerbell worker, contact Tinkerbell, and do what it needs to do.
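As a rough sketch of the template and workflow primitives just described — the field names follow the classic Tinkerbell template format, the action images are placeholders, and in this demo these objects are actually surfaced through the Kubernetes API by a thin shim rather than written by hand:

```bash
# Sketch only: a template is an ordered list of actions to run against a piece of
# hardware; a workflow then binds this template to a specific hardware entry
# (e.g. by MAC address), which is what the worker picks up and executes.
cat <<'EOF' > ubuntu-provisioning.yaml
version: "0.1"
name: ubuntu_provisioning
global_timeout: 6000
tasks:
  - name: os-installation
    worker: "{{.device_1}}"        # substituted with the target hardware when the workflow is created
    actions:
      - name: stream-os-image
        image: image2disk          # placeholder action that writes an OS image to a block device
        timeout: 600
      - name: reboot
        image: reboot-action       # placeholder
        timeout: 90
EOF
```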
The one other project I'm going to bring up today, which I haven't yet mentioned: because we're talking about HA Kubernetes clusters, one of the biggest requirements is that you need a persistent endpoint that stays constant for the life of the cluster, whether that's a load balancer of some kind or a virtual IP address that can be migrated between machines. You have elastic IPs in actual cloud providers; you can do it through various virtual IP mechanisms running on hosts, or through a load balancer. A lot of what we do in the Tinkerbell space, we try to keep as simple as possible: what's the minimal way to provide the exact functionality that we need without taking on any additional management?

So when we looked at how to enable HA for these clusters, we wanted to avoid things like spinning up an HAProxy load balancer and managing the backend endpoints on it, or anything like that. That's where kube-vip comes in. It's basically a project that manages virtual IPs, either for the Kubernetes control plane or for services of type LoadBalancer; in our case we're doing the control plane load balancing. It does this through a couple of different mechanisms: it can either do ARP advertisements of the virtual IP address, or it can do a BGP-based configuration and publish the address out to different BGP peers, and then you get full active-active load balancing behind BGP.

For the demo today, we're using the ARP-based mode, just to keep the complexity down. In the environment I have here, that didn't require setting up a BGP server and publishing those routes anywhere, so in my case I can just do the ARP advertisements and be done with it.

On this slide there are also links out to all of these different projects, and to the demo script that I'll be running through today, because hopefully, if everything works well, I'll be able to run through actually creating a Kubernetes cluster using this. I want to pause there if there are any questions, or if anybody wants clarification on anything I've run through real quick.

Awesome, awesome. I haven't seen any questions come in yet, but if you do have any, please feel free to ask them and we can get to those. Clearly, Jason, we've come a very long way from kube-up.sh, so very excited to see that.

Yeah, and like I said, the idea here is that the Cluster API project has shown it can create Kubernetes clusters and more easily manage them across various cloud providers, and there have been a few attempts in the past at doing that in data centers as well. With our approach with Tinkerbell, though, we wanted to take as cloud native an approach as we could — try not to shoehorn cloud native management through Cluster API into traditional data center management, but instead ask how we can more accurately do real cloud native management in the data center.

So in this environment right now, I've already stood up Cluster API in a local kind cluster, and I've already got the Cluster API Provider Tinkerbell components installed, basically just to save the time of bootstrapping that bit. If I come in here — I think it's kubectl api-resources — the first thing you'll notice is that there are three entries down here related to Tinkerbell: hardware, templates, and workflows. These coincide with the hardware, templates, and workflows that I discussed earlier; this just exposes them through a thin shim layer via the Kubernetes API instead of having to talk directly to Tinkerbell. It gives me the opportunity, as we start defining these things, to look at their status through Kubernetes instead of switching back and forth to Tinkerbell. But we also have some resources related to Cluster API for Tinkerbell specifically: these TinkerbellClusters, TinkerbellMachines, and TinkerbellMachineTemplates.
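For reference, the commands being run here look roughly like this (the resource names are the ones shown in the demo; exact output formatting will vary):

```bash
# List the resources the Tinkerbell shim and the Cluster API provider add to the
# management cluster.
kubectl api-resources | grep -i tinkerbell
# Expect the three Tinkerbell primitives exposed through the Kubernetes API:
#   hardware, templates, workflows
# plus the Cluster API infrastructure types:
#   tinkerbellclusters, tinkerbellmachines, tinkerbellmachinetemplates

# Nothing has been defined yet at this point in the demo.
kubectl get hardware
```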
These correspond to the similar types exposed by the other infrastructure providers; in our case they're just Tinkerbell-specific, and this is what's going to let us define the cluster.

So at this point, if I do kubectl get hardware, we'll see that nothing is found here — I haven't actually defined anything in Kubernetes yet. But if I come over here, I'm going to cheat a little bit: I've already pre-defined five hardware resources for the small-form-factor machines that I have over here. If I switch back over — for folks watching the hardware cam, they're basically labeled A, B, C, D, E from the bottom to the top. I've also cheated a little and given these IDs suffixes that correspond to those, so I can more easily identify which pieces of hardware we need to toggle as we go along.

If I come back here and first go to my cheat sheet — and as I mentioned, the link to it is in that slide deck — I can basically create this hardware. All right, so the hardware is now created. We see it in Kubernetes, and if we describe it we'll get some more information about it. This is all stuff I've pre-defined within Tinkerbell for the purposes of the Cluster API integration.

If I take this one in particular, what we see is that there's a spec with an ID, and this associates it with the ID of the resource within Tinkerbell itself. The status has then been populated with the actual information defined in Tinkerbell. So what we can see here is that there's metadata defined on that instance — some basic metadata that cloud-init will be able to use to pre-populate things as we bootstrap along. We've defined that we do want this machine to be able to PXE boot and that we do want to allow it to run workflows. We've defined an actual IP address, and here's the MAC address associated with the actual hardware, and that's what really ties everything together: the DHCP request comes in from this MAC address asking for an IP, and everything goes from there.

The other important thing is that I've defined a disk device here as well, and that matters because when we go to bootstrap this machine we need to know where to write that information, and that can vary from device to device. Currently we require people to pre-define it; eventually we'll add support to auto-detect this and populate it as needed. The challenge is that we can't just assume the first block device we find is something we can write to: if you look at some Arm hardware, sometimes you'll have an SD card that contains the firmware you're actually bootstrapping the device with, and if you overwrite that, you've just broken all booting of that machine. So in this case, you tell it where you want it to deploy to as a prerequisite.
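A minimal sketch of what that Hardware shim object looks like, assuming the tinkerbell.org CRD group used by the provider — the group/version and the ID below are placeholders; the real hardware record with MAC, IP, netboot flags, and disk lives in Tinkerbell and is reflected into status:

```bash
# Sketch only: the Hardware object's spec just points at the record already defined
# in Tinkerbell by its UID; the controller fills status from Tinkerbell's data store.
kubectl apply -f - <<'EOF'
apiVersion: tinkerbell.org/v1alpha1   # assumed group/version
kind: Hardware
metadata:
  name: hardware-d
spec:
  id: 00000000-0000-0000-0000-00000000000d   # placeholder Tinkerbell hardware UID
EOF

kubectl describe hardware hardware-d
# status should reflect what was defined in Tinkerbell: the MAC and IP used for
# DHCP/PXE, whether PXE and workflows are allowed, instance metadata for cloud-init,
# and the disk device (e.g. /dev/sda) that images will be written to.
```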
I see somebody did ask about BMC support and PBnJ. We haven't integrated PBnJ into this workflow yet, but we do plan on adding it in the future. The hardware I have here actually wouldn't even work with PBnJ, because its remote management is DASH-based, and that's a whole other specification outside of things like IPMI and Redfish that are supported by PBnJ. But server-class hardware is supported by PBnJ, and we can integrate that into these workflows. The main thing stopping us right now is that as we integrate it into the standard upstream — for lack of a better term — reference architecture for Tinkerbell and into the Cluster API integration, we want to do it in the best way possible. There are various ways we could add support: we could add workflows that go off and process the PBnJ-type requests, or we could build it in more automatically, either through the Cluster API integration or through Tinkerbell itself. We want to make sure we're doing that integration properly instead of rushing the most expedient solution out the door.

All right. At this point I've got the hardware defined, and there's nothing else to define, so we can go ahead and start making a cluster. I'm going to copy and paste here again, and I'll describe exactly what I'm running before I do it. For anybody already familiar with Cluster API, we have pre-published templates for the Cluster API resources that make up a cluster. We're leveraging the same thing here, and we're specifying some specific variables that get plugged into that template in order to actually create the machines.

The first one is the control plane VIP, and this is the IP address that's going to be managed by kube-vip and migrate around the cluster as needed. We're also specifying the pod CIDR, and this is important for my demo use case, because the default pod CIDR would conflict with the physical networking I have for this lab environment, so I'm overriding it to avoid dealing with IP addressing and routing issues. The other inputs are basic: to start with, we just want a single control plane machine and no worker machines, and that's just to serialize the startup a little for our purposes, because I have to toggle these machines somewhat manually. We're also specifying the Kubernetes version. What that does is select from images we've pre-built with the Kubernetes components already baked in — something most of the other cloud provider implementations for Cluster API have done, and we followed suit. Right now, the way this demo environment is configured, that image sits on a web server on the Tinkerbell host I have here, so that's the version of Kubernetes for the image I previously built. In the future we're looking to move to OCI registry-based distribution of the operating system images that contain Kubernetes, and once we do that we'll be able to stream them live from the web pretty much anywhere and have a bit more flexibility in which Kubernetes version I use. But for right now, this is the only version I have an image available for.

All right, so at this point I've created the cluster.
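The cluster creation being copy-pasted here looks roughly like the following. The COUNT and VERSION variables are standard clusterctl inputs; the VIP and pod CIDR variable names are assumptions based on what's described above, so check the provider's cluster template for the real names. Newer clusterctl releases use `generate cluster`; older ones used `config cluster`:

```bash
# Sketch only: render the provider's published cluster template with the demo's
# variables and apply it to the management cluster.
export CONTROL_PLANE_VIP="192.168.1.110"    # VIP that kube-vip will manage (placeholder value and variable name)
export POD_CIDR="172.25.0.0/16"             # overridden so it doesn't clash with the lab network (placeholder)
export KUBERNETES_VERSION="v1.18.15"        # only version with a pre-built image on the local web server
export CONTROL_PLANE_MACHINE_COUNT=1        # serialize startup: one control plane node first
export WORKER_MACHINE_COUNT=0               # no workers yet

clusterctl generate cluster demo --infrastructure tinkerbell > demo-cluster.yaml
kubectl apply -f demo-cluster.yaml
```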
Now I can run kubectl get cluster-api, which gets everything associated with the cluster-api category. This is all of the main Cluster API resources, plus the Tinkerbell Cluster API resources that I've defined as well. We can see a few different resources here: we have the Cluster, we have some Machines, and we have this KubeadmControlPlane, which actually manages the control plane for us based on kubeadm, another SIG Cluster Lifecycle project for helping to bootstrap clusters. We can also see that it has tried to create one replica; that replica is up to date according to the configuration, but it's unavailable right now, and that's because I haven't actually turned on the machine yet, so it hasn't bootstrapped.

The other thing we can see is this TinkerbellMachine. This is the bit that associates a Cluster API Machine with the actual Tinkerbell infrastructure, and it tells me that it assigned it to this instance ID, which is basically the UID of that hardware device. Because I cheated a little with the UID creation, I know that's related to the hardware D box I have over here. So if I switch over here — and my VM locked up, so in this case, instead of triggering it remotely, I'll just go over and power it on by hand.

All right, that's coming up now. Unfortunately, I can't even redirect the text console because that Windows machine is locked up right now, but I can tell you that it is going ahead and bootstrapping. It is attempting to PXE boot against Boots, it's going to get that minimal operating system installation environment image, run it, and start executing the Tinkerbell workflow, which is the more important thing. And I can show you what Cluster API has created as far as that workflow.

So, kubectl describe workflow — and because I've serialized the creation with just one control plane instance right now, I can describe the workflow without having to specify which one. We can see here the individual tasks it's going to run as it goes along. The first task is to stream the image — this is that pre-created image I've already built. It's templating out the URL based on some configuration I've given it: I told it the Tinkerbell host can be found at 192.168.1.1:8080, so that's filling that in for us. I've told it that I want to use Ubuntu 20.04, through the resource that I created via that template, and that I want Kubernetes version 1.18.15. The important part here is the destination disk: this is using the data we pre-populated in the hardware to fill it in, so it writes to the disk device we pre-defined when we configured the hardware.
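Roughly what that first rendered action looks like — the action name, image, and environment variable names here are assumptions modeled on Tinkerbell's image-to-disk style actions, and the URL is just the pattern described above; `kubectl describe workflow` shows the real thing:

```bash
kubectl describe workflow
# Sketch of the first action in the rendered template (names are illustrative):
#
#   - name: stream-image
#     image: image2disk               # assumed action image that streams an OS image onto a disk
#     environment:
#       IMG_URL: "http://192.168.1.1:8080/ubuntu-2004-kube-v1.18.15.gz"   # templated from the Tinkerbell host, OS, and Kubernetes version
#       DEST_DISK: /dev/sda           # filled in from the disk device pre-defined on the Hardware
#       COMPRESSED: "true"
```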
It's also adding a basic cloud-init config on top of that image, and this is just so that cloud-init will actually run against the Hegel metadata service that we have. We specify this link-local address, which I configured this Tinkerbell machine to listen on, and that port, 50061, is the Hegel default port. So it's going to contact the metadata server we have set up. I've also given it a basic default user, so that if I wanted to, I could create a cluster defined in a way that lets me inject my SSH key through the cluster configuration and access the machine remotely. As it is, I didn't specify that, so if this fails to boot up, I'm going to have no way to troubleshoot unless we modify that.

The other thing is that we're dropping in a ds-identify configuration and telling it to use the EC2 metadata source, and this is because of the way cloud-init does its data source detection: if we didn't force it to use the EC2 metadata source, it would default to no metadata source at all. And the last thing we do is issue a kexec, which executes the kernel we just streamed to that disk in the previous steps.

Now, the one thing you'll notice is that none of this contains the actual steps for bootstrapping Cluster API itself. As I mentioned, we're contacting the Hegel metadata service, and that's where that comes from. So if I go and describe that hardware — the machine — we see there's a lot more information here, and this is basically all of the information Cluster API created for bootstrapping this cluster. It was created in the user data section of the metadata, so when cloud-init runs, it finds this and executes the script just as if it were one of the other, "real" clouds, so to speak.

So with all of that out of the way, this machine should actually be bootstrapped by now. If I do another get on cluster-api, we do see that it is ready — we got the Machine, we got the KubeadmControlPlane — but it looks like it is still unavailable. So let me go ahead and — let me see, what is the name of that cluster? Demo, okay. All right, and if I can type today, I can get the kubeconfig for it — and sure enough, that machine failed to boot on me. So that is fun. As it is with live demos, they always like to throw a wrench in the plans at the best of times.

So let's see, we can go into this — that's .108, so the machine is up. That is interesting. You know what it is? I bet that when I was running through the demo the first time, it injected a new boot device in that machine for the disk that I deployed to, and it booted into that first. And that's why I wasn't actually able to SSH into it, because I didn't define the SSH key with this cluster that I created. Gotcha, gotcha, gotcha.
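For anyone following along, the checks being run at this stage look roughly like this (cluster name `demo`, as in the walkthrough):

```bash
kubectl get cluster-api                 # clusters, machines, kubeadmcontrolplanes, tinkerbellmachines, etc.
kubectl describe workflow               # per-action state of the single workflow created so far
clusterctl get kubeconfig demo > demo.kubeconfig
kubectl --kubeconfig demo.kubeconfig get nodes   # only works once the control plane VIP is actually serving
```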
So that is fun um So what would have happened You know in the case is this would have bootstrapped up. Uh, it would have configured, uh cube vip Uh as part of that bootstrap process with that You know virtual ip that we defined Um, and then at that point Um, this machine would be up. We could apply a c and i Um, and then we would be able to scale the clusters up at that point um But it's kind of hard to proceed from this point without breaking out a keyboard and monitor and that's going to be a bit awkward for this Especially since that windows vm locked up on me Uh for the bit of remote management I could do on these machines um So yes, uh, I would uh, gladly be able to chat offline, uh about um You know, how do we enable the remote management of things like power and boot order, uh for the machines, uh, especially with concerns around ipmi things like that I know there are things that we have done internally at equinex metal where we can be a lot more opinionated to help with those things such as you know Updating, you know, most of the hardware that we have to run Uh open bmc and being able to you know configure things on the firmware side that Most folks in a data center probably aren't doing or don't want to do Um, so as we start adding things like pb and j support We want to make sure that we're not just supporting that overly opinionated environment that we care about internally But also how do we support? Uh things that folks are going to hit in the real world as well And that's likely to be you know be able to support things like You know some of the consumer based hardware, whether it's like these um AMD based boxes with dash or the intel uh based stuff with their management firmware And with that, um You know, obviously it's kind of hard to show with the actual ha capabilities here But you know cubit basically acts like a kubernetes controller It keeps an eye on things it uses leader election to determine Who is the primary machine? 
That ensures there's minimal interruption of that IP as things go along. For folks that want more highly available setups and want to scale out requests across multiple API servers, the BGP-based configuration is there for that. There's also been additional support added for being able to update DHCP as well, so even if you don't want to deal with BGP and you have access to DHCP, that's another alternative.

But we also don't want to make things overly opinionated and only support kube-vip. We want to support other types of load balancers as well, and as Cluster API figures out how to do proper load balancer support across different providers, we'll be able to integrate with that and consume it. Right now there are other providers that have multiple load balancer options, and some of those are easily applicable to other providers as well, especially the in-data-center ones, whether that's the OpenStack provider, the vSphere provider, Tinkerbell, or Metal³. We don't want to require each of those infrastructure providers to redefine all of the load balancers they want to support. So that'll be coming at some point in the future, but for now kube-vip was the minimal way to get HA support on physical hardware without requiring crazy management of external machines and all of the complications and things that can go wrong along the way.

I did also mention that right now the image streaming is done off of a web server. There's been support added to the Tinkerbell tooling and some of the predefined actions to push and pull images directly from an OCI-based registry using the ORAS tool, so we're looking at adopting that for this integration as well. That means one less requirement for bootstrapping the environment to get started, it simplifies the management, and it gives us the ability to have public images available, similar to the AWS provider and some of the others. Not that we would recommend folks use those in production environments — we want folks to create their own images and do the verification with the workloads they'll be running on the cluster to ensure everything works like they want — but having those images available for the initial PoC or demo use case would greatly simplify standing up the environment.

One question I had for you there, Jason: I've liked how you've gone through and described each of these abstractions — in the project, in Cluster API, in Tinkerbell. Were there any abstractions you came across that were intentional at first, and were there some you really didn't see coming, where you thought, we should draw a line here, or draw out that abstraction, when it came to composing this project and working with these two?

Yeah, so I think I briefly touched on that a bit
with the load balancer, and trying to add that as a first-class citizen in Cluster API. When we started, part of it was not necessarily failing to see the need for those abstractions, but trying to limit the complexity and the amount of work it takes for an initial implementation. So we tried to stay very minimal with the Cluster API abstractions, especially in the early days, but we tried not to be overly simplistic at the same time. The idea wasn't to define an arbitrary one-cloud-abstraction-to-rule-them-all, because that's been tried in various places in the past, and every place it's worked you end up with a least common denominator that works well for a demo or PoC use case and for very limited production use cases, and folks pretty quickly outgrow those limited abstractions when you try to stretch them across all providers. That's why, when we define the resources, there's this TinkerbellMachine in addition to a Machine resource: we didn't want to abstract things away too much in the way we defined them.

At the same time, plenty of us talked about how we support use cases outside of the cloud and some of the things that are going to be important there, especially if we don't want every single on-premises deployment to have to define its own particular cloud provider. We accepted that limitation at first to get the project bootstrapped and, in some cases, to simplify the conversations needed to reach consensus among contributors — the more you can throw out of those early conversations, the easier it is to create consensus and get started. But as we see more adoption in other places, some of those ideas are starting to bubble up and become bigger pain points. So in addition to the proposal being worked on for adding load balancer support, there are other folks looking to add things like IP address management support to Cluster API proper, which would come in really handy on the Tinkerbell side, because right now we require people to pre-define the IP addresses on all of these hardware devices. If we can integrate with some type of authoritative IP address management solution, that would give us the ability to be a bit more flexible there.

Other places where I see room for abstraction: especially on the data center front, being able to configure firewalls — the equivalent of security groups in the clouds — being able to define those types of things to provide more granular external restriction of which devices can communicate with which other devices.
That sort of thing. And that's going to be different for pretty much every on-premises environment, depending on which vendor they go through for that type of solution, or whether, in some cases now with software-defined networking, they've written their own solution in that area.

The other aspect is more granular control over networking. Right now, in the major cloud providers, you can generally be fairly opinionated about these things — in AWS everything is VPC-based now — but when you start looking at OpenStack or vSphere or Tinkerbell, what's available networking-wise to provide that kind of automated network configuration is going to be a lot more diverse than what you see with the major cloud providers. So we'll eventually want to support those types of abstractions as well, and whether those end up as proper abstractions within Cluster API that are shared globally, or as external abstractions shared between multiple providers, is yet to be seen. But I fully expect that at some point we're going to see it.

In an ideal world, I would love for folks to be able to treat their data centers the way AWS does, the way Google does, the way any of the large operators do: they should be able to just rack and stack machines, have them automatically become available, and have them be used as needed. If you have a hardware fault, you're not troubleshooting individual hardware faults within an individual device; you're ripping and replacing that device, because a person's time and labor to troubleshoot down a failed RAM module costs more than replacing the physical hardware. And alongside that, provide things like virtual network isolation, so that when I say "turn these machines into a Kubernetes cluster," they get deployed on their own separate VLAN with the proper ingress and egress routing they need. That would provide granular restrictions between clusters themselves, at least at the virtual level — I'm not going to say it would be the same as physical network separation, but it's much improved over throwing everything onto the same L2 broadcast domain. And then, with the firewall-type integration, being able to say that a workload running on this subset of worker machines cannot access anything outside of the prescribed networking ports needed to reach the internal API server and its clients — that would provide more isolated restriction on a per-workload basis as well.

So I see those things happening in the future, but again, you can do things both faster and better by limiting the scope you take on at first, and as we define these things and hit the limitations in various areas, that's the beauty of the community.
That's when we get together and say: how do we solve this, and what's the right approach? I won't say that Cluster API is the perfect abstraction to rule them all for declarative management of Kubernetes clusters, but it seems to do pretty well, and the more people adopt it, the more people we have bringing their own expertise and feedback into the group, and the better we improve the entire ecosystem for everybody.

I think our goal is not necessarily to be the best abstraction for building a Kubernetes cluster. If I wanted that, I would write a super-opinionated installer that said, "give me a five-node cluster on AWS" — that's the only input, and out comes a cluster. That'll work great for about five people. Once you get into the more diverse use cases, especially people running workloads in highly regulated environments, or workloads that are going to be attacked by nation-state-level threat actors, they have a lot of different requirements, and we want to make sure we can support all of the varied use cases. Because, similar to kubeadm before it, the idea with Cluster API was to build the next building block on top of kubeadm to help folks build out and manage these Kubernetes clusters.

In the early days, every Kubernetes vendor, every type of open source distribution, vanilla Kubernetes — there were different installers and different ways to manage these things, and the hope is to unify some of that effort, so that not everybody is in the position where the next Kubernetes release comes out and it's "oh no, this doesn't work with our tooling anymore, how do we reverse engineer all of that?" Provide a common substrate so people can worry more about building the features the customer actually cares about. At this point, I think installing and upgrading a Kubernetes cluster is table stakes for anybody in the market. All the customers want more: what are you enabling for my workloads? How do I do CI/CD on top of this? How do I take care of these more complicated challenges? How do I run AI/ML workloads on my cluster? How do I deal with the security aspects? Let's let everybody focus on those higher-level concerns and share some of the burden for the common CRUD that only us infrastructure geeks really care about.

And that's what I think is most helpful.
When you talked about not having the best abstraction, I feel like when people get into the headspace of focusing solely on the abstraction, you end up in that xkcd comic — one of my favorites — where two people are talking and say there are 14 competing standards, we should make one to unite them all, and in the next panel there are 15 competing standards. So I totally feel that.

And I really liked what you said about everyone being able to work with their infrastructure the way the public clouds handle their workloads. In my time in the industry I've enjoyed watching the jumps back and forth between the mainframe setup and the personal computing setup, and honestly it really excites me to be around for what a data center at home looks like. I feel like Cluster API and some other things on that front allow for that — granted, it might be some Raspberry Pi clusters or some Intel NUCs, nothing too wild or crazy, and hopefully we don't have to worry about power consumption and water cooling or anything like that. I'm harking back to the movie GoldenEye at the end there, for anyone that's seen it; I won't spoil it, but I recommend checking it out if you haven't.

It's really interesting to me, and I really like how you and the community have focused on what I'd consider the right things, and on making sure you have that feedback from everyone. And while upgrading Kubernetes has historically been difficult, I really appreciate that it's becoming easier with each passing day. When it comes to making sure your workloads keep working through that, of course you'll have some back and forth on whether this API endpoint has gone from beta to GA — you have to deal with some of those things, and Kubernetes makes a lot of those abstractions easier to work with. But it's exciting to work on a platform that enables easier upgrades, so it's not like an iOS or Android or core operating system update where you have to rewrite everything for all the new APIs. There is a little bit of that, but it's nice to have a softer landing when upgrading the Kubernetes cluster while still supporting your applications. I feel like you've really checked all the boxes on those fronts.

Well, I'm glad you mentioned upgrades, actually, because I didn't really touch on this, but one of the things Cluster API does is treat the individual nodes — machines, in our parlance; they're roughly one to one, though there are edge cases where that's not true — it basically treats the underlying instances that back them as ephemeral. We don't want to treat these things as long-running snowflakes, because that's where you get into the challenges of upgrades: does this specific kernel version have an issue with the specific OS packages that are installed, with the container runtime you have installed, with the various security configurations and KMS plugins that you might have running
on that individual OS, with that version of Kubernetes? I spent time at Red Hat in the early OpenShift v3 days, and even when we controlled the entire stack from the kernel to the Kubernetes binaries, there would still be these weird edge cases: this version of iptables doesn't have support for the file locking the clients use — that opt-in flag you can provide to iptables to say, "I'm being a good citizen, don't let anything else modify things at the same time as me." Kubernetes adopted that, and if someone was running an older version of iptables without it, all of a sudden Kubernetes is blowing up — and we thought we had validated all of the use cases we cared about, and we controlled all of the packages that shipped. We couldn't even prevent those types of issues. So how can we do that in an unopinionated way, across various operating systems, across potentially various Kubernetes distributions, not just kubeadm-based ones with the upstream bits? The challenge is ridiculous, especially if you throw different container runtimes into the mix.

So the idea was: what if we didn't worry about that? And I'd take it a step further. Most people have various workloads running on these clusters, and I would rather see people migrating their workloads between Kubernetes clusters as part of their upgrade strategy than upgrading in place. Especially if those applications are talking to Kubernetes itself, that's where you get into the issues around older deprecated APIs that maybe were just working in the background, where you weren't seeing the warnings, and oops — after the upgrade they no longer work. If you're migrating the applications, then you have control over that availability, and the ability to roll back, that you wouldn't necessarily have with in-place Kubernetes upgrades.

I don't think a lot of people understand this, but rollback is quite a tricky thing in the Kubernetes world. There's only a limited window during a Kubernetes upgrade where you can actually roll back, and that's basically while you're upgrading the control plane of the cluster. Once you've fully upgraded the control plane, rolling back really isn't an option anymore — not without a backup and restore of etcd — and that means you've lost state between that last backup and the current point in time. That's a tricky thing to recover from, and to know the implications of that recovery and how it affects the workloads you have running. So I'd much rather see people migrating those applications instead, because then you can do full validation of the new Kubernetes cluster, make sure it's fully CNCF conformant, run additional compliance checks, make sure it works for your workloads, and know that the migration is going to happen safely — without worrying about those edge cases around rollback.

It's so true.
I think I've been bitten by that a few times myself, just trying to upgrade in place. It's so much better when you can have a guaranteed state, which in a lot of cases you get when you set up Kubernetes for the first time, and I like having that certainty when moving those workloads over.

Well, awesome. We are unfortunately at time. Jason, I feel like I could talk to you all day — this has really been fantastic. Thank you so much for your demo today and for everything else. Thanks, everyone, for joining in to the latest episode of Cloud Native Live; it was wonderful to have Jason talking about Cluster API and Tinkerbell. Some time we'll have to ask him: if we stop believing in fairies, will the Tinkerbell project disappear? Hopefully not. We really liked the interaction and questions from everyone — thank you all for showing up. We bring you the latest cloud native code every Wednesday at 11 a.m. Eastern. Next week we will have Daniel Cook presenting "Optimizing and Securing Kubernetes Workloads with Polaris and Goldilocks." I really like the progression on all of these names and parables; this should be fun. Thank you so much for joining us today. We will see you next week. Yes, everybody, have a good one.