All right, we are going to go ahead and get started. First of all, I'd like to introduce myself. I'm Saiyam Pathak. I'm a CNCF Ambassador, working as a software engineer on Kubernetes and related projects. I'm CKA and CKAD certified, I'm an InfluxAce, I'm part of the Rancher community, and I run the local meetups in Bangalore for Docker, CNCF, Rancher, and Influx. So that's about me. I'd like to thank everyone who is joining us today, and welcome to the CNCF webinar, Declarative Host Upgrades from Within Kubernetes.

We'll go through some of the housekeeping items before we get started. During the webinar, attendees are muted. There's a Q&A box at the bottom of your screen; feel free to drop your questions in there and we'll get to as many as we can at the end. This is an official webinar of CNCF and is subject to the CNCF code of conduct, so please do not add anything to the chat or questions that would be in violation of the code of conduct. Basically, please be respectful of all of your fellow participants and presenters. The recording and the slides will be posted later today on the CNCF webinars page.

I'd like to introduce our presenters today. We have Adrian Goins, Director of Community and Evangelism at Rancher Labs; Dax McDonald, Software Engineer at Rancher Labs; and Jacob Blain Christen, Principal Software Engineer at Rancher Labs. With that, I'll hand over to Adrian, Dax, and Jacob to kick off today's presentation.

All right, hey everybody. My name is Adrian Goins, and thank you, Saiyam, for that wonderful introduction. Give me a moment here, I will share my screen, and we are going to dive right in. Saiyam already introduced us, but here we are again. I'm Adrian, and I'm joined by Dax McDonald and Jacob Blain Christen, and these guys are way smarter than I am, so I'm just here to introduce them and to talk for a moment about what Rancher believes and how we got to the particular thing we're going to be demonstrating for you today. We don't need this agenda slide.

First of all, everything that we do at Rancher is 100% free and open source. We believe that the key to production-quality Kubernetes at scale has three pillars: a certified Kubernetes distribution to enable computing everywhere, centralized management to manage all of that and make your life easier, and a platform for secure application deployment. I want to stress again, everything we do is open source from top to bottom. Everything you're going to see, and anything that Rancher does, you can use yourself; there's no open core, there are no paid features, it's all out there. We have solutions for inter-cluster communication and container-attached storage, we have Kubernetes distributions for the data center and the edge, we have a solution for multi-cluster Kubernetes management, we have a solution for managing fleets of Kubernetes clusters, and we're building an integrated deployment engine to make Kubernetes easier for developers to use. And sometimes when we're building these projects, we're able to spin off useful components that you can use anywhere: you can use them in your own Kubernetes clusters, you can take them and embed them in your own products. It's all out there for you to do what you want with.
And one of those things is so cool. Look at the title of today's talk; first of all, thank you all for joining, because honestly, the title is pretty boring, and every other title that we came up with was also pretty boring. But the technology that we're about to show you, you'll be able to walk out of here in an hour, go deploy it, and start using it in any Kubernetes cluster that you're running. And that is called, oh, my slide didn't change, the system upgrade controller. Another very sexy name, right? System upgrade: it is what it is. So with that, let me hand it over to Dax, who's going to tell you a little bit about what this thing does before Jacob gives you a demo. Dax, are you ready if I stop sharing? I am ready. I'm going to stop sharing. It's all you, my friend.

All right, let's see. All of us are answering questions in the Q&A as well. Yes, we can see your screen, but it's the wrong one; share the other one. Okay, try this, swap displays, does this work? That's good. So as Dax is talking and as Jacob is doing demos, if you have questions, please post those in the Q&A section and we'll be answering them directly, or if there's a question that you specifically want Jacob or Dax to answer, you can put their name in front of it and we'll save it for them. We'll try to pause during the demo to answer questions; otherwise there's a block of time at the end where we'll answer as many as we can. So Dax, take it away.

All right, thank you, Adrian. Today we're going to be talking about the system upgrade controller. This is a CRD-based controller, and the CRD that the controller runs on top of is what we call the plan CRD. This is just a brief overview: by using the plan and the controller, we're able to form a declarative API, which is one of the fundamental concepts of Kubernetes. The plan that we're going to demonstrate today does some really cool stuff for you. We're going to show you upgrading K3s Kubernetes versions, which is actually in production today in Rancher 2.4, and then we're going to show you how the plan CRD works and dive into that.

At a high level, these plans describe the desired state of the cluster; it's a declarative API. As you'll see later, you describe "I want to go to Kubernetes version 1.17.x, what have you," and the controller works towards upgrading your cluster to that version. And it's not just Kubernetes versions: this framework is very flexible, and we're able to upgrade underlying kernels, or you can even use Kubernetes secrets to parameterize plans to upgrade specific packages that you care about. Again, these are Kubernetes-native CRDs, so we leverage things like label selectors to choose the nodes that you want to select for upgrading, and the work itself is performed in a Kubernetes job with an image. So it's really a flexible framework that I think everyone's going to enjoy using.

I'm going to dive right into the plan CRD. This right here is a small slice of the plan CRD we use to upgrade K3s clusters. You don't even have to use Rancher to use this plan; you can simply use the system upgrade controller standalone, and these plans will work just fine.
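(For reference, a minimal plan along the lines of the one shown on screen might look roughly like the following. This is a hedged reconstruction from the walkthrough that follows, not a verbatim copy of the slide; the `system-upgrade` namespace and service account are the controller's documented defaults.)

```yaml
# Hedged reconstruction of the K3s upgrade plan being described.
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-server
  namespace: system-upgrade
spec:
  # The version the cluster should converge to; the controller keeps
  # creating jobs until each selected node reports this version.
  version: v1.17.2+k3s1
  # One node at a time: a basic rolling update.
  concurrency: 1
  # Only nodes carrying the k3s-upgrade label are selected.
  nodeSelector:
    matchExpressions:
      - {key: k3s-upgrade, operator: Exists}
  # The service account (RBAC) the upgrade jobs run under.
  serviceAccountName: system-upgrade
  # Drain (and force-evict) workloads before upgrading the node.
  drain:
    force: true
  # The image that does the work: swaps out the k3s binary.
  upgrade:
    image: rancher/k3s-upgrade
```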
So I just wanted to walk you through a high-level understanding of the custom resource definition. For those of you that aren't familiar with CRDs at all, this is what you'd actually be applying: you kubectl apply a plan that looks like this to perform an upgrade, after you've installed the system upgrade controller into your cluster. This plan is really an outstanding intent to mutate your cluster. Let me go through some of the things in the spec here.

One of the things in the spec is a version, and that's the version of Kubernetes: this is v1.17.2+k3s1. That's what informs when this plan has run to completion: the controller is going to keep creating jobs and keep trying to make progress until it sees that the node's Kubernetes version is the same as that.

Additionally, we have concurrency: 1 here. Concurrency is a field we leverage to provide control over how many nodes are upgraded at a time. A concurrency of one performs a very basic rolling update: it upgrades one node, then waits for that upgrade to complete successfully and for that node to come back up before moving on to the next node.

Then we have this node selector here, which matches the k3s-upgrade label on the underlying nodes. And of course, we're using service accounts and various pieces of RBAC to provide the permissions to do this.

Another very important thing here is the image that's actually doing the work. You're welcome to go find it on GitHub: it's rancher/k3s-upgrade. It's really a simple image that just swaps out the K3s binary. We're lucky with K3s that it's a really painless upgrade process, but as you build your own images, the system upgrade controller provides the framework around which to do these declarative updates.

And then there's drain with force: true here. Again, we're helping you along the way by codifying best practices: if you're going to take a node down, you probably want to drain that node first. Force goes a little further; it ensures that everything is off that node before it gets upgraded. There are a few other options here too; you can see the full CRD definition on the system upgrade controller's GitHub page.

So, continuing on: this is the plan CRD continued. The first plan I showed you was a very small version that shows how to upgrade K3s only. This one is a full system upgrade that's parameterized with Kubernetes secrets, and I think Jacob's going to do a really awesome demo of this later, which will dive deeper into it. This one, by the way, is just curl and OpenSSL; the one coming later is for upgrading the kernel? Yes, correct. This is showing you how granular you can get with the system upgrade controller. In our secret here, you can see that we're specifying versions of curl and OpenSSL, and when we run this image, it's going to upgrade those specific packages on our node. So the system upgrade controller allows you to do things in higher-level Kubernetes YAML that go down and affect your underlying host.
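(A hedged reconstruction of the kind of secret-parameterized plan being described; the package versions, label, and paths here are illustrative placeholders, not the slide's exact values.)

```yaml
# A secret pins the script and the package versions we care about.
apiVersion: v1
kind: Secret
metadata:
  name: pkg-upgrade
  namespace: system-upgrade
type: Opaque
stringData:
  upgrade.sh: |
    #!/bin/sh
    set -e
    apt-get update
    # Placeholder pinned versions of the packages to upgrade:
    apt-get install -y curl=7.58.0-2ubuntu3.8 openssl=1.1.1-1ubuntu2.1~18.04.5
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: pkg-upgrade
  namespace: system-upgrade
spec:
  concurrency: 2      # two nodes at a time
  version: bionic     # arbitrary tag; editing the plan re-triggers it via its hash
  nodeSelector:
    matchExpressions:
      - {key: plan.upgrade.cattle.io/pkg-upgrade, operator: Exists}
  serviceAccountName: system-upgrade
  # Mount the secret under the host-root mount, so the script is still
  # visible after chroot'ing into the node's root filesystem.
  secrets:
    - name: pkg-upgrade
      path: /host/run/system-upgrade/secrets/pkg-upgrade
  drain:
    force: true
  upgrade:
    image: ubuntu:bionic
    command: ["chroot", "/host"]
    args: ["sh", "/run/system-upgrade/secrets/pkg-upgrade/upgrade.sh"]
```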
A few other things: with concurrency: 2 here, we're actually going to upgrade two nodes at a time. Really, this is just a quick example showing how we use secrets to provide additional parameterization on top of the plan CRD. And that is, I think, it from me. Jacob, do you want to take it from there? I think that's a brief overview of the plans, so everyone has a basic understanding of where we're going.

I have a quick question first. Is this currently in anything, or is it just taken out of K3OS? Is this in Rancher 2.4? Yes, the previous one right here is actually directly the plan from Rancher 2.4. And you don't even need to run Rancher 2.4; it's in the rancher/k3os-upgrade repository. Feel free to go to that GitHub, and if you have K3OS clusters, you can now specify any version of Kubernetes that K3OS supports and go right to it. It really simplifies the upgrade process, which I think is one of the still-rough edges in Kubernetes administration. If you use something like the system upgrade controller, it's going to do a lot of the best practices for you: it's going to drain nodes, it's going to mark them as unschedulable, and with a concurrency of one it's going to do a slow rolling update. Nice.

There are two questions that aren't specifically related to this, but I'll throw them out there and then we'll dive into the demo. Jack asks: how does it interact with the cluster autoscaler? Jacob, do you want to take that one? Sure. The only way it interacts with the cluster autoscaler is that drains will evict pods from nodes, such that the replica sets that own them will want to reschedule those pods, but they can't on that node, because it's been marked unschedulable. That's really the extent of it. And that's optional, right? I've got the drain in the K3OS plan by default, and we have it in the K3s plans that come with Rancher 2.4, but you don't have to use it. Again, this is a way of gracefully scooching workloads off of a node so you can do some operations, and possibly bounce it if you need to. Hope that gets to the question. Yeah, that was great.

Raj asks: what kind of privileges does the service account need in order to do the system upgrade? If you're going to talk about that in your demo, maybe we can just say we'll cover it. No, I kind of gloss over it, so let's cover it now. It's a very privileged account. The system upgrade controller itself needs a service account such that it can edit nodes and list and create jobs and pods, in its own namespace of course, but the real big one is modifying nodes. When a job is complete, the way the controller keeps track of that for itself is by labeling the node with a particular hash, the hash of the current latest version of the plan. And in the same operation, if you've got a drain or a cordon configured, the controller marks that node as schedulable again. So the controller itself has got a lot of privileges.

Then there are the pods the jobs run. As you can imagine, if you're mutating your node, you need essentially full access, even more permissions than a typical privileged pod. These pods are privileged, and they've also got the CAP_SYS_BOOT capability, so they can trigger a reboot if they need to. They've also got the host PID, IPC, and network namespaces, and those are there to facilitate various reboot regimes. For instance, one of the things K3OS needs to do is an nsenter into the PID 1 namespace so it can invoke the reboot, because there's no other way to make that happen. And I think that pretty much covers it: privileged, CAP_SYS_BOOT, and the host network, PID, and IPC namespaces, which basically means you've got control of the node. Yeah, that's a lot.
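(An illustrative sketch, not the controller's literal job output: roughly the pod spec the upgrade jobs effectively run with, per the discussion above.)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: upgrade-job-pod-example
  namespace: system-upgrade
spec:
  serviceAccountName: system-upgrade
  # Host namespaces, to facilitate the various reboot regimes
  # (e.g. nsenter into PID 1's namespace to invoke a reboot).
  hostPID: true
  hostIPC: true
  hostNetwork: true
  containers:
    - name: upgrade
      image: rancher/k3s-upgrade
      securityContext:
        privileged: true          # privileged already grants broad access
        capabilities:
          add: ["SYS_BOOT"]       # CAP_SYS_BOOT: allowed to trigger reboots
      volumeMounts:
        - name: host-root
          mountPath: /host        # node's root filesystem, read-write
  volumes:
    - name: host-root
      hostPath:
        path: /
  restartPolicy: Never
```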
All right, let's roll into the demo. Oh, there's one more question before we do, that I'd like to touch on because I don't really speak to it as part of the demo. It's from Mukesh: this is assuming that the server has internet access enabled; what if the only way to get images is via an internal artifact repository? Yeah, so K3s allows you to configure what we call private registries, which are essentially mirrors, so you can definitely run this in an air-gapped situation. The only catch is that, because everything is image-based, you need to make sure you've pulled your images into your air-gapped environment. But it does work; we've tested that in multiple different scenarios. Okay, great. All right, I'll stop sharing, and we'll get the correct screen.

For my background today, I chose a picture of the cluster that I'll be showing you in an hour. Can you see my system upgrade controller kustomization script? Yes, sir. Right screen? Okay, good.

All right. What I've got set up for you today is a small Multipass cluster of Ubuntu Bionic nodes, running locally on a Linux machine here. I've gone ahead and pre-pulled basically all of the content I possibly could, and I've done the same thing for the cluster as well. What I'm going to show you is a slightly more embellished version of the plan that Dax was talking about. Let's page through it with less. As you can see, I've got a few more node selector expressions on there, and I'm doing a more recent version of K3s, but it's largely the same plan. You can see that for this case we've got the masters set to cordon instead of drain, and that's because I'm not really running any workloads, and I wouldn't be running much on the masters anyway. So it's safe to just mark the node and move on, and it makes the demo a little more brief.

Then the only other real trick that we haven't mentioned: notice there are two plans here staring at you, a server plan and an agent plan. These are just K3s categories: servers run the API server, the controllers, and all that stuff, whereas agents are just workers. And what we've got going on here is this prepare statement that we didn't talk about. A prepare section in a plan is a way of saying: hey, before you do anything else, this init container needs to run to completion. And actually, I'm pulling in the wrong image there, so I'd better edit that here. This init container needs to run to completion before we can move on to the drain or anything else. The server plans and the agent plans are going to be triggering jobs at the exact same time, and this lets us say: okay, the jobs for the agent plans are going to wait for the server plan to complete before they proceed.
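(A hedged reconstruction of the two-plan setup in the demo. The prepare semantics of rancher/k3s-upgrade are as described in the talk; the exact version string and labels are illustrative.)

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  version: v1.17.4+k3s1
  cordon: true            # masters are cordoned rather than drained
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/master, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  version: v1.17.4+k3s1
  drain:
    force: true
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/master, operator: NotIn, values: ["true"]}
  serviceAccountName: system-upgrade
  # Init-container gate: agent jobs wait here until server-plan has
  # completed before draining and upgrading the workers.
  prepare:
    image: rancher/k3s-upgrade
    args: ["prepare", "server-plan"]
  upgrade:
    image: rancher/k3s-upgrade
```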
All right. For me, it's hard to talk and type at the same time, so let's get into it. A cool thing here: let's take a look. I don't think I've actually installed the upgrade controller here at all. I'm a big fan of kubectl get all -A, and we can see there's no system upgrade controller deployed. So let's do that. I've got a script here which is actually using kustomize. One of the cool things kustomize lets you do is drop a kustomization.yaml at the top of your repo and point at various other manifests in your repo, and it will process and output them, doing some targeted replacement of values for you, so you end up with a manifest that does what you need it to do. In this case, I'm going to pipe it directly into kubectl, so pardon all the beeps. So kustomize build pulls that down and outputs a big old manifest for deploying the system upgrade controller.
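(A sketch of the kind of kustomization.yaml being described; the remote ref and image tag are illustrative assumptions, not the exact file from the demo.)

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: system-upgrade
resources:
  # Pull the controller's manifests from the upstream repo (pinned ref
  # assumed here for reproducibility).
  - github.com/rancher/system-upgrade-controller?ref=v0.4.0
images:
  # Targeted replacement of values: pin the controller image tag.
  - name: rancher/system-upgrade-controller
    newTag: v0.4.0
```

Piped into kubectl as in the demo, that would be something like `kustomize build . | kubectl apply -f -`.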
Now we'll watch that container creating. Thankfully it's actually pretty small. I'm not doing a watch; big fan of watch as well. Okay, there it is. Great. Now, one of the things we're going to want to do is deploy the K3s upgrade plans. I'm trying to remember if they will get picked up right away; I believe they will, because we've got concurrency set, yep, and I believe all of the conditions will match right away. And immediately, if we jump over here, oh, we've got some jobs. We've got a server job running right here and we've got two agent jobs, and it looks like the server job has already been applied, because the node has bounced. This is typical behavior; this is what it looks like when you're polling constantly. We've got an upgraded version on the master right away, and you'll notice that scheduling was disabled while it bounced. Okay, now we're back to being schedulable, and the agent nodes are starting to be applied too: we've marked them as unschedulable, and we're performing a drain on them. You get a lot of churn; you'll see some workloads bouncing around. Now's a good time for questions while this is going on. What's happening here is that the first plan, for the server, finished, and the agent plans waited. As soon as the server was done, the agent jobs exited their prepare containers and moved forward. You can see we've got all of the jobs complete, and just like that, we've upgraded our K3s cluster.

Nice. Do we have questions? We have one question, from Gaurav. He says: if we were to port the system upgrade controller to another Kubernetes stack, would we use separate plans to do OS and Kubernetes upgrades? Probably yes, at least in my opinion. It's good to have a high amount of granularity, because one of the things you get is that you can run multiple system upgrade controllers, since they only focus on the namespace they're deployed into. And the reason I mention that is that you may want to run multiple of them taking care of different things, because maybe you've got one service account that can do a bit more, and another with limited permissions that you give to your pods, so plans can't do things they don't need to do. But getting back to doing them separately: I believe you really want the smallest unit of idempotency at a time that you can get, and multiple plans in a single namespace will be run serially on a particular node. One of the things I've got built in here is, you can see that I'm triggering the creation of jobs, which create pods, right? In those pods, I've got some pod anti-affinity with anything in the same namespace from the same controller, does that make sense? So you can declare the intent for a number of different updates that you want applied, and they'll go one at a time: okay, I'm applying this thing, now I'm applying this thing. If you want to do it all at once, you can, but I think that tends to be a paver on the road to disaster.

Any other questions? Yeah, there's another; I wasn't sure if you were going to continue. I wanted to mention that this is perfectly suited to any particular stack. I just happen to use K3s because it's one of our products, but it's also super easy to use; it makes standing up clusters brain-dead easy, which I love. But there's nothing that ties the system upgrade controller to any of our technologies; it is a standalone, general-purpose controller.

Farzad asks if it's possible to do expansion or contraction of pods during the upgrade. Meaning, could you also change the number of replicas that you're deploying? That's how I would interpret it. You definitely could. The prepare container gets a lot, right? It's got the same permissions as the rest of the containers in that pod, which is a lot. So provided you've given that pod permissions to modify other resources, and the default manifest that I have is cluster-admin, you can go hog wild with it. You can definitely make changes if you need to: if you need to label a replica set to tag it for whatever reason and scale it down, you could do that. I would advise splitting that off into a prepare step.

Okay. So we said this is in 2.4, but the answer made it sound like it might only be available for K3s in 2.4. Is it available for any Kubernetes cluster that Rancher 2.4 is managing? In 2.4, it's only available for K3s clusters right now. We're hoping to expand it to more, but it's only K3s right now. Got it. Sweet.

All right, go for it; I'll hold questions until you're ready again. This will be a good one to field more questions during, because the full thing takes a number of minutes, but I want to see the nodes go through before we move on. Okay. So what we've got going on here: you see I'm running Bionic nodes, and by default these come with the 4.15 series kernel. And you know what? I've decided I'd like the latest and greatest.
Yes, it's two years old, but I've got the hardware enablement kernel, HWE, from Canonical. It allows me to use the kernel from Eoan, so I get the 5.3 series kernel, which has had a bunch of releases, and I'd like to make that happen. As you can see, I've got it parameterized with a version: 5.3.0-46.102, which was the latest as of three in the morning last night. I've got an upgrade script that essentially says: in a Bionic container, apt-get update and then install, and because I'm running in a VM, install the virtual HWE 18.04 packages, the virtual headers and the virtual image.

And at the end of that, a neat little almost-gotcha with the controller comes into play. You can see that I've got: if reboot-required exists, then trigger a reboot, otherwise don't. You could say, well, you're only going to run it the one time. But because I'm invoking this reboot from inside a container, I'm actually shutting down the Kubernetes instance before it has a chance to mark the job as completed. So when the node comes back up, it's going to run the job again. But reboot-required won't be there, and everything will just go, oh, I'm already updated. So what we're doing with this little bit of script magic is ensuring idempotency: you get to the end state that you want, without unnecessary side effects. If you've already done a reboot, we're not going to do it again. So this is the script, and this is where most of the meat of it is.
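(A reconstruction of the kernel-upgrade secret being described, run via a chroot'ed upgrade container like the package example earlier; the HWE meta-package names are Ubuntu's, and the reboot guard is the point of the example.)

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hwe-kernel
  namespace: system-upgrade
type: Opaque
stringData:
  upgrade.sh: |
    #!/bin/sh
    set -e
    apt-get update
    # Running in a VM, so install the virtual HWE kernel meta-packages:
    apt-get install -y linux-virtual-hwe-18.04 linux-headers-virtual-hwe-18.04
    # Rebooting from inside the container shuts Kubernetes down before
    # the job can be marked complete, so the job re-runs after boot.
    # This guard keeps the script idempotent: no reboot-required flag,
    # no second reboot.
    if [ -f /var/run/reboot-required ]; then
      reboot
    fi
```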
The rest is just the selection criteria, with a bit of verbiage to make it accessible. You can see I've scaled the plan down to a concurrency of zero, and that's to sidestep a bug with labeling a number of nodes at the same time. That said, I believe I've already applied the plan labels. My fat fingers coming in. So, I've decided to only label the workers to run this plan. I'm going to apply it, then edit the plan and give it a concurrency of two, and, oops, not 20. Something we can see here is that we are applying two of them at once. Now, unfortunately, these take a few minutes, and I should have started this earlier. This takes between two and three minutes, so if there are any questions, now's a great time.

We have questions. Okay, let's tackle those. Mukesh asks: is it possible to use this just to upgrade the network configuration in a bare-metal environment? Yes. You could definitely use it to do anything that mutates the node, whether that happens to be installing a package or dropping a YAML file; you're mutating state, so it's a perfect candidate. And one of the things we haven't really addressed, that we've hinted at a bit, is that this has a lot of overlap with traditional configuration management. A lot of these things should look and sound kind of familiar; we're just using Kubernetes as the reconciliation loop and the driver to make all that work, and using native resources to pull it off.

And that gets me to the fact that the way the system upgrade controller knows not to keep applying a plan to a node is that it labels the node with the version of the plan at the time it applied it. What that means is that plans can mutate over time and apply multiple times to a single node, but at any single state, a plan is an outstanding intent to mutate the individual nodes in your cluster that match the node selector criteria. And that's a long way of saying you can totally do configuration management, and manage a mix of hardware, with it. Sorry, long answers. No, I mean, this thing is so dynamic, so powerful, and so flexible.

So here we go: the two nodes have already got the new kernels. Nice. And one of the things I mentioned that you'll see is: oh, we've got these jobs in an unknown state. That's because the reboot happened before the controller could mark the job as completed. But then when the node came back up, it ran the job again; you can see two jobs on node four, and the second one, with nothing to do, immediately completed. And then the controller knows: okay, that node's done. Nice.

Do you recommend running the master plan or the node plan first, or does it matter? It depends on what you're doing. In the Kubernetes case, you want to upgrade your API servers first. So in this case, with K3s, it's good to upgrade your server nodes first, and then do your nodes that aren't labeled with node-role.kubernetes.io/master=true. Those are the selection criteria for the two plans: one selects nodes where that label equals true, and the other is the inverse, where it's not true.

All right, I'm just answering another question in text. Dominic asks if it's possible to schedule these upgrades, where you could, for example, say that you wanted them to run at a particular time or day. So, yes and no. Having a cron expression for scheduling when to apply these has been a feature request; I don't think anyone's actually submitted one, but somebody internally asked about it. So it's something I'm thinking about for when I get time to work on this; I'm working on a number of things. That's the no part. The sort-of-yes answer is that you can have a cron job that runs every so often and sets the version in the plan CRD. That will trigger application, because the controller will say: this has changed, therefore the hash has changed, therefore I need to apply this to the nodes that are in the selection criteria. So you can definitely cobble together a number of existing resources plus this plan to make such a thing work.
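(A sketch of that sort-of-yes answer: a CronJob that patches the plan's version on a schedule, which changes the plan's hash and re-triggers application. The schedule, target version, and RBAC here are assumptions; the service account would need patch rights on plans.upgrade.cattle.io.)

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: bump-plan-version
  namespace: system-upgrade
spec:
  schedule: "0 3 * * *"   # nightly at 03:00 (illustrative)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: system-upgrade
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: rancher/kubectl:v1.17.0
              command:
                - kubectl
                - patch
                - plans.upgrade.cattle.io
                - k3s-server
                - --namespace=system-upgrade
                - --type=merge
                # Changing spec.version changes the plan hash, which
                # re-triggers application to the selected nodes.
                - --patch={"spec":{"version":"v1.17.4+k3s1"}}
```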
So, Gaurav asks another question that I think is interesting. We create tools to solve problems, and any time we use a tool, like a hammer, there are lots of other things we could use instead: my fist, a rock, a piece of wood. So if somebody wasn't using the system upgrade controller and wanted to do this via some other manual means, what exactly are we replacing?

Yeah, I think that's a really good question, and it digs into why I started to develop this, specifically for K3OS. I came to Rancher right after Labor Day last year, targeted to work on K3OS specifically: pick up that project, have fun with it. And one of the things that was missing was an upgrade story. Someone in the community had contributed some scripts to make upgrades work, and that was all manual: it was manual to run the scripts, it was manual to remount the node's root filesystem, because by default that's mounted read-only. All of these things had to be done by hand, but they were eminently scriptable. I've done DevOps, and it's not hard, just a little tedious to get all these things together. So, putting together the upgrade story for K3OS: I'm a big fan of the command pattern as embodied in containers, where a container can be your content as well as the actions to take with that content. In that case, it's a perfect vehicle for delivering an upgrade. So for K3OS, I made it so we were publishing upgrade images, and then I worked on the upgrade controller so it could pull down an image and apply it. This is a really long answer, and I apologize, and it's very specific to K3OS, but the short of it is: I'm replacing a manual process with something that's easy to understand, because you can see it in a container definition. Taking that manual process and transcribing it into a declarative container definition gives you a very nice way of understanding how your systems are going to change.

And since I've been speaking about K3OS: this upgrade actually went quicker today than it did yesterday. I hope we're doing okay on time. Sorry, when I'm on demos, I'm all about shortcuts. So, I've got a four-node K3OS cluster, and it's heterogeneous. I've got two 8-gig NUCs, running i7 and i5 processors, not super fat, but they're NUCs, running K3OS 0.9.1, which runs K3s, and which is why I developed the system upgrade controller initially. I've also got an RPi 3 and an RPi 4; the RPi 3 for some reason has only got half its memory, but the RPi 4 has four gigs, and those run K3OS overlaid on top of, I want to say, Ubuntu 19.10. They're all running in a cluster together. And I thought it would be a cool thing in a demo to just go wild and upgrade a heterogeneous cluster with a single declaration. I mean, to me that's wild and cool; I hope it's cool to other people.

But getting back to the thing I was talking about, replacing a manual process with something you can inspect and understand a bit better, I want to take a look at the plan. There's only one plan by default here, and don't mind the stuff at the top; that's just a bunch of noise from Wrangler. You can see one thing that we didn't mention, which is that you've also got the ability to get your version values from an external source, what we call a channel. A channel is just a URL like this one right here, a GitHub releases URL, which will redirect you to whatever the latest non-pre-release tag is. That's what the controller expects: it expects a 302, and the last element of the Location path is going to be the tag, a.k.a. the version to use. Channel stands in for version: if you don't have a version, you've got to have at least a channel, one or the other, for a plan to work with the upgrade controller.
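(A hedged reconstruction of the K3OS plan on screen. The channel behavior is as just described, and the upgrade arguments mirror the ones walked through next; treat the exact names and paths as illustrative.)

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3os-latest
  namespace: k3os-system
spec:
  concurrency: 0          # parked at zero so it doesn't apply by accident
  # No version field: the controller follows this URL's 302 redirect and
  # treats the last path element of the Location header as the version.
  channel: https://github.com/rancher/k3os/releases/latest
  nodeSelector:
    matchExpressions:
      - {key: k3os.io/mode, operator: Exists}
  serviceAccountName: k3os-upgrade
  drain:
    force: true
  upgrade:
    image: rancher/k3os
    command: ["k3os", "--debug"]
    # Upgrade kernel and rootfs, remounting read-write first, syncing,
    # and rebooting on change; the lock file fences out any manual
    # upgrade that might be running on the host.
    args:
      - upgrade
      - --kernel
      - --rootfs
      - --remount
      - --sync
      - --reboot
      - --lock-file=/host/run/k3os/upgrade.lock
      - --source=/k3os/system
      - --destination=/host/k3os/system
```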
I've got a concurrency of zero here, because I didn't want it to apply by accident, and we've got the drain going on as well. Oh, but speaking to that original question, which is what are we replacing: this was a way of defining, okay, here's my upgrade. I've got a command I need to run, which is k3os. K3OS is an all-in-one binary, a little bit like ros is for RancherOS, and I want to run its upgrade subcommand, parameterized: if the kernel is there, upgrade that; if the rootfs is there, upgrade that; before you do any of that, if you need to remount read-write, remount; sync when you're done; reboot when you're done, if you've made a change. And here's a lock file, to lock out somebody who by chance might be running an upgrade manually on the host; first one in wins. And then there's the source and the destination: the shell mounts the node's root filesystem into the container at /host, so I'm saying the source is the container's /k3os/system and the destination is the host's /k3os/system.

Okay, there are two more questions, but let me know when it sounds like you're about to move on to something else. I'm going to kick this off and try to listen while you relay the question, if that's cool. Okay. Cental asks: what about a heavily loaded environment that has some failure? Can it roll back? Does it roll back without issues? How would you handle that? Right now, rollbacks are a manual affair. Dax used exactly the right wording, which is that we do a very simple rolling update, and the concurrency is essentially the number of slots, how many nodes you can tolerate being down. If jobs fail, they hang there, and the controller won't apply to other nodes until those concurrency slots are freed, until the failed jobs have been addressed in some manner. Which is all to say: rollback is a manual affair at this point in time.

And an anonymous attendee asks: is this aware of who the etcd leader is, and does it handle migrating the leader to another node while things are being upgraded? No; that's the sort of thing you would have to codify in interlocking plans, à la the K3s upgrade's server and agent plans.

Okay, the last question we have right now is from Farzad, who asks whether Rancher does the upgrade in multiple clusters simultaneously or sequentially, or if it can even do multiple clusters at all. Dax, do you want to answer that? Yeah, I can. The system upgrade controller is actually somewhat decentralized; there's no requirement for Rancher. So if you have five clusters with the system upgrade controller and you launch a plan in every single cluster, the moment the controllers in those clusters see those plans, they'll kick off and start performing the work.
So yeah, for Rancher 2.4, we only support K3s clusters right now, but you can launch those upgrades in whatever order you want; there's no requirement on Rancher. Okay, so whether it's sequential or simultaneous is actually at the discretion of the operator who launches the plans? I think they're asking about the Rancher case specifically, how we do it in Rancher. Well, he's specifically asking: does Rancher do the upgrade in multiple clusters simultaneously or sequentially? It can be simultaneous. You could trigger the upgrade in two clusters at the same time, and because they don't have any dependency on Rancher, they would occur at the same time. Okay, and is the process triggered independently for each cluster, or is there a way to take a central plan and say, apply this plan to these five clusters? There is not yet a way to apply a single plan to multiple clusters. You'd have to go in there, and basically what Rancher is doing is sending a plan to each downstream cluster.

All right, that's the last question for now. We normally leave ten minutes for Q&A at the end, and we have no more queued, so if you want to continue for a few more minutes, I'll let you know if other questions roll in. Yeah. So what's going on here is that, unfortunately, the master node got upgraded first, so I'm waiting for it to come back up. Oh hey, a ping response, good. Let's see if that's an actual response. Oh man, there we go.

So just to recap: you've shown upgrading the kernel, upgrading individual packages, and upgrading the Kubernetes version in the different things you've demoed today. And then this is upgrading K3OS, which is kind of a bundle of user space, kernel (on Intel hardware), and K3s, all at the same time. So you can see the OS version was upgraded here, the K3OS version was upgraded here, and the kernel was upgraded as well; it went from the -37 to the -42 generic. Yeah. And what's going on here is that I'm showing this, hopefully, working magically with heterogeneous hardware, because one of the things I do with K3OS is publish manifest-list images, such that I get to use a single image reference in the plan, and the individual nodes go and pull down the hardware-specific image.

So while this is in K3OS, is part of K3OS, and is in Rancher 2.4 for the K3s support, it can run in any Kubernetes cluster. So if somebody attending the webinar right now isn't using any of those things, how can they get started using this? Yeah, I think the best bet is to take a look at the examples directory, which I was actually in the middle of reworking yesterday and didn't finish. Here's a snippet; I think this is the CRD that Dax was showing. It shows you how you can configure a secret to encapsulate a script to be run, as well as parameterize it with individual packages you may want to update. This is a lot like what your Ansible would be: here are your individual Debian packages, here are the names, here are the versions, here's how you make that happen. This is what Ansible is doing in the background, folks, except maybe doing an update instead of an install. And then it shows how to chroot to the host and use the host system's packaging
as the way of pulling that upgrade off. That's probably the best entry into it: take a look at the examples. And as far as deploying the system upgrade controller into a Kubernetes cluster, we have that in the docs as well? It's just a matter of copying the manifest, right? Copy this manifest and kubectl apply it, that's it. Nice, pow, done. Easy. Now, one caveat you want to keep in mind is that it's cluster-admin at the moment. I haven't gone through the trouble of spelling out all the individual permissions that the pods or the controller will need, which is something I've been meaning to do, but right now it's a very privileged setup. So be mindful of the content that you're pulling in and running.

Yeah, that actually leads to a question from Dominic, who asks if we have plans for something like a system upgrade controller image hub, where people could share images or manifests and other people could use them. Has that come up as an idea? Does it seem like a good one? I mean, I like it. It's something that I've thought of, but not something I've begun to tackle. It's a great idea, I love it; the reusability factor is really good, as long as people understand that you should check what you're running before you just go apply it with privileges on your cluster. I think in most cases, especially for the early adopters, these images are going to be the kind of thing that you either craft yourself or that comes from a source where the deployment of the controller is co-located with the bits that you're using. In this case, K3OS, right? The image that I'm pulling down is K3OS.

Hey, did these things upgrade? Oh, the RPi 3 did not get updated, but the RPi 4 did; that's something. I'll look into what happened there. But yeah, any more questions? No more questions, and we still have a couple of minutes left. I have a question, then. In this situation, the Raspberry Pi 3 did not get updated, and you say you'll have to go look into it. How would you handle this? Would you log into it and perform an update manually? Would you downgrade everything that you did? What's the process you would follow? Oh, actually, you know what, I'm seeing that it's not ready. What that's telling me is it's probably mid-boot: it's either taking a long time to boot up or it's got some other failure. In that case, I would have to go over to my hardware and hook up the monitor and keyboard and figure out what's going on. It could also be that the thing is just running on a hair's-breadth margin; it's got almost no memory, and both of these are running off of external storage. I don't know, I'd have to look at it. Oh, there it is, it's updated. Hey, all right. Nice, it was just slow. This is awesome. I didn't practice this at all, by the way; I knew it worked on the NUCs, I figured it would work on the RPis, and it did. Awesome.

We had a question just pop up from Mike, who asks if there are any plans to integrate cron-like behavior, to implement the glue of a cron job changing the hash in a standard or integrated way. Well, the system upgrade controller is meant to be general-purpose, and I feel like that particular bit, how you get a version into a plan, can happen so many different ways.
The plan is the locus; there are all kinds of different paths to get a version into it. For me, I don't have a use case where I need a cron job to do that yet, so I don't plan on developing it, but contributions are welcome. Sure, open source. Right. And then there's the other side of it: if there isn't a feature request out there, I want to get one outstanding that discusses when to apply, when is an appropriate window to apply upgrades, because that may need to be different per node. I want to figure out what makes that functionality valuable to people. Okay.

We are coming up on the top of the hour; I see Saiyam just popped back in. We have one last question, where somebody says that the manifest specifies the SYSTEM_UPGRADE_JOB_KUBECTL_IMAGE as rancher/kubectl:v1.17.0, and wants to know if it will work on earlier Kubernetes versions. So, if you override that environment variable on the system upgrade controller, you can definitely get an appropriately compiled kubectl for your architecture and shim that in there. Right now that image is only built for v1.17; in fact, kubectl right now is a bit behind, but you can definitely make it go a little further back. The other caveat to that answer: the system upgrade controller is built with Wrangler, which I believe is referencing either the v1.16 or the v1.17 series, which means your backward support only goes as far as v1.15. Okay, I've got to cut you off there, because we have less than a minute left and I know Saiyam wants to wrap up. So thank you so much, Jacob and Dax, for your demo, and Saiyam, it's all you.

Okay, great. Thanks, Adrian, Dax, and Jacob for the great presentation and a great demo with a 100% success rate, and thank you everyone for joining; we answered all of the questions today as well, which is great. The webinar recording and the slides will be online later today, and we look forward to seeing you at future CNCF webinars. Have a great day, everyone. Thanks, everyone. See you soon.