Good morning, good afternoon, good evening, and welcome to another edition of Red Hat Advanced Cluster Management Presents. I am Chris Short, executive producer of OpenShift TV, and I am joined by a bevy of RHACM team members. Scott, I will rely on you to do the round of intros this time, because you all always appreciate jumping on your show. I do bring the thunder. Yeah, I try to bring the thunder; no pain, no gain.

But, you know, what are we in, season one of RHACM Presents? We have had a chance to introduce various characters, and the plot has thickened. I think we're getting some peaks and valleys as we take the experience through. There's no Breaker of Chains yet; we haven't quite reached the dragon phase. Anyway, metaphors aside, I'm excited to bring more of our team to meet you and to meet the world, and to talk about what we're doing with multi-cluster management. I'm going to turn to my left, which I don't know where that ends up on people's screens. How about Gurney goes first for the intro? On the screen, it's nowhere.

Can do, happy to hop in, Scott. Thanks. I'm Gurney Buchanan. I'm from the CI/CD slash DevOps slash tools slash whatever-gets-thrown-our-way squad. So we do a bit of it all, and whenever Scott needs to spin up some more AWS resources, he hollers our way. So when I want to spend cash, I talk to Gurney. Yeah, and they nicknamed me Mr. Pool as well after the last stream about cluster pools, so that did happen. One thing Gurney doesn't do is hang out at an actual pool, but if it's a CI/CD cluster pool, he's all in. Nice.

Then we turn it over to Dale, down there in my bottom left corner. All right, my name is Dale Haiducek. I'm a developer with the GRC squad; it's the little governance, risk, and compliance tab in Advanced Cluster Management. Not so little, that's kind of a big deal. Governance and risk, that's a big deal; you shouldn't undersell it. And then Kevin, from the sunny skies of Toronto. Hi everyone, I'm Kevin Cormier. I work on the application lifecycle squad, particularly on the UI side.

Awesome. So we've got DevOps, we've got GRC, we've got apps. This is actually the trifecta; when I talk with customers, this is the trident, right? You're trying to bring these cultures together, and we actually are going to try to do that today. It might be oil and water in some cases, but we have some really cool stuff teed up. A lot of this centers around the value statement of RHACM being your cluster management interface, your central control plane for the universe. One of the key pieces we rely on there, as we've discussed in the past, is Hive, and the Hive API being that centralized layer that we drive all of our OpenShift goodness through, so that we can drive it on cloud or on prem or wherever you're hanging out with your clusters. Hive is part of that message. So, the Hive API; I'll drop a link in the chat when I get back to that window. Thank you. That's a place for you to hang out upstream and start learning about what we're doing with multi-cluster and how we're driving a single API experience there.

Then, when you do that, you find out that you've got cluster sprawl, right? All of a sudden you're spinning up clusters rapidly, and maybe you're not paying attention to the budget and the billing, and when you get to the end of the month you realize you've just spent $10,000 on a cloud that you didn't expect to. And that was actually how Gurney, well, he doesn't actually have these wake-up moments.
He has these shower moments; I'll let him describe how that works. But he had this shower moment where he's like: we're spending all this money, but we could use something called cluster pools, and we can really start to centralize our costs and start to bring things down with hibernation. So it all kind of started to come together. We pitched it early with you a couple of months ago, and we're coming back for another round that goes deeper and wider and farther in between. So, nice: take that cluster-as-cattle concept, take that cluster-driven experience with Hive, understand cost, and that kind of sets the table for what RHACM is going to solve today. And with that, I will turn it to Gurney to dive into the first area, cluster pools.

Sounds good, thanks, Scott. Yeah, Scott was referencing the usual; I think the subreddit that I find a good home in is Shower Thoughts, because most of the time, yeah. I sent my tech lead a message at, admittedly, 3 a.m. last Saturday morning and said, "I've had an idea. Also, ignore that it's 3 a.m." So, it's one of those. I'll go ahead and grab the screen share. We're going to be pretty much 100% shell, 100% of the time today, at least for me; the other folks may show some UI.

So I guess the quick intro on cluster pools, for those who didn't hear me blather about it last time and haven't heard me talk about it for the past six months (all of my co-workers, bless them), is that at first we had a CI/CD scale problem. We are a utility, a piece of software, that lets you manage a bunch of different OpenShift clusters, and some *KS clusters, on all of these different cloud platforms. And of course, when we're going to ship and say, "hey customer, we support this," we're going to test it. So we need a lot of clusters, and we do CI every two hours or so, and we do a new pass for all of our maintenance releases. So we're looking at needing something like 15, 20, 25 clusters, and we need that to shrink and scale if we want it to be cost effective. So as we started looking at it: hey, wait, we ship functionality to help with this. What if we start using some of the Hive bits that we ship?

The big thing that we settled on is cluster pools. What Hive lets you do is say, "I want to make a ClusterDeployment," and my cluster deployment is just me writing some YAML that tells it how to make a cluster; it'll go off and make your OpenShift cluster for you and say, here's your cluster, here are your credentials, all those details. A ClusterPool lets you say, "I want you to make five clusters that look like this, and I want to be able to check one out, use it, and then hand it back and say I'm done with this," and it will go clean it up, replace it, and have another one ready and waiting for me. That's what we arrived at as a really good solution to this problem. I'll go into a couple more of the details and the other folks will as well, where this really gets powerful.

To start off, I will just go ahead and do a quick oc get clusterpool. We're going to be working in a twitch-demo namespace on this OpenShift cluster today. So hopefully this works; I have tried this a little bit ahead of time.
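For reference, a pool like the one Gurney is about to show boils down to a small piece of Hive YAML along these lines. This is a minimal sketch only: the names, domain, and secret references are illustrative, and the field spellings follow the Hive v1 API as best I recall them, so check the Hive docs for your version.

oc apply -f - <<'EOF'
apiVersion: hive.openshift.io/v1
kind: ClusterPool
metadata:
  name: twitch-demo-470
  namespace: twitch-demo
spec:
  size: 4                        # keep four clusters hot and ready at all times
  baseDomain: demo.example.com   # illustrative base domain
  imageSetRef:
    name: img4.7.0-x86-64        # a ClusterImageSet pointing at an OpenShift 4.7 release image
  platform:
    aws:
      region: us-east-1
      credentialsSecretRef:
        name: aws-creds          # pre-created cloud credentials secret
  pullSecretRef:
    name: pull-secret            # pre-created OpenShift pull secret
EOF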
There we go. So we have one cluster pool out there, and this is a great way to give the quick five-minute "here's what a cluster pool is." We have a cluster pool named twitch-demo-470. It's running on this base domain and this OpenShift image, so this is just saying: OpenShift 4.7 just came out with all the shiny new bells and whistles, and I want there to be four of them at all times; make sure there are four. And right now it says, hey, you have four ready and waiting out there.

Now, later we'll make something called a ClusterClaim, which is you saying, "I'd like one of those four." We can do oc get clusterclaim and see that I cheated and went ahead and made two. These cluster claims point directly to an OpenShift cluster that we can poke around in in a bit. Those are full OpenShift clusters, and the moment we delete these Kubernetes objects, it'll go clean up those clusters. So everything is managed as kube objects; even your Kubernetes clusters are kube objects, and the recursion is not lost on me. We'll have some fun with that.

So the first thing I'll show off here is the process of making a cluster pool. Last time I came on the show I bored everyone with a ton of YAML: I just pulled up the YAML files, edited them, and oc applied them, and I know that's like watching paint dry. This time we have a project called Lifeguard, which makes life a lot easier. I did make a terrible pun in the README, up near the top; I should have put a heading on it. But yeah, we make a terrible pun that this is just something to keep you from drowning in the cluster pools. Yeah, here we are keeping you safe in the cluster pools. We make far too many puns, and you'll get more later today.

It's just a bundle of bash scripts that I wrote in an afternoon and that everyone has now started adopting, and that Dale keeps making PRs into because I have typos everywhere. It exposes some real simple capabilities in an open source project; I don't know if Scott wants to drop the link in the chat.

So we'll go ahead and make ourselves a new cluster pool. It's as simple as going into this clusterpools directory and, oh, Gurney, I see you already have your one set of YAML sitting here; we'll ignore that. It's just a bunch of scripts that let you run through and apply, and it says: you're on this cluster, here are the namespaces you can put a cluster pool in. So we're going to go into twitch-demo and say yes, we want to put it there. We're going to toss it on AWS, and it says, here are the secret names that you have sitting out there. Remember, I did this ahead of time, so we don't have to create any secrets on a live stream. There are my credentials, there's an OpenShift pull secret, and we just point at those. Let's also make this one an OpenShift 4.7 cluster pool, so we'll snag that image set, which just tells it what type of cluster to put up. We'll put this in AWS region us-east-1, where I have some space for it, tell it to use our super special, super secret domain, and we only want one cluster in this pool.
I don't think that we'll actually be using this one, but you can put any number of clusters in to keep, kind of Little Caesars Hot-N-Ready style, ready for checkout. And we'll just call this "twitch-doing-it-live," as I mistype it, live. And it just applies; it'll tell you, "here's the YAML I applied," so if you want to be boring and snoop, you can look at the YAML it just applied. Now if we do an oc get clusterpool, we'll see that we have a new cluster pool, and we can go investigate if we really wanted to and see that it's running a nice little pod off in the background, in a different namespace, to provision one cluster for us; and that ready count will flip to one whenever we have a cluster.

So while that's off provisioning, we already have our other cluster pool ready to do some checking out. We can go over into clusterclaims, which is, once again, a little utility that lets you create, delete, grab your creds from, and reconcile your cluster claims. We'll go ahead and apply real quick, and this runs through and does a little bit more fanciness in a nice little script. Namely, we'll want to grab it from the twitch-demo namespace. It'll say, hey, you have these pools, and only one of them has a cluster ready. We'll go ahead and grab a cluster from that, and I'll give the claim a name.

Then it'll ask us, and this is the first feature I can talk about: you can set a lifetime on a claim. This is incredibly good for CI scenarios where I know our job takes a maximum of eight hours, for example. So I'll set this to say I only want this cluster to live for eight hours, and after eight hours, no matter what, it will go in, reconcile, and tear down those resources so we don't get charged. There's no risk that your CI is going to leak a bunch of expensive resources, because you'll always have this to clean it up.

We then can associate it with an RBAC group; sure, let's associate it with our RBAC group so everyone knows that we twitch-demo-ers do indeed own it. And then what it's going to do, and this is the number two really cool thing that I absolutely love about cluster pools and that saved us so much money, is that when a cluster is sitting in that pool counting toward "hey, we have four ready," it's actually hibernated. It's Hive hibernation; I call it "hibernated." It hasn't caught on. You heard it here first. Yeah, still haven't gotten it to catch on. Basically, it shuts down the VMs. It's your usual "have you tried unplugging it after the day is over to save on your electricity bill?"
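Under the covers, the claim with a lifetime and an RBAC group that Gurney just walked through is itself just a small YAML object. A minimal sketch, with the claim and group names made up for illustration and field names per the Hive v1 API as best I recall:

oc apply -f - <<'EOF'
apiVersion: hive.openshift.io/v1
kind: ClusterClaim
metadata:
  name: doing-it-live-claim
  namespace: twitch-demo
spec:
  clusterPoolName: twitch-demo-470
  lifetime: 8h                        # Hive tears the claimed cluster down after eight hours, no matter what
  subjects:                           # RBAC subjects that should own the claimed cluster
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: twitch-demo-group           # illustrative group name
EOF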
It shuts down the VMs, and then when you check out the cluster, what it's doing right now is saying, "hey, I've claimed a cluster; it's still resuming." All those VMs in AWS are literally changing their power state to running, and then it makes sure it can connect to the cluster. So in about five minutes here, we're going to get a cluster that we can actually reach and log into and poke around with, and it's a full OpenShift cluster.

And in the release we're about to punt out the door, we're actually going to have the ability to customize these clusters. You can say, I want bigger workers or smaller workers, or more or fewer workers; all of that sort of configuration that you normally get in the OpenShift installer is going to be here too, and all these functions will still work.

So I've babbled on about this for a while and said, hey, we have all these cool pools, and you can grab clusters at some crazy scale. Other cool things: you can scale the pools up and down in size. These are all just oc operations; you can apply some YAML, do an oc get, and poll for the status of these Kubernetes objects. That's all well and good, and we have a little tool that makes it easier. But then we kind of just threw these ideas out and said: hey, we're using them; devs, why don't you go have a heyday with this? We can save some money if you do it and hibernate.

So that's where Dale and Kevin come in, so I'll go ahead and give an intro on these folks; Dale and Kevin are both awesome guys. I sat next to Dale in the before times, when we were still in the office at RTP, and he said, hey, these are really cool, and I think I can make my life a whole lot easier: not having to manage clusters everywhere, provisioning and deprovisioning them, forgetting where they are, and "oops, I broke the cluster, I now need to spend a half hour fixing it." So they've built some really cool tools.

A developer needs a cluster, a developer spins it up, and the developer owns that lifecycle. Put the tools in people's hands and let them party, but also make sure it doesn't drift, right? That lifetime on it says I don't want a cluster that's just hanging out in the corner collecting dust and CVEs; you've got to pre-bake in that security posture too. Yep, it's really automated lifecycle management.

I'll admit that I've left expensive clusters up over the weekend, because five o'clock on Friday rolls around and it's board game night, I'm gone; I have other obligations. So it's like, I forgot to turn off that cluster, and that cost the company $500 this weekend. We've all done it. Yeah, whoops. And see, some clusters don't come that cheap. Oh yeah, you're right; these are dev-test clusters, we're cost-optimizing. It's stuff like that in their development environments.

And the other problem we encounter with our developers, and I don't know if anyone else developing on OpenShift has ever had this problem, I'm sure they totally haven't, is where you accidentally run something or deploy something and you make a huge mess on your cluster. You've screwed up your dev-test environment; you accidentally installed four operators in the wrong namespace with the wrong configuration, and you can uninstall them, but there might be some resources sitting around from finalizers that you'll probably have to delete.
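As Gurney notes above, scaling a pool up or down really is just an oc operation against the ClusterPool object. For example (pool name from the demo, target size arbitrary):

# grow or shrink the pool; Hive provisions or tears down hibernated clusters to match
oc patch clusterpool twitch-demo-470 -n twitch-demo \
  --type merge -p '{"spec":{"size":2}}'

# and poll the status as usual
oc get clusterpool twitch-demo-470 -n twitch-demo -o yaml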
Yeah, there's a bunch of stuff going on there. And I recall a time where it's like, oh man, my dev cluster is screwed up; it's going to take me two hours to get this thing back. So that's the other problem where we said, hey, maybe this will solve it, because you can kind of just throw the cluster in the garbage bin and ask for another one. That's the really cool one.

So I think at this point my best next step is to hand it off; I believe Dale's taking over after me, and he's going to show a lot of the workflow sort of stuff: here's all the cool stuff I've built to use this tool. I will go ahead and stop screen sharing while this checks out our cluster. Nice. Oh, by the way, Gurney, hibernated.com is available if you want to go buy it. Oh, okay, yeah, I should probably go do that real quick. All that cash he's saving on his cloud bills, right? All right, Dale, take it away.

All right. So here we have an empty terminal; let me just hop on over. Yes, like Gurney said, he came to us and said, hey guys, you need to save a couple bucks, because it was messy. It was like kids in a candy store: here's AWS, guys, it's open season, do whatever you want with it. So we each had our own cluster. It was great, but they came back and said, you know, you guys have spent a lot of money; we need to rein it back in. So then Gurney says, we have these cluster pools, and we go, well, that's great, but we're not saving any time. If we have this cluster just ready to go, then it's just sitting there and we can get to it right away. There was a concern with getting to RHACM as quickly as possible.

And so that's where my solutions came in. Gurney built up Lifeguard, and that was a great way to lower the bar to cluster pools. Then I came up with a script, I called it startrhacm, that encapsulates Lifeguard and our deployment scripts so that you can one-stop-shop run a script and it will claim a cluster and then also deploy RHACM with the configurations we need for development. So that was my story.

You're basically tying together the cluster pool with our application. Ours is called RHACM, but, you know, Johnny Developer has something called Pac-Man, and they would basically tie together these two things and script it out so they have a nice, clean, quick way to have their development app up and running every day. Nice.

Oh, so what have we here? Yeah, so we're going to crank up the Visual Web Terminal; we're going to bring the terminal into RHACM and try to interact with it here as much as we can, because we are being hosted: all these clusters are made available because of RHACM. Right now I'm not in the namespace I want, so I'm going to switch over to twitch-demo. All right, now we're where I want to be.

And so before I start talking about startrhacm, I wanted to crank up this job. What I did is I created an image that containerizes the startrhacm script, so that we could have that development cluster that we don't really care about and don't really have to maintain; if it gets broken, it's fine, we get a new one the next day. And it only deploys Monday through Friday, and it has a lifetime, as Gurney described, so it deletes itself at the end of the day.
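The one-stop-shop idea Dale describes, claim a cluster with Lifeguard and then run the deploy scripts against it, is roughly this shape. A toy sketch only, with made-up paths, variables, and script names rather than the actual startrhacm internals:

#!/bin/bash
# Toy wrapper: claim a cluster from the pool, then deploy the product onto it.
# LIFEGUARD_PATH, DEPLOY_PATH, and CLUSTERCLAIM_NAME are illustrative variables.
set -euo pipefail

cd "${LIFEGUARD_PATH}/clusterclaims"
./apply.sh                                        # claim a cluster (interactive, or driven by exported variables)

export KUBECONFIG="${LIFEGUARD_PATH}/clusterclaims/${CLUSTERCLAIM_NAME}/kubeconfig"

cd "${DEPLOY_PATH}"
./start.sh --watch                                # install the chosen snapshot and wait for it to settle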
So we don't even have to think about cleaning it up, which is really nice.

You took a bunch of guesswork out for your team, too. You basically centralized one way to do it and said, folks, I'm going to alleviate burdens that you would normally run into and waste time on, because your squad of ten developers no longer has to spin up their individual clusters and face those ad hoc headaches they get with infrastructure, or code changes, or some script they're keeping on their desktop to make it all work. Right, yeah. This is real DevOps, real GitOps, real developer-driven need and solution. Love it. Awesome.

So we'll just create it and let it run while we talk. And "rack 'em, stack 'em": I know, Chris, you wanted that moniker months ago. Yeah, "rack 'em and stack 'em" is my thing. Well, that makes me think we're going to be spending our days in the data center; I want to run away from that topic. Well, no, we can reinvent the term; that's what we're doing. You want all these clusters racked up, rack 'em and stack 'em with us, here you go.

So here's the startrhacm script for running inside the container. If you run startrhacm locally, it needs to know where your Lifeguard repo is, then a repo currently called pipeline, which is private, where we store all of our tags and things, and then the deploy repo, which is open, and that's the scripts to install RHACM. And so you can see it grabbing those and then starting up, and here is where it enters into Lifeguard and claims a cluster. You can see it has a 12-hour claim, and it also populates our RBAC group so we can get to it once it actually deploys. We'll let that run and talk about startrhacm a little bit.

Like I said, we wanted a quick way to get to RHACM, but we also needed a lot of configuration: we need to be able to get any version, we need to get the upstream, we need to get the downstream. Right now the other processes are a little bit complicated, because we're working on being open; it's an urgent thing, and we're nearly there, but not quite, probably a couple of weeks away. Oh, really? Okay. Totally, yeah; there's a huge push, even in these next few weeks.

So it can claim any branch, version, or snapshot that you want. By default it'll just get the latest one, which is usually what we need if we're trying to verify bugs. That was the biggest hang-up; we'd have to constantly update clusters. So now, every morning, we get the latest upstream snapshot on a cluster. But if you give it a branch, you can give it 2.0 and it'll give you the latest z-stream, so it'll give you 2.0.4, 2.0.8, I'm not sure where we are right now. And if the pool doesn't have enough space, it'll resize the pool automatically if you tell it to.
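Pulling those knobs together, which repo paths to use, how long the claim should live, which RBAC group owns it, and which branch or snapshot to deploy, an environment-style config script in the spirit of what Dale describes might look like this. The variable names are illustrative, not the exact ones startrhacm or Lifeguard read:

# illustrative config script; source it before running the wrapper
export LIFEGUARD_PATH="$HOME/git/lifeguard"
export DEPLOY_PATH="$HOME/git/deploy"
export CLUSTERPOOL_TARGET_NAMESPACE="twitch-demo"   # namespace holding the pools on the hub
export CLUSTERCLAIM_LIFETIME="12h"                  # claim tears itself down after 12 hours
export CLUSTERCLAIM_GROUP_NAME="twitch-demo-group"  # RBAC group that should own the claim
export RHACM_BRANCH="2.0"                           # pin a branch for the latest z-stream, or unset for the latest snapshot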
We also do development from our local machines; we run our component locally, so the cluster needs to be able to communicate through OAuth and accept those connections, and it'll patch those if you want it to. Nice.

And then it pulls configuration from a script: it does all the exports inside this config script, and you can see all the Lifeguard exports and the RHACM exports. The nice thing about that is that I'm able to create variables dynamically, and if I need to adjust anything I just come into the script, change it, run the script, and I'm off to the races.

So let's move over. Let's see, we're still claiming over here; it usually takes about ten minutes in total to get to RHACM. So we'll hop over to our containerized script. It's a cron job. I created my own image that pulls in all of our repos, and then you feed it secrets, and you also feed it a Slack URL or a Slack token, and that lets it post the credentials when it deploys. And if you give it a Slack token and a channel ID, Slack now allows you to create a scheduled message, so it schedules a message to pop up 20 minutes before the cluster is set to expire, so you can get in there and extend it if you need to, if you happen to be working with it.

User experience, I love it. Driving the team in Slack and getting them all going there. Yeah, that was actually my team; they were like, hey, it would be really nice to know when this thing is disappearing. Isn't it funny how that works? You think you've got it all figured out, but no, it turns out that in an open environment with good decision making you can foster these kinds of things on the fly. You can build it and contribute it the Red Hat way, so to speak. Yeah, no kidding.

So then the other thing we did is we set up some RBAC users, so that if you need to, you can go in and see what the view looks like as a view user or an edit user. That's something we had set up in our development environments. And there's something new about that, because I don't remember that being around when we looked at this back in, was it the November timeframe? So what's the new story around RBAC? Yeah, so I think what Dale is showing here is that you're making these RBAC users in the cluster you check out, right? That's correct. Yeah, so to go to your question, Scott:
There's actually, in the new version of ACM, which carries a newer version of Hive, we started using cluster pools a lot and found a couple of RBAC scenarios where we said, hey, this would be really nice. So now it's set up where, when you claim a cluster, you can associate that claim (and this has already been like this) with an RBAC group or a user, some owner or a list of owners, and they'll all be able to access that cluster. So if you and your squad have an RBAC group that has some permission on your hub cluster, you can check out the cluster and say all of these folks own this cluster, and everyone in that group can read and access your cluster.

In the same way, Hive, and ACM, is now letting you associate a group or a user as a cluster pool owner, so they also get to see all the resources associated with the cluster pool. They can see the inner workings: it's provisioning three new clusters because it's out of clusters, that sort of detail. They can pull back the covers and see, and that's a Hive cluster pool admin. That's going to GA sometime this week, hopefully: that capability in ACM where you can say this person is the cluster pool owner and they should be able to see all the special behind-the-scenes stuff. So that's going to be a really cool one for that RBAC story. And that may be a good follow-up as well, Scott, because I know Tim, one of my tech leads, is working on a really good RBAC story around automatically configuring RBAC on these clusters that you check out.

Nice, yeah, that's awesome. And that solves a huge security gap that we don't want to put this out there without having thought through, and it's awesome to see the enterprise controls coming into Hive more and more. Yep. Sweet. Dale, where are we at with that rack 'em and stack 'em?

Yeah, so in the background you can see it just finished. Right here is our deployment; it's deploying the latest snapshot that it could find, and you can see it deploying, and right now it looks like it wrapped up right here. So there's our URL that we'll meet later, and right now it's waiting: it looks for the ingress, so it's waiting for the ingress to come up, because even after the deployment the pods are reconciling and still installing. And stabilizing, yeah.

So run that back: you just deployed the 2.3 build? I mean, you're blowing my mind, because that branch was just cut, what, last week or something? Right, and so we can quickly spin up dev-test clusters for that without even a heavy lift; you're already just targeting that new snapshot. Right, yeah, this is from yesterday. Wow.

Yeah, and we basically have a unified deploy methodology for our dev stuff between our CI and our developers, where developers are using these dev tools now. This is even something, especially with the app model in RHACM, you could do for your team, where you say: here's a dev tool that we use for CI behind the scenes, on our meta level, to deploy our stuff, and that you as a developer can run as this image and just use. Nice. Yeah, that's awesome.
That's nice. Yeah, so, let's see. Real quick, here's the message that our team gets from Slack, from the bot. It just gives us the RBAC users (the password is generated randomly), what snapshot we're looking at, its lifetime and when it was created (the lifetime is from the creation point), the other credentials and the URL, and then 20 minutes beforehand it'll tell you, "I'm about to go away; you have 20 minutes." Nice, that's helpful.

Dale, I think you accidentally made OpenShift's cluster-bot, the one that's in Slack where you ask it for clusters. I think you accidentally made a pool-based version of cluster-bot, so that's really cool. I know, I was thinking it'd be really cool to be able to interact with the bot and say, like, give me a cluster. We'll see. All kinds of ways to spend more money when you're not intending to. Yeah, totally, or when you do intend to. It asks you to enter a credit card number whenever you ask for a cluster, Scott; that's the setting. It's Craig's PayPal account, right? Yeah. Bless our Craig; we joke that he's the money man, but he handles all the billing stuff, bless him.

So, since that's been claimed, we should be able to hop over to Lifeguard and look at the cluster claims: go to the clusterclaims directory and run a reconcile of the claims. What that'll do (full disclosure, I wrote this script) is look at the directories, because each claim gets put in its own directory, whether it gets created or not, just so you can see the YAML and try again if you mess something up; and if it was successful, then the credentials go in there. So what this will do is grab all the claims that are remote, wipe away all the local ones that are no longer relevant, and pull in all the remote ones and update them. So I can go into this one, and we have our credential files here. I think I'll be able to show them, because we're going to delete this after this, but we'll still, let's copy the password for now.

All right, let's see how we're doing. Oh, and this has stopped. So here we are: it patched the ingress with all of our localhost connections, all the paths that I wanted, because our team is responsible for a couple of different things, so there are a couple of different paths we wanted to have. And then here it is creating our users: it just uses a quick htpasswd and instantiates the RBAC users, and it should be ready to go. So let's go back up and get to our console.

You are in a shell, within a cluster pool, within RHACM. Where in the Inception script are we? I think we're somewhere flying through the air on our backs. Right, there you go; exactly, the special effects have kicked in.

Let's see, looks like not everything is up quite yet, since I can't quite get to the RHACM console yet, but we do have a kubeconfig file in here. So this is a handy command I use all the time to get to it. Hey, by the way, Dale, the newest build of RHACM may not install the whole way; this is the in-development one, and we've only had a week to bake this one. Yeah, so that's the other risk with getting the latest: sometimes it doesn't actually work, for whatever reason, because it is leading edge; you never know what's in there. So it looks like we're getting there; things are still happening.
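The "handy command" pattern Dale mentions, pointing oc at the kubeconfig that Lifeguard dropped into the claim's directory, is essentially this. The claim directory name is illustrative of Lifeguard's per-claim layout:

cd "${LIFEGUARD_PATH}/clusterclaims/demo-claim"   # each claim gets its own directory of credentials
export KUBECONFIG="$(pwd)/kubeconfig"
oc whoami --show-console                          # quick sanity check that we're on the claimed cluster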
Yeah, that's awesome. So, end to end, talk me through that cron job. You're basically instructing this to happen every day with a 12-hour lease, I think is what you had on there, then tear it down at the end of the day and do it again on Tuesday, and only Monday through Friday. So you've ensured that there are guardrails on the cost factor of this one, right?

Yeah, only Monday through Friday. Let's close this out and see what we've got. So there's the cron job. Yep, the "1-5" is Monday through Friday, if you know how to read it; cron is never easy to interpret.

And then we also have a cluster pool expand-and-shrink job; those are hosted in startrhacm as well, if you want to go back and take a look under the extras folder. All those do is run a patch on every cluster pool in your namespace, so that at night our cluster pools are scaled down to zero, and every morning they're scaled back up to one. Right now we haven't migrated everyone over, so we literally have zero cluster pool clusters running at night. And with startrhacm's ability to scale cluster pools dynamically, we don't need to keep a large cluster pool running; we can just have one cluster, and if someone claims one, half an hour later another one pops up to be ready. And if you run startrhacm, it'll scale the pool automatically to two and grab that extra cluster. So, nice, yeah.

It makes a really nice AWS cost graph, as the person who's supposed to look at all the cost graphs. There's also another Red Hat project they're working on called Cost Management, a software-as-a-service sort of thing, and that makes really nice graphs for each of the accounts, because you kind of get to see it do this every 12 hours.

So that's what we get to look forward to: the user experience of bringing pools, hibernation, and cost management under RHACM's purview, so that at the fleet level you now have controls to say "hibernate all these dev clusters on Fridays," or delete them if they're just dev clusters. We're going to be baking those controls into RHACM so that you have that sort of level of capability to drive it out there to the fleet. What they're doing on the CLI is great; I'm more of a UI guy, as you could tell, and my chops in the CLI are lacking a bit. But the fact of the matter is, what we're building out, what we're generating this interest on your show for, Chris, is to help define where we go next. How do we make this look right? Do you want a checkbox on all dev clusters that says hibernate right now? Do you want to be able to scale those down to three-node clusters with shared master/workers, or to single-node clusters? Let's start thinking through how we really achieve that fleet capability, the thousand clusters that we're managing, that kind of stuff.
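For reference, the nightly scale-down Dale describes is just a CronJob that patches each pool's size. A minimal sketch, with the schedule, image, and service account invented for illustration and the real RBAC left as an exercise:

oc apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob                   # use batch/v1beta1 on older clusters
metadata:
  name: clusterpool-scale-down
  namespace: twitch-demo
spec:
  schedule: "0 22 * * 1-5"      # 22:00 UTC, Monday through Friday (1-5)
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          serviceAccountName: clusterpool-scaler        # needs permission to patch ClusterPools
          containers:
          - name: scale-down
            image: quay.io/openshift/origin-cli:latest  # any image with the oc binary
            command:
            - /bin/bash
            - -c
            - |
              for pool in $(oc get clusterpool -o name); do
                oc patch "$pool" --type merge -p '{"spec":{"size":0}}'
              done
EOF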
Yeah. And then bringing in Ansible, which should be coming up anytime, is going to make this even more powerful, because really my script should be Ansible. It should be waiting and checking that resources exist and then continuing, and it might even run faster for all I know. Nice, yeah, that'd be awesome.

And once you have all these best practices codified, Dale can just use this tool that's been provided to him by whatever infrastructure provider you have at a given company. For us it's the CI/CD team, which kind of says: here's your cloud account, we have one shared cluster that everyone has access to, and he can just put up pools and use this and know that he's going to conveniently have a cluster there when he needs it, and whatever he does, it'll be about as cost effective as it can be within reason. So if you can codify these best practices in a way where everyone can easily do the most cost-effective, easiest, and secure thing, that's really great, because it takes a lot of weight off of me checking the bill every month and saying, "hey guys, you left a bunch of clusters running." Yeah, exactly; it makes it really easy to codify it.

So, Kevin, take me through some of that user experience. Anybody who has spent time in the ACM console understands what a buttery experience that is and how smooth it is, and a lot of that comes down to the eye and the articulation of Kevin. I want to hear what we're looking forward to. Tell me a little bit more about what you've been working on and this project called Cluster Keeper; is that the right name? Yep. Awesome.

So we won't be looking at the UI a whole lot; this is a CLI-based tool that I came up with. Much the same as with Dale, Gurney came knocking and said, hey, why don't you take a look and see if your squad can maybe spend less money on your cloud costs. So I thought, you know, cluster pools and hibernation are really interesting ideas and we want to start using them, but I did recognize that there was going to be some overhead in using these in our day-to-day activities.

So for example, in this terminal I have my oc targeting that shared cluster where our cluster pools are set up. If I look at the deployments, ClusterDeployments specifically, the ones for this demo, you'll see their names have these auto-generated bits at the end, five characters, and that's even in the URLs for these clusters. So these names are hard to recognize or memorize, and as you get into using cluster pools, you're probably going to be recycling your clusters more often, so on top of that, these names are changing all the time. And any time you need to manage the power state or the lifecycle of these clusters, you need to target your oc, or your kubectl, back at that cluster.
I call it the cluster pool host: the cluster where all those cluster pools, cluster claims, and cluster deployments are defined. So I created this CLI called Cluster Keeper to try to deal with some of those issues.

So that was the ClusterDeployments. If we just get the cluster claims, by default we just get the names of the claims; if you actually wanted to see the backing ClusterDeployment, you would need to look at the YAML output, for example. So the first Cluster Keeper command I'll show you is list. You can say "claims," "clusterclaims," or "cc" (I usually use cc), so it recognizes a bunch of aliases. And I'm not going to go over the configuration really, but basically all I had to tell Cluster Keeper was the server URL for the cluster pool host, the namespace that our cluster pools and claims are in, and our RBAC group name, so that my squad members can also access the same stuff. Basically, the first time you run it, it's going to walk you through logging into the cluster pool host.

So anyway, that is all set up. If I run my list cluster claims, I get this display where I can see the actual ClusterDeployment that's associated with each claim; I can see its power state, whether it's running, resuming, or hibernating; I can see the lifetime (Gurney set a lifetime of eight hours on the one he created at the beginning, and Dale's lives for 12 hours); and we have the age column here, so you can do the math and see how much time you have left. So it's collecting all the relevant information for you, so that you don't have to go through and find the associated cluster claim, sorry, cluster deployment, or cluster pool. And those are YAML attributes, but otherwise you'd be sitting there with JSON or YAML output, sorting through jq, pipe, sort this, blah blah blah, cut. Exactly. And none of these things are difficult to do; it's just time-consuming if you're doing them day after day, switching clusters.

And my oc was already targeting this cluster pool host, but the nice thing about the ck CLI is that any time it needs to access the cluster pool host, it has created a context in your kubeconfig. So normally I could say oc config current-context, and that's "ck"; the ck context is the context it created for when it talks to the cluster pool host. I also have a short form, because I don't like typing that long string: ck current tells me I'm pointing to the cluster pool host. So this is probably my most used command, because I'm frequently checking: what is the power state of the clusters I need to use? Are we using more clusters than we should be? Et cetera.

So then, if we talk about when I want to actually use one of these clusters: if I didn't have Cluster Keeper, as I mentioned, I would have to target my oc at the cluster pool host, then I could use Lifeguard. I think Dale pointed out the get-credentials script; I could run that to get the credentials, and that's going to create a directory associated with that cluster that has a kubeconfig file and an oc login script if I want to use password-based login. So I would have to use one of those methods to connect, and there's also a credentials file if I want the console URL. So I wanted to get that down to one step in Cluster Keeper. The idea is that everything is keyed by the cluster claim name.
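The day-to-day loop Kevin describes boils down to a couple of short commands. Subcommand names are as shown in the demo; check the Cluster Keeper README for the full list:

ck list claims      # claim name, backing ClusterDeployment, power state, lifetime, age; "cc" works as an alias
ck current          # which context (cluster) your oc is pointed at right now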
So on my team we're using very simple names for our cluster claims: rather than having a numbered claim, I might just call it "prototype" or "demo" or "dev," something like that. And I mentioned that ck list is probably my most used; the most useful would be ck use. So if I want to use, say, this cluster here, that's actually going to switch my kubeconfig context to point to this cluster. So for example, let me run some oc command to show that we're on this cluster. So quiet, and he's typing it all correctly, oh my god. I really should have copied and pasted that. But, you know, on OpenShift clusters we have this infrastructure resource called "cluster" that has various information, so here we see the cluster info. We'll give you that too, actually: cluster-info. Oh, okay, well, I'll try that later. But as you can see here, the URL matches the cluster deployment name, that five-character suffix.

So I wanted to show you, with this one I've used before, what happens if I try to use claim number two, what actually happens behind the scenes. ck use intentionally switches your kubeconfig context to use that cluster, but all other ck commands try not to mess with your current context at all. So what happens when you use something like this for the first time? You'll see it's creating a context; it fetches those credentials automatically, prepares the kubeconfig (there's a little bit of fiddling it does with that), creates a service account so that you're not constantly having to log in and you have a reliable session, and then it actually backs up your personal kubeconfig file before updating it to add the new context for this cluster claim, and it switches.

So Cluster Keeper is all centered around these contexts for cluster claims. Now that my current context is claim number two, I can use commands like the cluster claim console command, and I don't have to give the name of the cluster here; it will infer that from my current context. What that will do is open... it actually copies the kubeadmin password to the clipboard and opens this in the console. These aren't set up with a proper certificate, right, so I'm going to get this warning; should get two of those. Here we are. Yep. Makes sure you're alive. Really makes you earn it. So here I can just type in kubeadmin, that's not too hard to type, and the password is already on my clipboard for me. And there's our console.

So I wanted to show you, there's a similar one for accessing ACM, or RHACM. This cluster that Dale created with startrhacm, I haven't touched it yet. There's one small caveat here, which is that Cluster Keeper uses a service account, so I need to enable permission on that cluster to be accessed by service accounts. So I'm going to run that first, and then if I do ck acm, we go through the same steps, obviously creating the context and fetching those credentials, and eventually we should see the browser open a new tab, and similarly it copies the password for us. I'll have to go through the same cert stuff. Nice. Now, let me ask you a question: how did you know that that cluster had ACM on it, other than the fact that we've been sitting here watching Dale work with it? Yeah, that's just by convention. The tool does look up the route for ACM from the cluster, so if it doesn't have ACM, this will fail. Okay, that's good, that's good.
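Spelled out, the flow Kevin just walked through looks roughly like this. The claim names are the simple ones his team uses; exact behavior and output will differ:

ck use demo                  # switch your kubeconfig context to the cluster behind the "demo" claim
oc config current-context    # confirm oc is now pointed at that cluster
ck console                   # open the OpenShift console for the current claim; kubeadmin password goes to the clipboard
ck acm                       # same idea, but opens the RHACM console route on that cluster (fails if RHACM isn't installed)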
Like, yeah, you want that. If I accidentally point a command at the wrong cluster, I want it to know, not just randomly start installing CRDs and doing crazy stuff. Tell me: nope, no ACM available. Right.

So, I mentioned that it usually tries not to change your context. Say you're working with one of these clusters and someone asks, hey, can you run this other task on this other cluster? Whoops, this used to be called cm, so my fingers still sometimes type cm instead of ck. So there's a similar command called ck with, and that takes the name of the claim. Whoops, that was the password there; sorry about that. We will be deleting these clusters afterwards anyway. So say I was using claim two: my oc is targeted on claim two and I don't want to change it, I'm busy doing something, but I want to run something against claim one. I can use my Cluster Keeper with command, and that will extract the kubeconfig for that cluster to a temporary file, set the KUBECONFIG environment variable, and then run the command, so that you can carry on with your regularly scheduled programming. That's handy.

And what I usually use this for, actually, is working with the cluster pool host. Some of the things I haven't added in Cluster Keeper are working with the cluster pools. Dale has the cron jobs to automatically scale the cluster pool size up and down; I haven't adopted that yet, so on my team we've mostly been keeping that at zero clusters ready, because we haven't been creating new clusters all that often. But you can use ck with. So, for example, I think I'm probably done with the browser, so I'm going to make my window a little bigger here. Remember, ck is that special context for the cluster pool host, so then I can run a regular oc command to edit, say, the cluster pool, which was, what, twitch-demo-470? Am I reading that right, ck with ck? Wow, okay, so the inception thing comes back around. Yeah. So this is just if you didn't want to change your current context, and you could come down here and edit your size, for example. I like it. No, you can also say ck use ck, and now your context is ck, and then you can run your oc commands against the cluster pool host. Gotcha, so you're editing the cluster pool sizing and parameters that Gurney showed us at the beginning of the session today, where he said I've got these four things already queued up. Yep. Nice.

So the other bits that I've added: I think it's okay if I hibernate any of these clusters, right? Especially the one that you just exposed the password to. Yes; demo claim two is gone now, I've cleaned both of them up already with, shameless plug, Lifeguard. Wow. So there's a hibernate command if you want to manually hibernate a cluster. If I look at the list of clusters now, we'll see that that cluster is stopping. You have, of course... I'm going to switch to use. Oh, well, that's probably not going to work; yeah, it's going to... So Cluster Keeper tries to help you out and do a lot of things automatically, right? So if you try ck use, ck with, ck console, or ck acm, it's going to wake up that cluster.
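The ck with pattern from a moment ago, spelled out. The argument order follows what was shown in the demo, so verify against the project's help output:

ck with claim-1 oc get nodes                                       # run one command against another claim without switching contexts
ck with ck oc edit clusterpool twitch-demo-470 -n twitch-demo      # "ck" is the special context for the cluster pool host itself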
So that cluster was already stopping. This hibernating and running is done by editing spec.powerState on the ClusterDeployment, and I have gotten into some problems with Hive sometimes if one of these operations was in progress and I was too eager and changed the state again. So another thing Cluster Keeper does is check for that. If I try to run this cluster while it's in the process of stopping, and it might already be stopped, it waits up to 15 minutes for it to actually be in the hibernating state, and then it will restart it.

How much time have we got? We're running close to time. Yeah, you can run a little over if you need to, don't worry; we're on the edge of our seats, Kevin, keep it coming.

So that was manual running and hibernating. I did want to mention that on my team we are using another project called hibernate-cronjob, and Cluster Keeper is designed to work well with hibernate-cronjob. I think this might have been presented before on this show, but basically it just helps you set up some Kubernetes cron jobs that say, at 6 p.m. every day, hibernate all your clusters. So that's what this schedule column is all about in this list display. So if I wanted to turn on scheduled hibernation for this cluster, I enable the schedule, and then we will see that schedule is true. That hibernate-cronjob looks at a hibernate label on the ClusterDeployment for whether it should operate on that cluster or not; so for this one that has the schedule enabled, that is set to true. If I then disabled scheduled hibernation, this would be set to "skip," to tell that cron job not to operate on this cluster.

And then the other aspect I wanted to mention is locks, this locks column. On my team we share a lot of our clusters, so we do have them scheduled to hibernate every day at 6 p.m., but obviously sometimes that's not realistic; somebody needs to keep working later than 6 p.m. So there's a lock feature. I can say, lock that cluster, and that works by putting an annotation on the cluster claim. It uses my username by default, and even though this cluster normally participates in scheduled hibernation, currently that's being skipped because of the lock. What this also does is that other Cluster Keeper commands will warn users that this cluster is locked. So it's hibernating now, and if I try to run it, it says, you know, can't operate, it's locked by me; use -f to force if you really need to.

And the other thing you can do is put any old string in for the lock, so we've actually integrated this with our build system. Our build system can wake up the cluster that it needs to use for testing purposes, lock it with a unique ID for the build, and then we see that lock in the list. When it's finished with that cluster, if it's during normal off hours, so after 6 p.m. or on a weekend, it will run the ck hibernate command, but it doesn't add the -f, so that if any other build jobs have the cluster locked, it doesn't actually get hibernated. Awesome.
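For the curious, the raw operations underneath those conveniences look roughly like this. The powerState values and the hibernate label follow what Kevin describes; the resource names are placeholders, and the lock annotation key is made up here since the real one isn't named on stream:

# hibernate or resume by editing spec.powerState on the ClusterDeployment
oc patch clusterdeployment my-cluster -n my-cluster --type merge -p '{"spec":{"powerState":"Hibernating"}}'
oc patch clusterdeployment my-cluster -n my-cluster --type merge -p '{"spec":{"powerState":"Running"}}'

# opt a cluster in or out of the hibernate-cronjob schedule via its hibernate label
oc label clusterdeployment my-cluster -n my-cluster hibernate=true --overwrite    # participate in scheduled hibernation
oc label clusterdeployment my-cluster -n my-cluster hibernate=skip --overwrite    # skip it for now

# locks live as an annotation on the ClusterClaim (illustrative key)
oc annotate clusterclaim demo -n twitch-demo cluster-keeper.example/locked-by=kevin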
This is starting to inform us, right? These use cases are how we want to build out that user interface, that sexy experience for how we drive these things at scale. These are exactly the types of scenarios you'd run into, where someone's trying to claim but it's already in use, or someone's trying to run it but it's in hibernate mode or going into hibernate mode. And so, yeah, my mind is just getting excited. I hope the community is ready to tap in and play. open-cluster-management, of course, is where you want to hang out, or the new hibernated.com. Thank you, the newly minted domain; hopefully it's propagated to all the DNS servers across the globe. Yeah, you might have to refresh your DNS cache, but it should work. We're ready to rock.

Yeah, Kevin, I didn't mean to cut you off, but this has been great. Yeah, no worries. Well, there's a whole list of subcommands here, so I didn't go through them all. There is a shortcut for creating a new cluster quickly; it just runs through Lifeguard, it uses Lifeguard, and it speeds it up a little bit so you can type one line and then go. Nice.

Awesome, a fun set of tools and resources. I hope we've met the level of expectations you've put on us, Chris. We always have. Thanks for having us back. No, thank you for coming back. You all do such an amazing job of putting the right people on the call and doing the right thing, so I appreciate all your behind-the-scenes work, and definitely the fact that all this is being done out in the open. This is going to help not just one company; it's going to help lots of people. Yeah, we're having fun too, and these are projects that come out of the demand that we find in house, and we think other teams are probably suffering these same challenges across the globe. Come play with us in open-cluster-management; we meet on Thursdays for the community call, and you'll be able to find the details at the open-cluster-management site.

Awesome. All right, well, thank you all so much for joining me today, and thank you to everybody out there for watching; thanks for turning out for the show. Lots of repos dropped, so go dive into that code base and check out what you should or should not be using, potentially, for your clusters. Until next time, thank you all so much, I appreciate it. Thank you, Chris. Stay safe out there, everybody. See you. Thank you.