Good morning, good afternoon, good evening, and welcome to a very special Ask an OpenShift Admin. I am Chris Short, host of most things on Red Hat Live Streaming. I'm joined by two of my other favorite hosts, Andrew Sullivan and Christian Hernandez. You're all my favorite hosts if you're watching out there. It's like all your favorite children. Right, yeah, I can't have a favorite child, right? Like so, you're not my children, but you're all my favorite hosts. The dog is my favorite child. Yeah, exactly. Yeah, anyways. The dog doesn't run up college expenses or anything like that. That's true, yeah. So I'm Chris Short. I work with these two clowns. I mean, people here at Red Hat. Andrew, this is your show. Let's, you know, it's, I know you were joking, but yesterday I was literally thinking, you know, I wonder if we should replace that intro music with, like, Send In the Clowns. Yeah, that's right. Is that public domain? Because I would love to do that. Just for this show. Or Yakety Sax from Benny Hill, right? Yeah, yeah, that would be great. So yes, hello everyone. I am Andrew Sullivan, technical marketing manager with the hybrid platforms business unit now at Red Hat. And yes, it is a great Red Hat day as every day is a great Red Hat day. So I'm very privileged to be here, not only with Chris Short, the producer with the most, the hostess with the mostest, but also my other other favorite co-host, Christian Hernandez. So Christian, if you don't mind introducing yourself for anybody who may not know you. Yeah. What can I say about myself that you guys haven't already said? So yeah, also a principal technical marketing manager here, senior principal, I guess now. And yeah, happy to be here as always. Andrew, thank you for the invitation. I've always been an admin at heart for most of my life. And so it's always good to get back and talk about admin stuff. I've been deep into the dev stuff as of late. So I'm always happy to join. You sure have there, Mr. GitOps? Yeah. Yes, right, yeah. So, and there's a reason, a specific reason why I asked you to come and join me on this show. And that's that the three of us, all three, have a strong ops background. You know, we all three were, you know, customers, right? We were all three administrators of RHEL, of other types of systems at some point in our careers. And I think that gives us a lot of background and can give us a lot of perspective on today's topic. And this is one that we love to talk about internally. We get a lot of questions on it, you know, coming from our field, through email, through chat, through all of the different forums. We also hear about it from you all, from our audience, from our customers, so on and so forth. And that is, in case you can't see the title of the stream, high availability. And we wanted to take this opportunity, we wanted to use the stream to talk about high availability from a couple of different aspects. So one is, of course, high availability of the applications, right? What you actually care about being available at all times is the application. So we wanna talk about application-level high availability in the context of OpenShift and all of the different things that can affect that, things that can go into it, so on and so forth. And we really want to have this be an open conversation, right? I will fully admit, as Christian and Chris know, we're on a lot of meetings together, you all hear me say all the time, I don't know all the answers. 
So I also can only dream up so many use cases and so many scenarios, right? In addition to the ones that I personally have experienced. So I'm very curious, for anybody watching, whether you're at home or at the office, wherever you're at: what are the HA things, scenarios, questions that you have? Please don't hesitate to put those into the chat here. As always, I am happy to jump in front of that bus, right? Which is, ask us the hard questions. I brought Christian here to answer all those questions for you. Yeah. There you go. So what I like about getting different admins' perspectives, especially having you guys here with military background, is the different use cases, right? Because like you said, Andrew, not all of us have all the answers, right? I come from an e-commerce background, right? Startup sort of thing. And you guys had to worry about other things in IT, like what if a tank rolls over the fiber cable? Like that's a scenario that I would never dream of. So it's always great to have different admins from different backgrounds to get different kinds of stories. I agree, we could talk about some of our worst outages. I still think I've got some good ones, but let's not dive into that. I think there's also a lot that goes on. This is something that I learned early in my career. There are two, and I know it's easy to generalize, right? But two kind of general personalities that I see when it comes to a lot of these types of admin actions, even HA sometimes, right? And that is, there are people who, unless they are told that they can do something, they won't do it. Right, like the risk-averse people. Yeah. And then there are people who, unless they're told they cannot do something, they will do it. Hello! Yeah, I know. And generalizations, they are what they are. They're a generalization, but that mentality and recognizing that is super important when you're talking about some of these things. I used to work with a guy, still a really great friend of mine. He's over at VMware now, and he was a Windows administrator who became a virtualization admin. I was a Linux admin who became a virtualization admin. And like Windows, the common, at the time we're talking Windows 2000, Windows XP, Server 2003, right? The common fix was just reboot it. It'll come back, it's fine. And I'm like an old Solaris admin and DEC and all this other stuff, right? DEC Alpha systems, like you never reboot systems or rarely reboot them. So yeah, it was a very different mindset shift. And virtualization, especially in my virtualization career, we used a lot of NetApp NFS. And NFS, iSCSI, right? They all ride the network. So you got to train the network admins, like, hey, don't just reboot this router. Like I understand that desktops don't care when they drop a few packets, it's fine. Chris and his lovely connection. But when a bunch of VMs suddenly lose access to their datastore for three, five, 10 seconds, really not that much, but it's a big deal. I remember early in my Air Force career, I was helping troubleshoot something at the base level. When I say the base, think of it as like a metropolitan area network, right? It's massive, there's a lot to it, but I walked into the main data center for the base and I'm looking at one of the, like, not to help this person, but like, tier two guys, and he's standing in front of the router, which is something that he knows literally nothing about. I like where this story is going. 
Yeah, so like they brought me in for my, you know, BGP experience and everything else and they're like, we're having router issues. I'm like, but we're up, right? Yes. So he was just gonna restart the router and see what happens. Now, if anybody knows anything about Cisco routers in the early 2000s, you don't ever restart them. Never, unless you're applying a firmware update, right? Like restarting them will just put you back in the same bad state you had before. It doesn't fix anything. So yeah, I literally had to walk in and like grab his hand as he was about to hit enter on the restart command and like, no, you can't do that. Also, who gave you access to this? How did you get here? Yeah. So yeah, I see your conversation around Solaris zones in there. Christian, you know, you're the hipster of containers. Yeah, I was doing containers before they were cool, right? It was Solaris zones and FreeBSD jails, right? Like we all remember those technologies. Who thought we would have been here, right? Just going from there and now we're here. So I was like, oh my God, like some of this technology has been around for a while and now we're getting to the point where we could actually really leverage it. So before time gets too far away from us, I want to quickly roll through our top of mind topics. Should be quick this week. I do see Han Solo there. Is there a full DISA STIG Ansible playbook to crypto-harden and patch up RHEL 7 and 8 vulnerabilities? I don't know the answer to that. There is. I think there is. It is. It's an FY update. Well, yeah. There's always that. But there is, it's called Ansible Lockdown. I had to look up the name. Oh, there you go. It's done by the MindPoint Group. So let me just find a link to that real quick and drop it in chat for you. I sometimes forget that you were on the Ansible team for your first year or so at Red Hat. Yeah, it was a very, very good experience to have coming from the Ansible community into whatever we're calling ourselves, hybrid platforms. Yeah. We changed the name. Yeah. Like what was our name again? All right. So the first thing I wanted to talk about, or first thing I wanted to bring up in case you missed it, is we recently announced alongside Nutanix support for OpenShift on AOS. So for customers who have been asking us about this for a while now. So previously you could always deploy OpenShift to Nutanix hardware with vSphere as the hypervisor and have a fully supported solution. Technically you could deploy using the non-integrated, platform-agnostic bare metal method. And we considered that a non-tested platform, which means that there were some limits around the support that was possible. So now that caveat has gone away. So even though this first integration release is still a non-integrated deployment, it is now a tested platform. It is fully supported in all of the aspects. Nutanix has a CSI provisioner. They have, let me see. I have a cluster in here somewhere. So if I go into OperatorHub, search for Nutanix. Okay, I know. So you could see that they have a certified CSI operator in here. So lots of great stuff happening inside of there. I believe that we'll have a session at Nutanix NEXT. Whenever that happens, I think that's in late September. That's talking about this as well. So if you are a Nutanix customer, keep an eye out. There's lots of stuff that's happening here. 
So the other thing, or the next thing I wanted to talk about, which is conveniently in line with the question that Han Solo asked, which is, if you missed it again, a little bit of noise happening in the community these days. The NSA, the US Intelligence Community's National Security Agency, released a Kubernetes hardening guide, which is kind of mind-blowing to think about, coming from the Kubernetes of six years ago. Right, like Pizarreia and an NSA guidance on it. Okay, cool. I'm posting links in the chat here. Cause I forgot to do that on the previous one. So there's the first link. Here's the second link for the NSA hardening guide. And there's a link directly to that hardening guide inside of this blog post. So yeah. And they credited and took a lot from the CIS benchmarks, but they also had some of their own things to add on to it, right? And yeah, it's been a hot topic lately. Yeah, and I'll note that this blog post, which was written by, and it's a little bit terse in my opinion, but so Michael Epley, I know Kirsten Newcomer, so two of the security experts inside of OpenShift and in the BU here, they contributed to this blog post. And there's a lot of stuff internally about how OpenShift already meets many of these guidelines. There's a lot of stuff inside of here that you just get out of the box with OpenShift. I don't have details there. I know the marketing folks and I'm sure that your account teams, if you wanna reach out to them, can access or reach back to us for access to those things. But yeah, it's an interesting hardening guide. I still recommend reading it. An NSA hardening guide is not necessarily a... It's definitely not a protector of all things. Yeah, and nor is it a standard, right? It's not a STIG. It's not a HIPAA guideline or anything like that. The compliance operator, all of the other tools available inside of OpenShift are still applicable. This is just the NSA's perspective on that. And I see Freeman, I don't know what version it's actually for, I didn't pay attention to that. Did I read it? Yeah, I skimmed it, but I don't recall seeing a version in it. I skimmed it enough to see that they're using deprecated PSP, but other than that I haven't seen a version. But deprecated PSP isn't going away until 1.25, so... Yeah, well, I mean, it goes through that lifecycle of like three releases or whatever, whatever that is. Three, like, four. So, Neon Eyes... It feels like 30. I see your question here about Bytes Sent, Bytes Received. It's definitely normal to have more received than sent. Yes. Especially for like a home internet, right? You're downloading, you know, you're streaming YouTube or, I don't know, whatever the kids stream these days. Twitch, maybe. And yeah, Twitch. You're receiving a lot more data than you're sending in that instance. At least unless you're the streamer. Like us. Yeah. We're sending a lot. No, even then I still download more than I upload. Oh, yeah, mine's like a landslide, yeah. Yeah, my family are cable cutters, so we stream everything and we do... Likewise. Yeah, so something like two terabytes of internet a month. And it's all streaming services. We have, you know, anytime the TV's on, it's streaming, you know, some channel. Yeah. Anyways, not to get distracted. So the third and final thing I wanted to talk about, and Christian, Christian, I appreciate your opinions here, along with the audience, is we had a question that sparked some debate internally. And that was in one of the internal chats. 
And that's, when should I use, or when should I create and use, more than one namespace? Because like, you know, you look in the docs and you see it mentioned, you know, hey, deploy this into a new namespace. Hey, create this here. From an application perspective, when is it appropriate to use more than one? Andrew's response was more or less, it's a mechanism to apply RBAC and other kinds of control policies. But beyond that, right? I need to enforce a quota at the namespace level. I need to prevent this user from accessing other resources, right? Namespaced resources. Beyond that, there isn't a technological reason, right? You're not, you know, changing performance, you're not, yeah. Yeah, also the network policies, I don't know if you mentioned that, right? Because like if you need network isolation, right, between certain components, that'll also dictate what goes where and who does what. I always say it's a reflection of your environment, like current policies, your environment, like how you do work, right? It's a reflection of all of that, right? So it just depends on how you have your environment laid out, right? How do you have those, you know, lines of business, right, laid out. So one team can have five namespaces, one team can work off all of one namespace, it would just depend. So speaking of, you know, resiliency and high availability, I also get a 404 on that NSA, CISA page. It's by design, right, secure. Even though I went to the blog and clicked the link to it, not sure how that's happening. So I will alert Alex Handy if need be. But as far as namespaces go, like I treat namespaces like I treat folders, directories basically, right? Like I know that I can lock them down to a point, but past that it's not a security feature anymore, it's just a resource utilization kind of collection, collection of things. Or even for tracking purposes. Right, like it really depends. Like, to broaden the question even further, would I have a namespace for dev, stage, and prod on the same cluster? No, don't run your dev environment on the same cluster that runs your production environment. That's kind of my opinion there, but that's, you know, aside from just organizational purposes, that's kind of just how I treat namespaces, just to organize things or, you know, isolate them if, you know, some namespace needs to be isolated in a certain way versus another namespace in the organization because of regulatory reasons, for example. Namespaces aren't a security feature, so I don't treat them like that. It is just isolation basically. So yeah. Convenient buckets, right? Yes. Can you run dev and prod in the same cloud? Absolutely. But remember, if they're in the same cloud account, everybody has access to dev and prod. That might not be the best thing for, you know. What was, there was a company that ended up going out of business, I think, because they were hacked and all of their production and backup resources were in the same AWS account, so whoever accessed their stuff was able to wipe all of them. Wiping all the things. Yeah. That, yeah. You have to be very careful about that. Like think what happens. Think of your, like you have to threat model your things, right? So what would happen in the worst case scenario? Attack surface, right? Gotta take all that into account. Yeah, you gotta think about all those things. And if you don't, don't be surprised when something goes bump in the night and everything breaks. Which is a good segue, right? 
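To make the namespace discussion above concrete, here is a minimal sketch of the kinds of per-namespace controls mentioned: a quota and a network policy. The namespace name, limits, and labels are hypothetical, purely for illustration.

```yaml
# Hypothetical example: per-namespace guardrails for a "team-a" namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"        # total CPU requests allowed in the namespace
    requests.memory: 20Gi     # total memory requests allowed
    pods: "50"                # cap on pod count
---
# Only allow traffic from pods in this same namespace unless a more specific policy says otherwise.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only
  namespace: team-a
spec:
  podSelector: {}             # applies to every pod in the namespace
  ingress:
  - from:
    - podSelector: {}         # peers must also be in team-a
```

RBAC RoleBindings scoped to the same namespace would round out the picture; none of this makes a namespace a hard security boundary on its own, which is the point made above.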
To that, threat modeling or risk assessment also applies to today's topic, right? High availability of, and it's funny because in our little cheat sheet, the notes thing that I created here, you know, one of the first things that I have inside of there is essentially to identify what it is you actually want to protect against. Right. Kind of a critical thing, right? Why do you want HA to begin with? You know, what are you trying to solve? Yeah. So I see here, Freeman, can you run dev and prod on the same cloud? Absolutely you can. There's no reason not to. I think it's entirely based off of what your requirements are for protecting those workloads. And also, you know, dev to me is pre-prod, right? It's non-production. If it goes down, yes, it impacts, you know, the dev team and all of that. It's not impacting your customers. So if you have a cloud failure, right? And, you know, we know this happens, whether it's due to DNS or SDN, which are usually the two that affect major chunks of the cloud, or the internet rather. Even data centers fail, right? How many times has AWS...? Yeah, oh God. Yeah, yeah. What is it? US East 1 fails? US East 1 is the tortured region of the world, yeah. When US East 1 fails, the internet goes down. So yeah, it all comes down to, again, that risk assessment, which goes back to high availability. And what are we actually protecting? First, what are we protecting? And second, what are we protecting against? Because just like with disaster recovery, we have to identify our recovery time objective: how long will it take for us to get back up and running after a failure? And recovery point objective, RPO, right? How much data, how much information am I comfortable losing? Hopefully the answer is zero, but it may not be. I'm an old storage guy, so zero is always the answer. What's your tolerance, right, for data loss? So you use those two metrics in order to gauge, well, how much infrastructure, how much investment, how much time, how much money do I have to use in order to protect, or protect against, for disaster recovery? And then you take into account the scenarios. Am I protecting against a rack falling through the floor? Am I protecting against a data center falling in an earthquake, maybe only applicable for Christian out in California? Yeah, falling into the ocean, yeah. Yeah, am I protecting against a meteor strike that takes out all of North America? Yeah, anything's possible. So, HA, disaster scenario, right? Exactly. So HA is similar. So disaster recovery is essentially, in Andrew's definition, disaster recovery is: the site is down, the application is down, whatever that happens to mean, how do I get it back up? HA is more traditionally: it hasn't gone down, but there is risk of it going down, right? Or something has happened, how do I bring back those resources to prevent it from actually causing an outage? You could say that DR is a superset of high availability, or HA is a subset of DR, depending on your perspective. But those are very important questions to answer. I'm reading the chat. I'm sorry, I am not reading the chat, I am organizing tabs. Data center catching fire, I've been there, had a data center catch fire, had the same data center also flood, that was a fun one, not related to the fire. Yeah, I've done on fire, I've done struck by lightning, and I've done "military thing happened." Military, that's as much as I can say. 
Yeah, basically all I can say really, but yeah, just think of all the various scenarios that can go wrong in a war zone. That's, there's a lot, yes. Yeah. I think the funnest one I had to plan for, most fun one that I had to plan for, was back to data center flooding. So we had raised floors and the main water pipe was a 12 inch water pipe, chilled water pipe. And the volume of water that was flowing through there, going out to the CRACs and all that stuff, it was, we have this much time before the room literally fills with water. Wow, okay, fun. So yeah, it's scenarios like that that you gotta take into account. So HA, so Christian, I wanna ask you and get your opinion, coming from, again, Chris and I have a lot of military background, which is a little sometimes different, sometimes the same, sometimes different. Coming from the commercial side, right? What's a scenario that you would have to contend with from an HA perspective versus a DR perspective? Yeah, so from an HA perspective, we're talking about very localized failures, right? So we're talking about things like a rack losing power, right? Going on battery power and the battery draining for whatever reason on that particular rack. Or you have someone, this actually happened, hitting a pole down the street and your network connection goes down, basically internet goes off. Yeah, yeah, well, I mean, yeah, that's where you get less tanks and more people running into poles. I mean, and go ahead, I'll let you finish. Well, it's just localized, right? Localized failures, right? You're talking about like a server, which is going out, right? Just frying, you know, disks failing, right? Server going offline, and high availability is trying to protect against those things, right? Those localized things. Whereas, like, my entire data center hasn't gone down. Maybe particular components have gone down, but my site is still up and running because you designed it in such a way where you have high availability, as the name indicates, high availability. Whereas disaster recovery, you're talking about like offsite backups. We had in my previous job, we had what we call, oh no, now that I'm about to say it, I lost it, the project, Greenbrier, Greenbrier, you military guys should know that offsite, what we call, I think it's Greenbrier, Greenbank, yeah, or something like that. No, Green Bank is the radio observatory. Greenbrier, I think is correct. Is it Greenbrier? Yeah. So we have that, we're essentially- It's a golf resort, essentially. But it was also the backup. Yeah, yeah, backup. Of all of government, yeah. So where essentially it's like you can basically bring another site up quickly to recover from a disaster, something like the whole data center, like you guys' situation where the data center filled with water or caught on fire, or whatever. Like we've always, here in California, we had a plan for like the big one, right? Like if an earthquake hit and our site was off for two weeks, let's just say, right? Last major earthquake, right? Like here in Southern California, it took some things offline for like literally like two weeks, right? So it's- What's interesting to me here is kind of all three of us, you notice, immediately went to like physical data center level things that are happening. So here's an interesting question for you, right? Is a pod failing an HA event? Yes. I would say yes, yeah. Right, and if you think about it in the same sense of, let's pick on vSphere, right? 
If a VM fails for whatever reason, it doesn't matter if it's a node failing, right? Whatever happens, if the VM fails, there's mechanisms to restart that virtual machine. So you could argue that from one perspective, and we're all data center system admin ops guys, right? We think infrastructure-up, right? We built our careers, certainly I built my career, off of designing and implementing infrastructures that provided as many nines as possible, more or less as cheaply as possible, right? And that's how we kind of went about doing things, certainly until probably, you know, cloud native became a thing. And then suddenly there was this dramatic shift, right? We have the 12-factor application. You look at the 12-factor application, right? What is it, 12factor.net? Yeah, I think it is. Yeah, so you look at this 12-factor application definition, and the goal is to define, to put in virtual ink, you know, what does an application look like that is resilient and capable of dealing with that infrastructure-level failure? So this was an interesting mindset, and this happens back in the, you know, when we think about application architectures, in the late 2000s, we started to see the three-tier applications, right? We started to see this thing where we're more horizontally distributing things, largely as a result of scale, and going in that aspect, right? The, what was it, the Slashdot effect, right? It was one of the first ones, right? Back in the early 2000s of, how can I accommodate this massive influx of traffic coming in? And it turned out that that was actually really well suited for things like, okay, now I'm going to AWS, remember AWS launched EC2 back in 2006. And AWS, for a long time, never had an uptime, you know, guarantee, an SLA, for an individual instance, right? Or if they did, I think it was two or maybe three nines. Yeah. So the application teams started writing and creating applications to accommodate that failure. If I've got a hundred instances of my application at any given point in time, at least one of them is going to be out, right? As infrastructure guys, we know this, right? I used to have a gigantic object store, right? Literally hundreds of servers, each one with I think 16 or 24 drives in it. And every week, Tuesday mornings, my hardware rep would come in and we'd walk through the data center and we had a little cart and we'd go through and pop out hard drives and put them in the cart and then he'd go and get new ones and we'd walk back through and put them back in, right? With enough hardware, with enough infrastructure, something is always down. So the applications began to understand how to provide their own resiliency, their own high availability. So I think, yeah. Well, that's, I think that's where chaos engineering comes from, right? So where, yeah, the whole Netflix story is essentially you build with failure in mind, and that's kind of like a good thing. I think it came out of AWS because I was always trying to get my developers to not trust the infrastructure, right? Since things can go down, like you should write your code to accommodate for that, right? No, exactly. And this is one of those, I have conversations with folks all the time of, the infrastructure is really good at certain things. And yeah, I can keep adding nines, but every nine is going to get exponentially more expensive and exponentially more complex. Whereas the application can be written to be super simple and rely wholly on that infrastructure. 
Or the infrastructure can be super simple and highly unreliable and the application can do that. How about we find a middle ground, right? Where the application is really good at some things and the infrastructure is really good at some things. So I know we're getting slightly off track here, where it's a related tangent. So just to quickly catch up on chat here. It's a relevant tangent. Yeah. Any idea how to restore a cluster from scratch having an etcd backup? Woof. Whoa, not everything's going to be in etcd. I mean, yes or no, but ooh. Yeah, so the problem is, there is no awareness in that instance that effectively you have a new cluster. So you're going to provision new worker nodes, new control plane nodes. You're going to wipe out that new etcd and replace it with the old etcd. And that old etcd is going to say, who are all of these nodes and where did you come from? And it's immediately going to try and make things look like it used to look, even though none of those things are actually what they used to be. And it won't work as a result. Yeah, so this is why Andrew's personal opinion is etcd backups are not a disaster recovery option. Etcd backups are for "I lost one," or let me rephrase that. They are only a disaster recovery option if you still have the majority of the cluster in place and it's just etcd that failed. So something went haywire inside of there, whatever. You lost two of the control plane nodes, whatever that happens to be, you can recover at that point. But if you're starting over from scratch, that's much harder. And that's where things like GitOps come into play. Yeah, yeah, because people have a misconception of treating, since we call etcd the database of Kubernetes, but then people, I guess unknowingly, treat it like a database. And it's like, you really, you really shouldn't treat it like a database because it's only a key value store. Yeah, it's only a key value store. You don't treat it like a regular database. It's not, and the, like Andrew was alluding to, the real backup mechanism, right? Is in two places, right? Your application manifests backed up in a GitOps fashion, and the storage, right? So having, for your stateful applications, having that storage resilient, I believe we already solved that problem with things like NetApp and Hitachi, EMC, what have you, right? They're really good at those things. And that's kind of like meeting in the middle, kind of like that DevOps mentality, GitOps mentality, where you have resilient storage and then resilient application development processes like DevOps or GitOps. Right. The biggest thing is make sure you have a copy of every piece of YAML you've ever applied to that cluster, right? Seriously, right? Like, I know we make it very easy to just go in and click, click, click, put things together for you, but you've got to make sure that you have a process for saving that configuration data somewhere. Yeah. So I'm gonna put us back on track for a moment, yeah. So I just shared, I know, right? In a very obvious, very, yeah. So I just shared in the chat here two links to the two that are up on my screen here. So the second one is actually the first one I'm gonna talk about, which is this KCS around, you know, OpenShift Container Platform high availability and recommended practices. So this KCS is rather interesting. It walks through a couple of different aspects. This KCS, if you were to look at the history of it, actually started way back in like 2016, 2017, something like that. 
So back in the 3.x timeframe. And some of that is evident as you kind of walk through this or read through this. So in the 3.x days, and you can see we kind of highlight this here, right? We would often, you know, it was supported. It was entirely possible for you to deploy a cluster with a single control plane node, a single etcd, you know, instance. And of course, that's not highly available. If I lose that one instance, my cluster is now toast. So I could recover it at that point, again, assuming that I have my worker nodes and the other things still present. But during that time, everything is effectively down. So we always recommended having three or more nodes for that resiliency. Why three or more? Because we need to have a tiebreaker, right? We need to have quorum. And that requires an odd number of nodes in order to do that. So OpenShift 4, we made kind of the command decision, right? It's a highly opinionated thing of we're going to have three control plane nodes, always three, only three, never more than three control plane nodes. I would say never less than three, but that's on the cusp of changing with single node OpenShift, which has its own set of limitations, by the way. So with three control plane nodes, from a cluster perspective, I now have the ability to tolerate a node failing, right? Or if not failing, at least being offline for some period of time, for reboots, for upgrades, for whatever happens to be going on inside of there. So three nodes gives us that ability to tolerate some failure inside of the infrastructure. So a lot of times, and by far the most frequent question that we get asked when it comes to OpenShift high availability is, I have two sites, how do I do OpenShift high availability? And the answer is, with one cluster, you really can't, because you're still going to have one site that has the majority of the nodes. And if that one site goes down, well, you're right back in the same scenario. And we've kind of alluded to this, and Christian, I'll be curious about your thoughts. So we've alluded to this on other shows. I think I talked about it, now it escapes me. I think it was when Brian Bodwin was on, of using fault tolerance or similar technologies, right? So VMware Fault Tolerance is effectively one VM that is executing in two places. And if the primary one happens to go down, the secondary one basically picks up without any interruption to what's going on. So can I use fault tolerance to have that third control plane node run in both data centers essentially? And technically, yes. In practice, the problem is there's a lot of overhead associated with that, which has a lot of performance implications, which means that there are scale implications to your OpenShift cluster. So technically possible, but not recommended. What I always say when someone asks me, and maybe I'm a little bit more abrasive, when someone asks, I have two data centers. How do I set up OpenShift to be HA? And my answer is always, install two clusters, right? Because effectively, as you noted, Andrew, you can't, right? Like you're asking how, and you just can't, it's just like the limitation of the technology of etcd. Wanting it to work out is not gonna make it work out, right? So then you have to kind of change your thinking a little bit when you have a situation like that, right? I mean, not everyone has three data centers, right? So not everyone has the ability to have three, like I totally get it, right? Like it's budgets, budget. 
And especially three data centers that have connectivity requirements in line with what we need, hardware and everything else, yeah. Yeah, so like if two is the best you can get, change your thinking a little bit and go up a level, saying, okay, well, let's focus less on stretching the cluster and more about stretching the application, because now you can have two, either active-passive, even active-active, right? If you want to get fancy with it, but you can have different scenarios, right? You need to bring that thinking, I think, up a layer when you start thinking about two data centers. Exactly, so on the screen here, so what I'm sharing, and I posted the link into the chat, is a blog post written by Raffaele back a year ago now, August 21st. Wow, time flies when you're having fun. So talking about exactly what you just said, Christian, with two data centers, essentially the best way to achieve high availability is effectively an active-active type of deployment for your application, right? And you can see this is using a global load balancer, you know, and it sends traffic to both places. So I had a bit of fun and I was trying, I was researching this topic and I was doing some other things, and what I thought was interesting, so if I go in here and type in Kubernetes high availability, I can talk and type occasionally, you see the second link that comes back here is this Microsoft's high availability cluster pattern with AKS and Azure Stack Hub. And you'll note that they recommend the exact same thing. There you go. That traffic manager is only part of the, you know, like, right? Well, Traffic Manager is a load balancer. You gotta have, well, but yeah, you still need to replicate storage, still need to do all these fun things to maintain that kind of environment, right? That design doesn't change, right? For your data going back and forth. And so my point is, this isn't Red Hat being difficult, right? This isn't Andrew, you know, playing ostrich, sticking my head in the sand and saying, no, no, right? This is too hard or you're doing it wrong. This is really, from multiple sources, right, the recommended way of doing this, and kind of what I was saying before of the infrastructure team, the apps teams, right? We need to work together to find the best solution to the problem. And oftentimes that requires, you know, yeah, I want to be able to tolerate a single node failure in one data center transparently, okay? That means that, you know, certain things need to happen at the infrastructure layer, vSphere HA being an example. But I probably also need to do some things at the application layer. Even in that same exact scenario, if my OpenShift admin hasn't requested things like, and I'm assuming there are silos in place here, but like, have you put in place anti-affinity rules for your control plane nodes? Because you might have 10 nodes in the hypervisor cluster, but if two of those control plane nodes, OpenShift control plane nodes, are on one hypervisor node and that one hypervisor node fails, well, you haven't done anything. Similarly, at the application level, do you have, you know, scheduling hints? And what I mean is, you know, hey, I've got this, you know, super critical core service to my application, microservice for my application. And I scaled it out to be 10 pods in size so that way, you know, we're good. We can tolerate a node in the cluster going down. 
But without something like an anti-affinity rule, you know, request in place on that pod definition or deployment definition, you still have some risk there, right? Because you could end up with more than one of those pods on the same node. You could impact the, you know, the service at that level. So it's important to be cognizant of those things at multiple levels. So there's a blog post, let me, I actually don't have this one up over in this browser, so I'll pull it over here. There you go. So this person talks about similar things of distributing your application across all of the available resources. Right, so I will copy and paste this link into chat. So effectively, one aspect of high availability is spreading it out and making sure that it's spread out. And that includes not just things like anti-affinity rules and that type of stuff for pod-to-pod anti-affinity rules, but also, sorry for bumping the microphone, but also, you know, taking advantage of things like scheduler profiles. So let's see, if I go to the docs.openshift.com and we'll take a peek here and go to scheduler profiles. So the scheduling profiles give the core scheduler a hint as to, hey, I want to spread out or I want to consolidate my pods. Low node utilization means that it is going to prefer nodes that have the least amount of workload on them for pod scheduling. High node utilization means the opposite. It's going to prefer nodes with higher utilization. The rationale here is, well, low node utilization is kind of what we've come to know from hypervisors. I want to evenly distribute the workload across all available nodes. High node utilization, in Andrew's opinion, is most applicable for those times when I've got my OpenShift cluster deployed to Azure. And when I deploy new nodes in Azure, I'm being charged for those nodes, for those virtual machines inside of Azure. So I want to have as few nodes as possible to keep my costs down. And maybe that's one of the reasons why you would want to do something like this. Would, I would imagine that would affect the eviction time when you do like, when you like cordon a node, right? It'll take some time. So that you just have to keep some of this stuff in your head when you're... Yeah, so it won't be necessarily instantaneous. It'll be at the scheduler's convenience, right? Like, think of all the things that the scheduler's doing at the time. As well, yeah. Yeah, so that's a good point, Christian, of, you know, again, I'm a virtualization admin, right? I have a virtualization background, I should say. I'm no longer a VM admin, really. Not yet, we're gonna need you soon, yeah. So in the virtualization worlds, and for those of us who are VMware admins, right? We remember things like admission control policies and our DRS clusters, right? We, you keep a set amount of resources available for those HA events. So does the same thing apply with Kubernetes? Does the same thing apply with our OpenShift clusters? Maybe, maybe not. It really depends, right? Oh, yeah. These are all good questions, right? These are all... Going back to that risk assessment. Yeah, yeah, yeah. Of what am I, you know, one, what is my application doing? And this is a conversation, another conversation that I think, you know, it's not always an easy one because we have to have kind of a robust understanding and a good relationship with the application team. But like, hey, we're, you know, all in on OpenShift. Everything's going in OpenShift, Kubernetes, right? All of the application components. 
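As a rough sketch of the two ideas just discussed, here is what a pod anti-affinity rule and the cluster-wide scheduler profile can look like. The application name and image are hypothetical; the Scheduler resource shown is the cluster-scoped one named cluster.

```yaml
# Hypothetical deployment snippet: keep replicas of "critical-svc" off the same node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-svc
spec:
  replicas: 3
  selector:
    matchLabels:
      app: critical-svc
  template:
    metadata:
      labels:
        app: critical-svc
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: critical-svc
            topologyKey: kubernetes.io/hostname   # "never two replicas on one node"
      containers:
      - name: critical-svc
        image: registry.example.com/critical-svc:latest
---
# Cluster-wide scheduler profile: LowNodeUtilization spreads, HighNodeUtilization packs.
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  profile: LowNodeUtilization
```

The same topologyKey approach extends to zones or hypervisor hosts, provided the nodes carry the corresponding labels.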
We're gonna bring this, you know, MySQL in. Or, you know, some other single point of failure. Is that the right thing to do? I pick on MySQL because it's a database, and a lot of times with databases, you know, you have one place where you write to, it becomes that single point of failure. But you have to really carefully consider whether or not containerization is even the right answer for all of those aspects. Does it make sense to keep some components in a virtual machine and rely on something like vSphere HA to give it what it expects and what it needs, instead of, you know, okay, maybe I do need to refactor, re-architect the application to take advantage of something like CockroachDB, right? Where now I have that same, you know, SQL interface, same, you know, RDBMS, but it's designed to be that distributed, cloud native, you know, type of architecture. Yeah, so I think Eric, Eric Jacobs put it quite crassly, right, is that Kubernetes, OpenShift, doesn't fix your crappy application design, right? Or I think as Kelsey Hightower eloquently put it, you can't rub Kubernetes on something and make it better, right? There's gonna be some tax you need to pay in order to take advantage of some of these, you know, some of these things that are in place for Kubernetes. Yeah, all these things that sound really, really cool, but it's not without its tax, right? You need to take into account some of the things like the application design, like will even the application, you know, take advantage of all these things, right? Or are we just gonna stuff our application in one big container and just hope for the best, which usually doesn't work out, so. Very much so, it is not a magic bullet. And containers, and I think Andrew might be a bit of a rebel in this respect, of, you know. What are you rebelling against, Andrew? Containers, I think sometimes people forget that containers are an application deployment mechanism, right? Of, I'm gonna stop sharing since I don't have anything to share at the moment, of, I don't need Kubernetes to have a containerized application. And I can use, you know, Podman on my RHEL node to quickly deploy an application inside of there without having to go through the RPM, you know, install or a dnf localinstall, you know, without that whole process. So I can use a container to quickly deploy an application. I used to pick on SQL Server for this, but somebody told me that Microsoft stopped shipping SQL Server in Windows containers. Like if you've ever deployed SQL Server on a Windows server, right? And I used to do this, right, with a virtual machine. You go and you pop the DVD into the drive, right? Physically or virtually. And then you open the installer, even if you just blindly click next, next, next, next, next, finish, it's like an hour and a half of it doing something. I don't know what SQL Server does and why it takes so long, but- Is that what's in your timesheet, install SQL Server, two hours? Yeah. Perfect. So, you know, just installing the app, but not even getting it up and running, not even attaching the new database, all that other stuff, you know, it can be a significant amount of time. Or I can do, well, in Windows, it would be a docker pull, docker run. And now I have it up and running. You know, on RHEL, SQL Server, you know, podman pull, podman run. And now I have it up and running. It's as quick as my internet connection, basically. So, yeah, containers can be a deployment mechanism. I can still use containers in my development process and all of those things. 
Maybe I'm using them to instantiate in my dev environments, you know, a database instance that I can test with, but in production, you know, let the right component handle and provide the right type of high availability for that application service. And it's easy to pick on databases, but there's other examples out there. Well, yeah, the infrastructure should be a reflection of the need, right? You shouldn't have to, you know, you shouldn't have to try to contort your application or your needs into the technology, right? The technology should be in service of, you know, whatever it is that you're providing, whatever service you're providing. So the other, and so circling back, so pods, a pod failing is an HA event, right? And we rely on things like horizontal scalability to provide additional capacity, right, not only for the application to do what it does, but also in the event of failure. So do you all happen to know off the top of your head how long it takes for Kubernetes to recognize that a node has failed? Yeah, I think it's two to five minutes. Yeah, I wanted to say two to four, but now. It's five minutes. Five minutes, okay, yeah. So if a node fails for whatever reason, it's five minutes before the scheduler will take action. Basically, it gets declared unreachable and unavailable and then it'll reschedule that workload. You can shortcut that by doing things like deleting the node from the cluster. Yeah. And so... That's a big hammer, but yeah. Yeah, it is a big hammer. So this is an issue with OpenShift Virtualization, right? We have... Hey, I need my virtual machine to restart when it fails. And if Kubernetes takes five minutes to recognize that the VM, the node, is gone, but we want the pod to restart before that, how do we do that? So Rhys actually figured out that if you're on, like, the bare metal API, you can use a node health check that will quickly identify, like, hey, this node stopped responding, it must be gone. And then the action is to delete the node from the cluster. And then all that workload immediately gets rescheduled and we can rely on the API mechanism to add a new node back in. So where I'm going with all this is there's another aspect as well. It's not just, you know, deploy a pod, leave it alone, Kubernetes has these intrinsic mechanisms that just magically work. Health checks are important. Pod health checks, node health checks, right? Liveness probes, all of those things are critical towards ensuring that your application continues to be available. And I know that- There's also, what's the pod, the one where you specify minimum pods, the fault tolerance of pods, I forgot what it's called. Pod disruption policy or budget? There we go, yes, pod disruption budget, right? Where it's like, okay, I have three pods, but I can tolerate only running two. To take some of that weight off the scheduler, to give it time to schedule another pod on another node as well. The PDB, pod disruption budget, it took me a minute to remember that one. It's also really important for upgrades and updates. Because basically we're taking down nodes in order to do reboots and all that other stuff. So for cluster-level things, we want to be as least disruptive as possible. Well, if we don't know that we're disrupting it, how can we do that? And in particular, so I think we've mentioned this before, but it's been a while. 
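For reference, a minimal sketch of the health checks and the roughly-five-minute node failure window described above. The endpoints and timings are hypothetical; the toleration shown is the default one Kubernetes adds to pods, which is where the five minutes comes from.

```yaml
# Hypothetical pod spec fragment: probes so the platform can act on failure.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: registry.example.com/web:latest
    readinessProbe:            # gate traffic until the app is actually ready
      httpGet:
        path: /healthz/ready   # hypothetical endpoint
        port: 8080
      periodSeconds: 5
    livenessProbe:             # restart the container if it stops responding
      httpGet:
        path: /healthz/live    # hypothetical endpoint
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
  tolerations:                 # added by default if you don't set your own
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300     # the roughly-five-minutes before eviction
```

Tuning tolerationSeconds per workload is one way to shorten that window without reaching for the delete-the-node hammer.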
So when you do those, the number of nodes that is affected by, for example, a machine config update or a cluster update is controlled in the machine config pool. So there's a setting inside of there. The default is one node at a time. You can turn that all the way up to like 100% of nodes, because it can be number-based or percentage-based. So hey, reboot 100% of the nodes at the same time inside of this machine config pool. That seems bad, right? On the other hand, if I have a PDB in place, it's going to prevent some of those from being rebooted. So reboot as many as possible and then kind of go from there. I'm not advocating for a hundred percent, by any stretch of the imagination. But there's, again, my whole point of this is we have to get out of the habit. And the three of us, and this is why I wanted this here, the three of us are infrastructure folks. We have to get out of the habit of thinking HA only comes from the infrastructure. Right, it's a component, right? It's a component of a larger picture. Exactly. And Christian, you're a great example, as you said at the start of the show, and I didn't even prompt you, you volunteered it. You're getting heavily into the dev side of things, right? With GitOps and all that other stuff. So you see, better than, certainly better than me, and I can't speak for Chris, both sides of that and how both of them have to play together. Yeah, yeah, no, and it's a, like the whole, how long has, Chris, you've been involved in DevOps for a while? How long has that whole idea been around? It's like, we need to work closely together, more now than anything else, especially in this cloud native world, in this Kubernetes world. There's no more of this throwing over the fence sort of thing, or where infrastructure guys buy something and put it in place and they're like, oh, by the way, this is what we're using now, and the developers didn't even know, wait, what's this Kubernetes thing, right? Like it needs to be, it's a holistic approach, right? And especially when you're getting to something like, and I don't want to derail it too much, something like ACS, right? Where you're talking about application delivery, supply chain, where there's no such thing as dev, there's no such thing as test, right? Like, because if you think of a factory, right? There's no, like, if you think of a factory putting a car together. And you run it through the proving ground and then you make it en masse, right? Yeah, like there's no, like, Henry Ford didn't have like a dev, right? Oh, right, like it's just, it has to be perfect each time it comes out, right? And that's like kind of like that same idea. And that could only happen when, you know, when both sides of the house are involved. Well, I mean, Chris, you know, Mr. DevOps'ish, you know this as well as anybody. And if you're eagle-eyed, you can see I have like The Phoenix Project and all that back there. So one of the most interesting things that came out of all of that was what I learned about the Toyota manufacturing method. Oh, the Toyota Kata is amazing. Mike Rother up here at the University of Michigan. And yeah, so, so very much in line with what you just said, Christian, of, at any point in time, you know, and call it DevOps, call it whatever you want, call it, you know, being friends and buying coffee and donuts for each other. Yeah, yeah. The opposite of that. The dev team should have that, you know, ability to have a conversation around, hey, this looks wrong. We're going to impact this. 
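A small sketch of the interplay just described, with hypothetical values: the PodDisruptionBudget declares how much disruption the application will tolerate, and maxUnavailable on the worker machine config pool controls how many nodes an update rolls through at once.

```yaml
# Hypothetical PDB: always keep at least 2 replicas of the service running.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-svc-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: critical-svc
---
# Worker machine config pool: update/reboot at most 10% of nodes at a time.
# (Only the relevant field is shown; the existing pool keeps its selectors.)
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
spec:
  maxUnavailable: "10%"
```

A PDB that can never be satisfied (for example, minAvailable equal to the replica count) can stall node drains and upgrades entirely, so the two settings need to be considered together.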
And yeah, modern times, you know, I'm getting way over my skis here of, you know, we call it DevOps or SRE or, you know, whatever, whatever the in-vogue term is. But there's a ton of, and there's actually, I used to, I had this conversation with a number of customers previously around Brene Brown, yeah, Brene Brown does a phenomenal TED Talk around vulnerability and relationships. And you kind of have to have that willingness, that openness between dev and the ops teams, right? Infrastructure and the application teams, to be able to be successful and all of that in this cloud native world. So I know I'm getting all mushy and stuff like that, you know, we're administrators. It's a team effort. I think that's the biggest thing, right? Very much so. You have to have a cohesive team in any application infrastructure combo these days. If you don't trust the other person sitting across the table from you, then you need to do things to build that trust. Or vice versa, maybe they need to do something to help build that trust. And whether they're self aware enough to do that is the question. So yeah, to kind of drive home the DevOps point, right? This isn't anything super new. Not at all. This is just a different way of looking at things nowadays, right? Well, I always say it is like, now we actually have the technology to do all the things we've been wanting to do, right? And so now we have the ability to do like, I remember talking about in my previous job, talking about autonomous data centers, right? Like a data center should be autonomous. And if one goes down, it doesn't matter, right? We lose, you know, those records are over here and we can reprocess those orders. And, you know, that was hard before a lot of this stuff, right? Like we're talking about this stuff. Yeah, it was hard for like, not only for the admin side, but also for the dev side, right? Like we're talking like before Amazon, right? Before all these things, we were like, we have these ideas. Now we're at the point where, you know, anyone can do it, right? Like you have a credit card and some Kubernetes knowledge and you can actually build a pretty robust system now that we have the technology. So yeah, it expands more than just Kubernetes and all of that. One of the last projects I did before I came over to the vendor side, and Johnny, who now works for Red Hat as well, he and I were co-conspirators on like regional disaster recovery for virtual desktops. Wow. And again, it goes back to, well, as an infrastructure admin, I can make the desktops, the VMs, available, but what about the profiles? What about this? What about, like, it's, anyways. So I'm reading through chat here. I think we addressed all of the questions. I do see, so Khalid, I see you asked about the descheduler. I think I briefly mentioned the descheduler. So the descheduler is good for, like, I have a node that is overutilized, however I'm defining overutilized. So going back to that, spreading things out in order to achieve, you know, a balance so that no one node takes an unfair or an undue amount of resources. So that can be a component of that HA strategy. Are there any others that I missed in here? I mean, we answered Khalid's question about, not Khalid, sorry, Manicatan's question about not just doing like an etcd restore to re-stand-up the cluster, which we touched on earlier in the conversation. So Alejandro, how is data consistent across two clusters? So that was going back to our global load balancer, you know, having two clusters and two data centers. 
So this is one of those, there's kind of two ways to do it. One of them is, from an ops perspective, infrastructure perspective, right, punting it to the dev team. Like, you know, CockroachDB, right? I mentioned that before, of having a database instance that is naturally cloud native. You know, something like that. And there are certainly others out there. Cockroach just happens to be the one that comes to mind because I was recently fiddling with it in OperatorHub. And they do have a certified operator, by the way. You know, another way would be to do something like storage replication. You know, use ODF, OpenShift Data Foundation, which is the new name for OpenShift Container Storage. And, you know, replicating that data from one cluster to another. That gets a little weird in Andrew's opinion, because on the secondary site, it's either, because it's two separate OpenShift clusters, but one storage cluster. So you might have to do some things inside of there. It might even be an external instance, where you treat it as an external instance that's replicating data. That goes back to the GitOps thing of, you know, do I just use GitOps to reintroduce that PVC, that PV, in a controllable fashion? Yeah, it's always, like you say, you can punt it to the devs and say, okay, well, design your application. This is what you got, right? You could also do it on the infrastructure side, where essentially it's an active-passive, where you have your application, you have your things running, you're replicating your data over either, you know, using NetApp SnapMirror or using ODF replication, whatever, right? Your storage system needs to replicate off-site. And then using GitOps, you can, you know, when a failure happens, you just, you know, patch the storage, reapply the manifests, and you should, like, come up, right? So there's a, then that's like kind of like the active-passive or hot-cold, whatever you want to call it, design between two data centers. Or you can get really, really cool like Netflix and basically have two data centers and they're autonomous essentially, where, you know, you have data replicating forward, you know, cross-replication, and the application is smart enough to know that, hey, if this site goes down, I need to read data off this instead of, you know, that set up over there. There's many ways you can do that. Autonomous. So I just noticed that we are, we're running past the hour. I know we don't have a hard stop today. Chris would have already stopped us. Yeah. But I don't want to go too long in the interest of respecting everybody's time. So I do want to say, you know, one, please, for anybody still watching us, please feel free to submit questions into the chat. We'll address those as best we can. You can also reach out to me at any point in time, andrew.sullivan@redhat.com, or on social media, I'm on Twitter. PracticalAndrew, all one word, as you've seen me in the chat here. So PracticalAndrew on Twitter. Don't hesitate to reach out. Chris, thank you for posting the Discord in there as well. Discord is a great way to keep in touch with us. Yes. So Christian, thank you so much for volunteering to jump under the bus with me. Yeah, he had no problem. So I do appreciate you joining us for your perspective, both from the ops as well as the dev side, as well as your experience. So again, audience, don't hesitate to reach out at any point in time. I'll hang out on the chat here for a few more minutes, even after the stream ends. 
So you're welcome to send us any other questions. That being said, Chris or Christian, your show is next week. No, it's tomorrow. It is tomorrow. Tomorrow, yeah. I'm actually, so I'm following suit with Andrew. So tomorrow, GitOps Guide to the Galaxy. I'm having Josh Packer from the ACM team, right? Andrew had someone from the ACM team, Jimmy. Now I'll go the actual engineering route. I decided to one-up you and get the engineer. So what do you know? Yeah, because, well, I actually got an engineer. It's not a competition though. And so we'll be talking about ACM and the integration with Argo CD and GitOps. So kind of topical, right? We're talking about ACM, right? More on the ops side of GitOps. So be sure to check that out tomorrow at noon Eastern. No, 3pm Eastern. Noon my time. Sorry. So, and I will be tuning into that one because, I know ACM has some ability to apply YAML files, right? Kubernetes objects that are inside of a Git repository, but that is not GitOps. That is not Argo CD. So I'm curious to hear about the differences between the two different capabilities. Yeah. So I see Aperbo, apologies for mispronouncing anybody's name: does OpenShift support every kind of software or application? So the answer there is yes and no. So OpenShift is Kubernetes. Kubernetes is an orchestrator for containers, and a container is basically a Linux process that has been isolated from the rest of the system using some kernel features, cgroups, et cetera. So in that respect, yeah, if you can execute it on a Linux system, it could technically go into OpenShift. But it's more complex than that, because Kubernetes has its own set of things that it wraps around that, around scheduling and all of that other stuff. So the answer is almost certainly yes, but the bigger answer is there might be a lot of additional work that has to go into it for it to be Kubernetes ready. Even if it's a traditional monolithic app. I will note, ACM is only included with OpenShift Platform Plus; with the other SKUs it is an add-on. Yeah, Platform Plus is, what is it, a bundle, right? So think of it as a bundle. You get all the, yes, that's right. Yep, so it is. So OpenShift Platform Plus, Chris, you're muted. Chris is muted, yeah. So OpenShift Platform Plus is OpenShift Container Platform, OCP, plus ACM, plus ACS, plus Quay, or "key," depending on what side of the world you're from. If you're from the UK or Australia. Or New Zealand. Or New Zealand. Did we answer the question, I know it was very broad, from Aperbo about, does OpenShift support every kind of app? Yep, cool. So I see Neten: best storage solution in cloud-native with high availability and fault tolerance? So best is always subjective. And it comes down to what best meets your needs, balancing availability, performance, and efficiency. So availability, of course, is kind of how many nodes, for lack of a better term, how much failure can I tolerate before the storage is no longer functional? Performance is pretty obvious, right? Both throughput and latency. And then what was the third one I said? Availability, performance, and efficiency. So efficiency would be, for every byte that I store, how much disk space is it consuming? So I'm gonna pick on ODF because it's in our portfolio and I'm not offending anybody except ourselves, right? So with the defaults, with ODF, and let's say I have three data nodes, and I think the default is, or the minimum is, three.
So for every byte that I store, it's being replicated to three nodes. So my application stores one gigabyte, my disks have three gigabytes consumed, one gigabyte on three different nodes, in order to protect that data. Writing that three times has performance implications. It has to land on all three nodes. There's a network traversal that has to happen there, as well as they all have to validate that they got the data and then respond back that they got the data, right? So that has a performance implication. From an availability perspective, on the other hand, I can now tolerate node failure. I can lose one of those nodes and it keeps on trucking as though nothing ever happened. And when I bring that node back, it will re-sync everything and everything's back to being protected. So there's a bunch of different solutions out there; if you go to the marketplace and look, there's a number of similar solutions that are available to be deployed into the cluster. Really comes down to those things. On the other hand, there's a lot of external storage, right? Basically all of the major storage vendors now have the ability, through CSI, to connect your PVCs to their storage solution. So, Chris, you mentioned NetApp before, right? NetApp has both asynchronous and synchronous replication, just like every other storage vendor. I'm just picking on NetApp because Chris brought them up. So, you know, maybe- I was a big NetApp fan, by the way, so I was a big user of NetApp. So, maybe that meets the needs of your application. If I'm deploying into the cloud, and I'll pick on NetApp again, NetApp has their Cloud Volumes thing with the major cloud providers. So you can connect to and consume NetApp storage on the major cloud players. So it really comes down to, you know, what meets your needs across those three major things, right? So again, availability, performance, and efficiency. Right. There you go. There's another question. That's a lot of information to digest. Yeah. Another question. Can the kubectl start driver work with Podman? I think there's... Yeah, I don't understand that question. I don't understand that question either. Yeah. It makes me think, well, with Podman, this probably doesn't answer the question, but by the way, with Podman, you can take the containers that you're building with Podman and export them as a Kubernetes object, right? So you can, like, export this as a deployment and you can even pipe it into kubectl apply; there's a quick example of that below. So, not sure if that answers your question, but I decided to throw that out because it's actually pretty cool. And you taught me that too, as well, which I appreciate. Yeah, so Aperbo wants to learn more about OpenShift. Where can I get study material? So I always highly recommend learn.openshift.com. So learn.openshift.com is freely accessible to anyone and everyone. You don't have to register for anything. And on that page, there will be a huge number of... I'm going to share my screen again. So if we go to learn.openshift.com, we have all of these different scenarios on here. So like, oh, I want OpenShift Basics. Okay, I want to learn about, here, deploying applications from source. And I hit Start Scenario and I am dropped in. You notice I didn't log in or anything. I didn't have to do anything. It was like five seconds, even with me running my mouth, and I now have a whole tutorial to run through here. So the application slowed down because you were talking. Yeah.
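For reference, here is roughly what that Podman trick from a moment ago looks like on the command line. The container name is a placeholder, and note that by default Podman emits a Pod manifest rather than a Deployment, so you may want to inspect or adjust the YAML before applying it.

```bash
# Export a locally built or running container as Kubernetes YAML, then apply it.
# "mycontainer" is a placeholder for whatever you named your container.
podman generate kube mycontainer > mycontainer.yaml    # look at the YAML first if you like
podman generate kube mycontainer | kubectl apply -f -  # or pipe it straight to the cluster
```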
So I always, this is always the first step that I recommend to folks for getting your feet wet, getting started. So also, through developers.redhat.com, if you have a developer entitlement, you are also entitled to up to 16 cores of OpenShift. Correct. So anybody can go, completely free of cost, and have quote-unquote real OpenShift available to you. It is not supported. You can only use that entitlement for single-user, non-production work, but you can absolutely do that. I will note that you cannot actually entitle it yet. There's an error on the back end that we're still working on getting sorted out. I know I've talked about this for like a month and a half now, since Tushar was on. So they're still working on getting that sorted out, but you do have a 90-day eval, 60-day eval, something like that. So you can, but yeah, you're absolutely able to get up and running and learning OpenShift, no cost, very easily. Yep. And yeah. I believe it's 60-day, yeah. Thank you. So yeah, you've got Learn, you've got the Sandbox, you've got, oh, Try, the third one. Try it out. Yeah. So we go to... Try, so you can go to all the various places and say, hey, I want to put my thing over here, and you can... There you go. Yeah. Developer Sandbox as a managed service, from Red Hat or elsewhere, we have managed services on AWS and Azure and the other ones, but yeah. There's ways that we can manage it on your cloud or you can manage it yourself completely. And if you need a sandbox to play in, we give you that for free for 30 days. It does have some limitations, but it's pretty open right now, right? Like you can get in there and kick the tires on it real hard if you want. Yeah, yeah. So the Sandbox, as the title indicates, is really geared at developers, folks who just want to get in and start deploying pods, writing code, right? It includes CodeReady Workspaces inside of there, just to kick the tires and see what's going on. And I don't remember if it's 15 days or 30 days. 30 days. Yeah. So you can get in and do exactly that. Out here, even though it's a developer entitlement, as an administrator, as an ops person, if you want to get started deploying, kicking the tires, testing, learning about the OpenShift deployment process, the OpenShift administrator experience, all of that, you'll probably want to go with the self-managed thing out here. So like if I click the start-your-trial, let's see what happens. I don't know if I've ever clicked start my trial on self-managed. So it took me over to console.redhat.com, where I'm already logged in. And it tells me exactly that, right? You know, hey, I want to get started with the Assisted Installer, and it goes through and does everything for you. If you're curious about the Assisted Installer, we have a couple of shows on the Assisted Installer, so definitely check those out. Definitely love the Assisted Installer. Yeah, it is great. It is not yet GA, but it is the easiest way to get an OpenShift cluster deployed, in my opinion. Especially if you're a lab user, like a lot of us are, right? We have our own little home lab. It's, like, the fastest way to be able to get a cluster up and running and just, you know, kick the tires on OpenShift. Yeah, yeah. And so again, that developer account, which this is my developer account, I'm not logged in as an employee right now. You can absolutely go in, if you have everything you need from a credential perspective; you notice I can go in, I can deploy to AWS.
So you get that same 60-day eval entitlement or 16-core developer entitlement. So if you want to see what OpenShift looks like in AWS, by all means, you know, test out the OpenShift IPI experience in AWS. So the question from Nagendra was, can you use Podman as a driver for minikube? Oh, that was my question. And that is a not-yet, I don't think. Yeah, they're planning on it. Yeah, we're planning on adding support for that. You can do kind clusters with Podman, we've been told, although we haven't seen docs on that yet. No, there's no docs on it. I've tested it. Yeah, you can do a single-node kind cluster. Well, so the counterpart to minikube is CodeReady Containers. Right. So if we do a CodeReady Containers, can't type and talk. So CodeReady Containers is more or less that same experience for, again, local development with OpenShift. It does require you to have a relatively powerful machine to run on. So just like, you know, Docker on macOS or any of that stuff, or minikube for that matter, it will create a virtual machine and deploy OpenShift into that virtual machine. So I don't remember what the minimums are. I think it's like four CPUs and 10 or 12 gigs of RAM for that virtual machine, which means that your physical host should have, you know, probably six or eight CPUs and, you know, 16 gigs of RAM. But that is certainly an option. We didn't mention that one earlier, but it is definitely an option. So here, new technology beyond the Minishift experience. And I'll plunk this down here. Yeah, I already dropped one. Oh yeah, I see that. All right. So, I think this will be the last question because we're running way over our normal time. So don't hesitate to reach out, again, andrew.sullivan@redhat.com. So, in a disconnected environment, how does Red Hat know the number of clusters used by a customer? And the answer to that is: we don't. Nope. So yeah. So essentially, we rely on you. We trust you, the customer, to entitle your clusters through subscription management, right? So console.redhat.com now, where you go in, you paste that cluster ID in there and you tell it how many nodes there are or how many cores are needed, and associate it with an entitlement. If you don't do that, we will never know. The downside is, if you ever call us for support, basically it's an unsupported cluster until you entitle it. So we would basically, you couldn't get support for it without doing that entitlement. Yeah. So. All right. Awesome. Yeah. Thank you so much. Yeah. Tune in for Christian's show tomorrow at three Eastern. Three Eastern, sorry, I misspoke. You're fine. Coming up later on today on the channel is Red Hat Enterprise Linux Presents. I believe we'll be talking about some NetworkManager type stuff. So tune in for that. We have a new host coming along there. And then, yeah, we're actually, we're having a Red Hat recharge day, is what we call it, on Friday. So we'll be off air completely on Friday. We will. I'll be around on the internet though. So if you have any questions, feel free to poke me on Twitter, or short@redhat.com is always available to you for question asking. I can get you to the right person, which could be Andrew or Christian. And yeah. Or many others. Or many others, yeah. But yes, definitely reach out if you need help with what you got going on. And speaking of Friday, keep an eye on the blog. So openshift.com/blog, which will redirect to wherever it's at now.
So we, every week following these streams, we have a blog post that summarizes everything that we talked about, with links to the recorded video and where we talked about it. So keep an eye out for that. Usually they come out on Fridays, assuming I can get everything ready and over to Alex on the blog team. It'll be staged and ready to go out Friday morning. So, but yeah. So thank you so much to our audience. Appreciate everybody staying with us today. Thank you again to Christian. Very much appreciate you coming on. Be sure to tune into his stream tomorrow, 3pm Eastern, 12pm Pacific. And I'm not even going to attempt the translation into UTC. Keep an eye on the streaming calendar. I know Chris has dropped the link in there a couple of times, thank you. And with that, thank you everybody. Have a great rest of your day, great rest of your week. And I'm gonna steal Chris's line: stay safe out there. Thank you. I'll do the traditional live streaming line of please like, subscribe, and share. So thank you and we will see y'all soon.