Let's get into the big topic of open source, something that we actually have in mind. This is so awesome. We are an open culture that is actually able to fix that process that a developer... As the Kubernetes ecosystem really brings... Welcome to this week's Ask an OpenShift Admin live stream. So today is January 26th, which means that we are now officially fully into 2022. So if you didn't know already, yeah, it's 2022. Johnny, I don't know about you, but it's hard to believe that we're already a month in. You know, this is our third stream of the year. I know, man. I feel like every year I'm like, oh, I can't believe it's already this late in the year. You know, but time flies when you get old. Yeah. So I'm telling you now, Johnny, Valentine's Day is coming up. You know, so my wife's birthday is one month away. I'm trying to get ahead of these things, right? 2022 resolutions. Time to set that calendar reminder up, you know. I feel like this is an annual fight with my wife and I where I forget, like, oh yeah, crap, Valentine's Day. Yeah. So we've been married for almost 20 years, right? And my wife is like, my birthday is what I prefer. Like, you know, we can tone down Valentine's Day. It's okay. Yeah. She doesn't seem to mind that, you know, 20 years in. That's awesome. So hello everyone. Welcome to the stream. This is the Ask an OpenShift Admin office hour, which means that we are here, like if you have ever had a professor or a manager who had office hours, we're here to answer your questions. So whatever it is that is on your mind, things that you want to talk about with regard to OpenShift, that's what we're here for. You know, ask those questions, feel free to message us in chat on whatever platform you happen to be watching us on. The software that we use behind the scenes, Restream, makes sure that we get those regardless of wherever you're at.
And then our wonderful producer Stephanie in the background, she does a great job of making sure that Johnny and I don't miss questions as well. So today we are very happy to be joined by Annette Clewett. And Annette is, I think your title is principal architect, Annette, but I'll let you introduce yourself. Yeah, I'm Annette Clewett. I'm in the same team as Andrew and Johnny. I'm in the hybrid platform business unit and recently have been doing a lot of work with disaster recovery solutions that work with OpenShift, as well as persistent data and how we make all that work together with Advanced Cluster Management. So my background with Red Hat has mostly been integrating storage capabilities or solutions into OpenShift for the last five years. Yeah, I know, Annette, you usually have some of the most popular Summit sessions around storage integration and all that other stuff with OpenShift, so I'm happy you're here. I'm excited to talk about today's topic. There are some really interesting and really cool tools, things that I have learned a lot about just from our few interactions as we've covered this topic, so it's something that's exciting to me. So, our hope nine, I see your first crazy question of the day: can you access user metrics data via the oc CLI? I don't think so, not directly. I think you could connect to one of the Prometheus pods and do a query that way, but I don't think there's an oc command that will extract that info. And Andrew has a question as well about what you should do if you're in a disconnected OpenShift cluster. I'm assuming he's referring to DR. We'll hold on to that one for just a moment, Andrew, and I'll also note that we have done a couple of streams on disconnected, so I'll be sure to note those in the blog post when it comes out. We'll answer it quickly though: essentially, connected or disconnected,
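To make the hosts' suggestion concrete: there is no dedicated `oc` subcommand for this, but one hedged way to query user-workload metrics from the command line is to go through the Thanos Querier route that the monitoring stack exposes. The namespace and PromQL below are placeholders:

```shell
# Sketch: query Prometheus/Thanos from the CLI using your login token.
# "my-app" is a placeholder namespace; adjust the PromQL as needed.
TOKEN=$(oc whoami -t)
HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  "https://${HOST}/api/v1/query" --data-urlencode 'query=up{namespace="my-app"}'
```

This requires a live cluster and a user with permission to query metrics, so treat it as a sketch rather than a drop-in script.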
The only real difference is whether or not you've, you know, mirrored the images that are needed for all the operators, and the custom resources are really no different than any other operator or capability. Yep. Yeah, so the last time that we did our disconnected deep dive we reviewed how to do that operator mirroring, including how to prune the registry so that it's just the specific operators you need. Yeah, I'll be sure to include those links in the blog post. This is the week in which we're getting through the blog post backlog, so be sure to keep an eye on cloud.openshift.com/blog for those. So last week's blog post will be posted this Friday, or the blog post associated with last week's stream, I should say. And Dean, for anybody who doesn't watch or hasn't seen Dean's blog, he did a phenomenal follow-up himself. He didn't get through all the stuff that he wanted to talk about, so he went and recorded another 50-minute video or something that talks about a whole bunch of stuff. In our blog post we link over to his blog post on his educate code UK site if you want to go ahead and go there. But yeah, he did just a phenomenal job. I know Dean's anxious to come back and talk with you all again and answer, you know, our hope nine, all those crazy questions. So let's quickly get through our top-of-mind topics for today so that we can focus on Annette and, you know, application disaster recovery. So the first thing that I want to talk about today is a couple of CVEs. Let me see if I can share my screen here. So we recently announced a couple of CVEs, CVE-2022-0185 and CVE-2021-4034. And I've already spent, I don't know, an hour and a half this morning digging into these, and in particular if you scroll down and look at these, you'll note that OpenShift is not listed in the affected platforms but RHEL 8 is. The same thing is true of this one over here. So we see RHEL 8 but we don't see either OpenShift or CoreOS.
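For reference, the prune-and-mirror flow mentioned above looks roughly like this. Registry hostnames are placeholders and the catalog index version will differ for your cluster, so treat it as a sketch:

```shell
# Sketch: prune a catalog index down to just the operators you need,
# then mirror the pruned index and its operator images into the
# disconnected registry.
opm index prune \
  -f registry.redhat.io/redhat/redhat-operator-index:v4.9 \
  -p advanced-cluster-management,odf-operator \
  -t mirror.example.com:5000/olm/pruned-index:v4.9

podman push mirror.example.com:5000/olm/pruned-index:v4.9

oc adm catalog mirror \
  mirror.example.com:5000/olm/pruned-index:v4.9 \
  mirror.example.com:5000
```

After mirroring, a CatalogSource pointing at the pruned index makes those operators installable from the disconnected cluster, exactly as the hosts describe.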
So I had a couple of conversations with engineering this morning and a couple of conversations with product management this morning about what all of this means, right? So OpenShift is affected, but to varying degrees based off of what's inside of here. So if we scroll up here, which one is this, this is 0185. We can see here: OpenShift Container Platform, where the default restricted SCC is used, the issue is not exploitable. So while CoreOS is RHEL 8 based, and therefore it would obviously have the same kernel, OpenShift is not affected. So effectively what I've been talking with product management about, and we've since reached out to the product security team, is, you know, can we still get OpenShift listed in this list of affected platforms, even though it may say "not affected", with additional details there. Because I think most folks don't necessarily read through all of these notes up here for every CVE. Rather, the first thing, at least certainly for me, the first thing I look at is whether my platform is in the affected list, and then if it is, then I go back and read all of the details. So just be aware that 0185 does not affect OpenShift, but it does affect RHEL. For the other one, 4034, technically OpenShift is affected, but to a lesser degree, effectively because, from what I understand, we don't encourage, or we don't, I should say, want folks to SSH or something like that into the nodes. So I won't say it's not affected, but it's just less affected; I need to get some more details on that. However, I did see this morning that the fix for this has been incorporated, or already built into, the CoreOS build, as of this morning or maybe late last night. So look for one of the future z-streams. I don't know if it will be next week's z-stream or if it'll be the week after that, but in one of the soon-to-be-released z-streams that should be fixed.
Johnny, we have a question in chat. I think we got it, it's more along the lines of the disconnected DR and, it sounds like, the upgrade process, so I think we have it handled. Got it. Yeah, the caveat if you've customized your SCCs, that's a really good point, our hope nine. A lot of folks are using customized SCCs, usually to loosen permissions. In particular, I know some folks do that with, well, somebody asked me last week, is it possible to have an NFS export that is used by external systems as well as by pods? And the answer to that is yes, you have to manually create the PV. But you also will probably have to modify your SCCs in order to allow the right UIDs both internally and externally, so that you don't end up with crazy permissions conflicts between them. So very good point, our hope nine, thank you for highlighting that. Let's see, the next one that we wanted to talk about here is MicroShift. So MicroShift, and if we look at the blog post here, January 19, so it was last Friday when they published this, and it made a bit of a splash. I think there was a Hacker News thread, I think there was something on Reddit as well that talked about it. So if you haven't heard of this or you weren't familiar with it, it's not an official thing yet, it comes out of our... what do they call themselves, Johnny? Oh, I don't know. If you'd have asked me directly I wouldn't have known, but it's one of the development teams. Yeah, it's like an advanced engineering team or something like that, they may even work out of the Office of the CTO. So effectively they're targeting, you know, how to deploy OpenShift onto things like, as you see here, a Raspberry Pi, which is something folks have been asking about for a long, long time. So this one was really cool, it was really exciting.
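Circling back to the NFS question for a moment: a minimal sketch of the manually created PV the hosts describe might look like the following. Server, path, and size are placeholders, and the SCC/UID alignment still has to be handled separately so in-cluster and external consumers agree on ownership:

```yaml
# Sketch: a manually created PV pointing at an existing NFS export
# that external systems also mount. Values are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain   # don't wipe a shared export on release
  nfs:
    server: nfs.example.com
    path: /exports/shared
```

A PVC in the application namespace can then bind to this PV; the permissions work (fsGroup, supplementalGroups, or a custom SCC) is where the conflicts the hosts warn about tend to show up.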
I haven't had a chance to try it yet. I only have one Raspberry Pi, and it's a Pi 3, not a 4, so I haven't had a chance to try it, but it's something that I'm looking forward to being able to try in the future. We'll make sure to post all of those links. I'm sorry, I haven't been paying attention and posting links. I'll post this MicroShift one first into Twitch here, and then I'll post our two CVEs just after that. I'm really excited about the MicroShift thing. You know, when you think about a small binary that you can install on an IoT device or a laptop or something like that, you know what I mean, not quite the big edge data center machine, something very tiny, and you can run, you know, an OpenShift API, it's pretty awesome. Yeah, and this blog post is really in depth, right? They walk through what MicroShift is, how it communicates, what it deploys, all of the different components that are associated with it, and kind of why it's different. And if you click on this link here, it'll take us over to the GitHub repo that has everything that's inside of MicroShift, including, you know, the documentation to deploy it and all that other stuff. So we get asked a lot about how to put OpenShift onto super low resource edge devices, and by super low resource I mean things that are like two cores and eight gigabytes of RAM, so, you know, maybe x86 instead of ARM, but in line with the new Pi 4s. So I know lots of people will be interested in that. Yeah. So let me catch up on chat here. Yeah, MicroShift doesn't support all API types, that is correct.
So it will be a subset of the total components of OpenShift. It won't be a full OpenShift deployment. I don't know precisely what's in there, but it may not have metrics, for example; metrics may be a day-two add-on depending on the amount of resources that you have available. So again, look through the docs, check out what's there, see how to deploy it and all the packages that are there. Just be aware that it almost certainly will be a subset of the OpenShift APIs. And one thing big picture, if you look at it long term, it can integrate with ACM, Advanced Cluster Management, so it will present itself as a managed cluster through ACM once we get down that path. So it's pretty cool. Oh yeah, that'll be an interesting one. So ACM, if zero touch provisioning and all that stuff gets integrated with something like MicroShift, that'll be really cool. I know, man, like just straight up Transformers, you know, it's gonna be awesome. Let's see, the next one: GitOps 1.4.0. You probably know more about this one than I do, Johnny. Yeah, so the GitOps operator updated to 1.4.0. Like you said, there are a lot of big changes coming out in that. The biggest ones are obviously the Helm upgrade, but also subscriptions are now a resource that will have health checks against them, so just something to be aware of. And then another piece of this is it looks like there's some RBAC stuff. I haven't had a chance to mess with it, but right now you have to kind of manually implement some RBAC so that you can have cluster-level management from some of your lower-level Argo CD deployments. And it looks like a lot of that stuff is built in now, so I'm going to mess around with it, talk to Christian and those guys, and get a better handle on how that's going to go down, and I'm excited about it.
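As a sketch of the manual RBAC the hosts mention: granting a namespace-scoped Argo CD instance's controller broader rights has typically meant a ClusterRoleBinding like the one below. The service account name matches the default openshift-gitops instance, but treat the exact names as assumptions, and in practice you would scope the role down rather than use cluster-admin:

```yaml
# Sketch: let the Argo CD application controller manage cluster-scoped
# resources. Names are illustrative; prefer a narrower ClusterRole.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitops-app-controller-cluster-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin          # placeholder; narrow this in real use
subjects:
  - kind: ServiceAccount
    name: openshift-gitops-argocd-application-controller
    namespace: openshift-gitops
```

The GitOps 1.4.0 changes the hosts describe are aimed at reducing exactly this kind of hand-written binding.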
I just checked my calendar, there is a GitOps Guide to the Galaxy tomorrow, so I suspect Christian will be talking about that during his live stream. Yeah, he's got that, and they're talking about HashiCorp Vault, so that's going to be a really good one. They're going to do a two-part series on that one, so that's going to be awesome. Oh nice, so talking about secure secrets with GitOps and all that. Yep. Very cool, I may have to tune in if I have the opportunity. Usually that Thursday afternoon, about when his show kicks off, is when I want to sit down and listen to our show in order to create the blog post and all of that, and I can't do both at the same time. I used to work with a guy who could sit and listen to a podcast and listen to a phone call at the same time, and my brain just doesn't work that way. I'm very singular. For y'all in the stream, you know, I can't even talk and type at the same time. Right, status of Tekton pipelines: should I use Tekton pipelines or an Argo CD workflow? I don't know the answer to that. I think they are probably slightly different use cases, but it is outside of my area of expertise. So Tekton pipelines would be a replacement for a Jenkins pipeline or something like that, a Jenkins workflow for building an application, whereas an Argo CD workflow, I think, is not necessarily used for builds, but again, I'm purely guessing on that, Johnny. Yeah, I couldn't say for sure, but I think you're right on track. So there is another stream that covers that. Which one is it? It is the... I'm sorry. No, go ahead. I was gonna say, Christian jumped in and he said use Tekton. Oh, there you go. Christian, I hope leg day is going well. If anybody doesn't know, usually Christian listens to our show while he's in the gym, and for whatever reason Wednesdays are leg day. Christian and I have been on the same team for a long time now. And last but not least, DevConf. So DevConf, sorry, I got you while you were drinking, Johnny.
So DevConf, there are two DevConfs, there's one in the US and there is one in Czechia. So the one happening right now, or is it right now or is it this weekend, is in Czechia. I know Christian has actually gone and presented at it, you know, back when we could travel. I know it is titled DevConf, implying developer conference, but there actually is quite a bit of material and quite a bit of things that will be interesting to us as administrators. So, I'm still sharing, so DevConf, if we go to the schedule, and I happen to already be at the page, the schedule is live. So you can see there's a ton of stuff here. I don't know if this is a paid-for or a free-access one, but look through the session catalog, there may be a lot of things that are interesting to you. I saw that there were several GitOps sessions in here that I was hoping to be able to check out. Yeah, it's another great conference. It's another one that I think is hosted by Red Hat as well, it's at our site over there. And now the name of the city completely escapes me. Brno. Thank you. Yep. Yeah. And speaking of my team, I've got two or three teammates actually presenting at DevConf tomorrow. So if you want to check it out, you know, some of the validated patterns and then some extensions on GitOps. So it's going to be pretty good. Yeah. Just looking here on my screen, I see Vadim. He always has great presentations. Vadim is on the engineering team. So if you are on the Kubernetes Slack for OpenShift, you'll see Vadim in there quite a bit. And Diane is our community manager and stuff like that. So there's quite a bit of Red Hat representation. There's also quite a bit outside of Red Hat as well. So yeah, give it a look if you're interested. And that's all we got, that's all of them.
So as we mentioned at the top of the show, as you've seen in our title and all the social media and stuff like that, what we want to talk about today is application-centric disaster recovery. And this is something that Annette has had to correct me on a number of times, because doing application disaster recovery is different than doing cluster disaster recovery, and it's a very important distinction as well. So, Annette, I'll hand it over to you, and I'm excited to hear and to learn about this. Okay, yeah, thank you, Andrew. So, first off, there was a question in the chat, and I'm having issues figuring out how to answer the chat, but it is: is Red Hat Advanced Cluster Management going to be discussed in terms of disaster recovery? And the answer is yes. So, I thought what I'd do first, just to start the discussion off, is go ahead and look at the different scenarios. Let's see here. Is it loading? Yeah, can people see now? Yep. Yeah, looks good. Okay, so in terms of just application high availability: if you look at installing, say, OpenShift in an environment that has three availability zones, this would be, you know, like AWS or GCP, that's there out of the box. In a deployment, as long as there are topology labels, OpenShift is going to spread the important resources that need to have quorum, meaning they need to survive an AZ outage. It's going to do that out of the box, right? So, this is something you get right away. Now, some of the deployments, say bare metal or say VMware, don't out of the box make use of topology labels, but in the future they will. So, where it says the third bullet there, ODF, we call that OpenShift Data Foundation, it used to be called OpenShift Container Storage.
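To illustrate how those topology labels get consumed, here is a hedged sketch of an application workload asking the scheduler to spread replicas across zones; names and image are placeholders, and OpenShift does the equivalent for its own quorum-critical components:

```yaml
# Sketch: spread three replicas across availability zones using the
# standard zone label that cloud providers set on nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                      # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels: {app: my-app}
  template:
    metadata:
      labels: {app: my-app}
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels: {app: my-app}
      containers:
        - name: app
          image: registry.example.com/my-app:latest  # placeholder image
```

On bare metal or VMware, the same mechanism works once nodes carry the zone labels, which is the gap the speaker says is being closed.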
So, in terms of deploying the storage, OpenShift Data Foundation also makes use of the topology labels and lays itself out across the AZs so that you have the ability to protect against an AZ failure. So, you know, having high availability is critical, right, because you want to be able to survive a site outage. But this is a single OpenShift, right, a single cluster. If we look at wanting multiple clusters, this is where we need to have some amount of replication, and not just at the storage level; we also need replication, or the ability to reinstall the application, on the alternate cluster. So it's about the Kubernetes resources and it's also about the persistent data. In the case of the solution with Red Hat and using OpenShift, we're going to do that orchestration with Advanced Cluster Management. So, that's the solution that is actually available as of today; all the components were released late last year, but available this year. So all the components are available. And it's not totally clear from this diagram, but I'm going to do a demo so you get the idea: on the alternate cluster, you are not actually running the resources, meaning you're not consuming CPU and memory, until you actually need to use the application on the alternate cluster. The other thing that may not be clear here is that the arrow on the bottom can go either way. So, you could have some applications that are protected on the second cluster, and then other applications that are protected from the second cluster onto the first cluster. Can I recap, just to make sure I'm understanding correctly? So we have two distinct OpenShift clusters, and each one has its own ODF deployment inside of it. So it's not a spanned ODF deployment or anything like that. No, no.
And then we have ACM, Advanced Cluster Management, managing each one of those clusters. Correct. Yeah. And for the applications, and I have a feeling you'll probably cover this, so feel free to tell me to wait: I'm trying to figure out how the applications are deployed and controlled. Is it through ACM? Yeah, and this diagram lacks a little bit of that, but the applications, in order for this solution to work, have to be deployed via ACM. There are various ways you can deploy applications, you can use GitOps, you know, whatever ways ACM will allow you to deploy applications, but it has to be controlled via ACM. Okay, that makes sense to me. So, okay, good. Essentially, what we'll be doing is we deploy an application to, say, cluster one using ACM, and then we use the tools via ACM to set up replication of both the configuration as well as the data from cluster one to cluster two. Yeah, 100%. And the new bit about this is we have sort of the glue, which is an upstream project called Ramen, which I'll discuss more. The upstream Ramen DR project has supplied the new operators and the new custom resources that ACM uses to do that orchestration. So it's pretty slick, and again, I'll get to a demo pretty quick so we can see it in action. And real quick, Annette, I'm sorry: is the management cluster your hub cluster in this context? This is for our viewers. Okay, thank you. I mean, so currently, this is a three-cluster solution. Within a short time, we'll be reducing that to two clusters. It's also really important that your hub cluster function is able to be restored if for some reason it goes away. So you can either look at doing some kind of backup of your hub cluster, or have some way of making sure that the function of the hub cluster is available. Right.
So it's possible it could even live in one of the other clusters, but right now this is a three-cluster solution, of which, if you lost your hub cluster, you become sort of headless. Yeah, and just to poke at that a little bit more: literally this morning we got asked, can I deploy ACM to single-node OpenShift? And it's one of those things, technically yes, but are you sure you want to do that? It becomes a single point of failure, and even, you know, with single-node OpenShift, doing an update means that everything goes down. So, at minimum you probably want a compact cluster, three nodes, and just be cognizant of the limitations there and how that might affect availability. Yeah, I totally agree. I think for POCs and things like that, you know, proofs of concept where you're wanting to reduce your footprint, it could certainly be a good solution. As I first showed on the first slide, you always have to think about availability at every level, right? So, just not wanting to confuse it too much: this is clearly an asynchronous solution from the point of view of persistent data. What that means is you're going to use the concept of snapshots, on an interval, so say every five minutes you're going to snap the delta change of each persistent volume from one cluster to the alternate cluster, so that you have sort of a warm standby. And because of that, if you had to use the alternate cluster's persistent data, you are going to be missing whatever is not in the snapshot. So if your snapshot interval is five minutes, you could conceivably be losing up to five minutes of data, because it never got to the other side. And a quick clarification, I've seen a couple of folks ask: is this only active-passive? Are there any active-active capabilities here?
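That snapshot interval is expressed declaratively in this solution. As a sketch, a Ramen DRPolicy with a five-minute scheduling interval might look like the following; field names vary across Ramen and ODF versions, so treat the exact schema as illustrative:

```yaml
# Sketch: a DR policy pairing the two managed clusters with a 5m
# replication interval, which bounds the worst-case data loss (RPO).
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPolicy
metadata:
  name: odr-policy-5m                      # placeholder name
spec:
  drClusters: ["cluster1", "cluster2"]     # the two ACM-managed clusters
  schedulingInterval: 5m                   # snapshot/replication cadence
```

With a 5m interval, up to five minutes of writes can be lost on failover, exactly as described above; shortening the interval tightens RPO at the cost of more replication traffic.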
Well, the way I think of this solution: if you think of it as each cluster being able to protect an application on the alternate cluster, it is sort of active-active from the point of view that it works in both directions. But if you're asking, can a global traffic manager or a geo load balancer send connections to both applications at the same time? No. Because the way this works, and again, I'll show you in the demo, the storage is only promoted on one side, one cluster, at a time. And because the storage is only promoted and available on one cluster at a time, I can't have the storage being replicated and available at the same time on both clusters on a per-application basis. Yeah. And Jared asks, can you do this without ACM? Technically, yes. The asynchronous replication method currently uses something called Ceph mirroring. So via Ceph commands, you can set this all up yourself, and schedule and do all of the orchestration; you would have to reinstall the Kubernetes resources yourself, and you'd have to make sure you don't have any security context problems. So yes, it is possible, but it's not currently the direction that Red Hat and OpenShift, via ACM and using our storage, OpenShift Data Foundation, are going. Currently, the solution is totally dependent on ACM. Yeah, so for the second part of your question there, Jared, is it more complex without ACM? That's a definite yes. Yeah, you're basically down in the weeds of Ceph commands on a per-volume basis, so you're having to do all of the orchestration yourself. So, coming in the future, and I just want to put this out here, it's not available yet, we're probably looking at this being available in about four or five months. One of the, I guess, issues with the current solution is the asynchronous mirroring of the data.
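For a flavor of what "down in the weeds of Ceph commands" means, the raw per-pool and per-image RBD mirroring path looks roughly like this; pool and image names are placeholders, and this is only the storage half, with all Kubernetes resource orchestration still on you:

```shell
# Sketch: snapshot-based RBD mirroring by hand (run against a Ceph cluster
# with a peer already bootstrapped; "replicapool"/"my-pvc-image" are placeholders).
rbd mirror pool enable replicapool image                   # image-mode mirroring on the pool
rbd mirror image enable replicapool/my-pvc-image snapshot  # per-image, snapshot-based
rbd mirror image snapshot replicapool/my-pvc-image         # trigger a mirror snapshot
rbd mirror pool status replicapool                         # check replication health
```

Multiply this by every PVC-backed image, plus scheduling, promotion, and demotion on failover, and the value of letting ACM and the DR operators orchestrate it becomes clear.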
So this solution, which will be called Metro DR multicluster for Red Hat, will use ACM in a very similar way. Everything about installing the applications and the automated failover stays the same. But the difference is this will use a stretched storage cluster. It'll be from OpenShift Data Foundation, and if you're familiar, it'll be a Ceph cluster. And it'll definitely have the ability to do synchronous mirroring, so your RPO is going to be zero, essentially, from a persistent data point of view. But it does require that the data components of the storage are latency restricted. So, you know, maybe a couple hundred miles apart for the locations, or less. And you also have to have a witness location, or consensus location, which we call the arbiter node, at a third location. The third location is not as latency sensitive, but you do have to have a third location. That's required because, just like etcd for OpenShift, the storage requires quorum. So even though you have two OpenShift clusters, and they're going to be fine once you fail over, there's no quorum issue with etcd, there would be a quorum issue for the storage. We're not going to go over this anymore today, but I just wanted to note that we are working on a synchronous solution as well. Okay. And just to rewind a bit, somebody asked, let me see if I can find the question... somebody asked if the whole state of the cluster is in sync between the two, you know, cluster one and cluster two. And the answer to that is no, because they're distinct clusters; they're not sharing an etcd or anything like that. Correct. Yeah. And the same is true with the underlying ODF, or the Ceph underlying ODF in this instance, right? It's two distinct ODF deployments. No, no, this... okay, so for the prior slide, 100%, yes. For this slide, no.
This is an external, what we call ODF cluster, AKA Ceph cluster, and it is stretched. That's why it has latency restrictions, and it's also why it needs that third site to be the witness for the quorum. Got it. Sorry, I was paying attention to chat and missed when you switched over to this. Yeah, yeah. Thank you. Like I said, this is three to four months down the road. But, you know, for people who absolutely need a synchronous solution, which may be true for some of you, this is coming, and it uses a lot of the same custom resources and operators in terms of the integration and automation via ACM; it just has a difference in the storage plane. Got it. And while we're on ODF for just a moment, Bonkstock asks if the Ceph cluster is running on top of OpenShift, and it is. So if you're using ODF, it is Ceph managed via Rook and... Yeah, it's a little bit different here. In terms of the prior solution, yes, that would be what we call ODF internal: the storage runs as pods, those pods are hosted on OpenShift, and all of the alerting and managing of that storage cluster is done via OpenShift. This solution, though, is hooked into OCP, so your storage classes for creating volumes, that being file, block, or object buckets, are all available to each one of those OpenShift clusters. There's some alerting about the health of the cluster, but you're not getting as much alerting because it's an external cluster. So to some extent you'd still be managing the native Ceph cluster outside of OpenShift. Like I said, there's some alerting, and Rook, definitely still the Rook-Ceph upstream project, is still doing a lot of the orchestration, but at the level of just storage classes and hooking into the external cluster. So it is an external Ceph cluster, versus the prior solution, which is totally managed and deployed and monitored by OpenShift.
So with the external Ceph storage, when you have that third site, not necessarily the arbiter site, but when you have these sites, do you have to have an increased amount of storage for the external cluster? Or can you just kind of go same-for-same and it'll balance out as it's failing over? I don't know if I'm asking that the right way. If you're trying to fail over, do you have to have site A's capacity as well, just kind of sitting there cold? Are you talking about the storage plane? Yeah, the storage plane. I'm sorry. On the storage plane, the way this solution works is that it requires four replicas of anything you do. So every time you do a write, it's going to be replicated four times, right? Okay. So what we do is we have what we call a min_size of two: any volume can survive and be used for reading and writing as long as there are two replicas available. So if you think of having to switch from cluster one to cluster two, and let's say the two replicas in the same location as cluster one are lost, you still have two replicas available on the other site. And because the arbiter node is not at either of those sites, the arbiter node can reach consensus. So all of the volumes that you would switch over and use on cluster two, once they became available, would be two-replica now, they were four-replica before, but they would all be able to be used for reading and writing. Gotcha. Thank you. Any more questions before we go to the demo? I have lots, but I want to see the demo. Okay. All right. Yes. Yeah, I may have confused things, I don't mean to do that. No, no, you're fine. And, you know, for anybody in the audience, please don't hesitate to send in questions. I see your question there, IDDQD. All right. Sorry. No, no, go ahead. Yeah, I just wanted to ask Steph if she could queue up the video.
And there's no audio on the video, so I'm going to talk, which means I may lose my place here, but I'm going to try to talk over it. I also have the same video recorded on YouTube with my audio. So if you can go ahead and start it, Stephanie... maybe it's already started. There we go. Okay. So we have two managed clusters. The application is currently deployed on the primary cluster, and underneath we have OpenShift Data Foundation on OCP. We also have a third cluster running Advanced Cluster Management. You notice there's OpenShift DR; that means OpenShift Disaster Recovery, and we'll get into that. So the process that happens here is that the administrator is going to initiate the failover on the hub cluster. After that, the metadata for the alternate cluster PVs will be put into the secondary cluster. Then, step three, we're going to go ahead and demote the storage, which will take the application down. Then ACM is going to redeploy the app on the secondary cluster. The I/Os will be redirected via some kind of geo load balancing. And then we delete the app on the primary cluster so it's no longer running. So in ACM, I've already imported two clusters, and I'm going to rename them to cluster one and cluster two. They're Amazon clusters; one is in US East 2 and one is in US West 2, so approximately 50 to 60 milliseconds apart. And what's required is that you connect these clusters. So I'm going to connect them via an add-on with ACM that works very well now, called Submariner. You can actually deploy it from ACM using a cluster set. And we can see that the connection is healthy. So this is connecting the private cluster and service networks of the two OpenShift clusters, and this is required for doing the replication. So now let's look at the different clusters and what operators they have. I talked about the operators, and if you're disconnected, you just need to be able to install them.
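Editor's note: the failover flow Annette narrates from the flowchart can be sketched as an ordered sequence. This is an illustration only; the step descriptions come from the stream, but the function is hypothetical and not the operator's actual API.

```python
FAILOVER_STEPS = [
    "administrator initiates the failover on the hub cluster",
    "PV metadata for the protected volumes is placed on the secondary cluster",
    "storage on the primary is demoted (the application goes down)",
    "ACM redeploys the application on the secondary cluster",
    "I/O is redirected via geo load balancing",
    "the application is deleted from the primary cluster",
]


def run_failover(steps=FAILOVER_STEPS) -> None:
    # In the real solution the OpenShift DR hub operator drives these
    # steps; here we just enumerate them in order for reference.
    for number, step in enumerate(steps, start=1):
        print(f"step {number}: {step}")


run_failover()
```

Keeping the steps in this order matters: storage must be demoted before ACM redeploys, otherwise both sites could write to the same volumes.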
So we have the OpenShift Data Foundation, DR cluster, and Submariner operators. On the second cluster, we have the exact same thing: the DR cluster operator, which comes with ODF, and Submariner. If you're looking to purchase or use this capability, it does require OpenShift Platform Plus and the ODF Advanced SKU. On cluster three, the hub cluster, go to all projects to look at our operators. This is where I put Advanced Cluster Management. We also have a new operator with ODF for multicluster orchestration; this is what sets up the mirroring between the storage clusters. And then, really important for doing the orchestration, we have the OpenShift DR hub operator, and we're going to look at the custom resources, or APIs, that it brings. In terms of how we're going to test this, or do the failover, I have installed a simple application using GitOps called RBD loop. Really simple application: all it does is write a 4KB file every second to the storage. That allows me to be able to measure any outage or anything. If I look at the topology in ACM, it's currently running on cluster one, and it has one pod and one persistent volume that it's writing to. If I go to the terminal, the top is cluster one and the bottom is cluster two. On the bottom we can see that if I look at the same namespace, there is nothing there, meaning none of the resources are running yet on cluster two; they're all running on cluster one right now. So the next thing I want to show you is how we're going to fail over. To fail over, we're going to go to our hub cluster and into the OpenShift DR hub operator. Here we're going to use a really nifty custom resource, which is namespace scoped, so this is on a per-application basis. It's called DRPlacementControl. This is a brand new custom resource from the upstream Ramen DR project, and it allows us to set the action that we want to take for this application.
And before I do that, just let me show you that the RBD loop app is tied into a Grafana dashboard. It's able to show you which cluster is currently active and it shows you the I/Os, and we'll see some more information on the dashboard after we fail over. So going back to the failover, which we'll get to in a minute... I don't know what I was saying on the original video, but it takes a while. Okay, so we're back here. What we're going to do is go ahead and change the action to failover. In the CR, in the YAML, it tells us that it's going to fail over to cluster two. That is something you set in the custom resource: what is my failover cluster. As soon as I save the changes to that YAML, the actions I showed in the flowchart are going to start. And we can watch it in a couple of different ways. If we go back to the terminal, we can see that if we now look for the resources on cluster two, they're already there. And if we go back to our dashboard, we can see it just switched over; the green is under the failover cluster. So let me explain the 39 seconds. The 39 seconds shown on this Grafana dashboard is the amount of data loss, or the duration of the data loss, for this application. Again, it's writing 4KB every second. What happened was that the last snapshot, on a five-minute interval (and again, that's settable), was 39 seconds before the failover. So I didn't have that data on the alternate cluster, and therefore it's lost. I basically brought the application up on the snapshot that was last replicated. If I now go back to ACM, I can see that my application is now on cluster two. So after this, what we want to do is try to fail back. To fail back is going to be as easy as failing over; it's just going to be a different action in the DRPlacementControl custom resource. So what I'm going to do now is just change that action to relocate. And down in the YAML, it doesn't quite show here.
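Editor's note: the failover is just an edit to the DRPlacementControl custom resource. Here is a rough sketch of that YAML, built as a Python dict. The field names (`action`, `failoverCluster`, `preferredCluster`) follow the upstream Ramen project's API as best the editor knows it; the metadata name is hypothetical, and the exact schema may differ between versions, so treat this as illustrative rather than a copy-paste manifest.

```python
# Sketch of a DRPlacementControl resource, per-application and
# namespace scoped, as described in the stream.
drpc = {
    "apiVersion": "ramendr.openshift.io/v1alpha1",
    "kind": "DRPlacementControl",
    "metadata": {"name": "rbd-loop-drpc", "namespace": "rbd-loop"},
    "spec": {
        "preferredCluster": "cluster1",   # where the app normally runs
        "failoverCluster": "cluster2",    # where it goes on failover
    },
}


def set_dr_action(resource: dict, action: str) -> dict:
    """Setting spec.action is what triggers the orchestration:
    'Failover' for an outage, 'Relocate' for a planned fail-back."""
    resource["spec"]["action"] = action
    return resource


set_dr_action(drpc, "Failover")
```

Saving the equivalent YAML change on the hub is the only manual step; the hub operator then runs the entire flowchart.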
It tells you what the preferred cluster for relocate is. In this case, the preferred cluster to go back to will be cluster one. Once I save that, we can go back to our terminal. We can see on the bottom that it's terminating the resources. And if I do a watch back on cluster one, I can see that right now it's not quite recreated yet; my dashboard doesn't show that it switched over yet. And just while this is happening: in terms of relocate, it actually does a sync on the volume, so you don't actually lose any data on a relocate. As we see the resources come in, we're basically bringing up the application again on cluster one. And as soon as it's running, we go back to our dashboard and see it switch over. The 40 seconds in this case doesn't represent data loss, because we synced the volume right beforehand; it represents application outage, or RTO. So just wrapping up here, we did all the actions in the flowchart, and I think we're done, Stephanie. I'm always impressed by that, because it's entirely a Kubernetes-centric thing, right? You're integrating with and operating against Kubernetes APIs to do all of that. We've got a few questions here that came up while you were narrating the video. So Sachin asked a couple of good questions. First, and this was from a while ago: is this dependent on a particular version of ACM? Like, is it ACM 2.4 or later? Yeah, let me just mention all the versions. So this capability is available today; no smoke and mirrors, not showing you something that will get to you sometime down the road. The versions would be ODF 4.9, which is OpenShift Data Foundation, OpenShift 4.9, and ACM 2.4. Okay. And then for the Ramen DR operator, the disaster recovery operator: is that deployed to the ACM hub cluster, or is it deployed to the source and destination clusters? It's deployed in all three. If you go back... and we'll definitely get the link to the video. I can put that in chat and someone can post it.
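Editor's note: the distinction drawn above between failover and relocate maps directly onto RPO and RTO. A tiny arithmetic sketch (editor's illustration, using the numbers from the demo):

```python
def failover_data_loss(failover_t: float, last_snapshot_t: float) -> float:
    """Unplanned failover: writes since the last replicated snapshot
    are lost (the observed RPO). In the demo this was 39 seconds,
    bounded above by the 300-second snapshot interval."""
    return failover_t - last_snapshot_t


def relocate_data_loss() -> float:
    """Planned relocate: the volume is synced before switching, so no
    data is lost. The ~40 seconds measured there is application
    downtime (RTO), not data loss."""
    return 0.0


print(failover_data_loss(1000.0, 961.0))  # 39.0 seconds of loss
print(relocate_data_loss())               # 0.0 seconds of loss
```

At 4KB written per second, the demo's 39-second RPO corresponds to roughly 156KB of lost application data, which is the gap the Grafana dashboard is visualizing.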
It's deployed in all three. So you have the OpenShift DR cluster operator and the OpenShift DR hub operator; you deploy them in all three. I need to start calling it by the correct name instead of the upstream project name. Yeah, that's fine. I mean, the good part is that it is an active project. And I know the contributors and maintainers are briefing the CNCF data protection working group. There's a lot of interest; other companies might change out the replication method but use the orchestration. So, somebody asked earlier, and that kind of reminded me; I answered in chat, but we might want to bring it up. So OADP, which is the OpenShift API for Data Protection: does that fit in here at all? Well, someone asked earlier and I said it's possible. If you were not using ACM to deploy your application and do the orchestration, you could use OADP to basically take a backup of your Kubernetes resources and then restore those Kubernetes resources to the same cluster or an alternate cluster. And it would be an alternate cluster in this case, because you're doing replication to a different cluster. And, you know, you would have to do all the demoting and promoting of the storage and enabling of the mirroring and all that via Ceph commands, but you could do it. There are some new custom resources for volume replication, so actually you might be able to move some of that up to doing it via Kubernetes or OpenShift. But again, even if you can do it with a custom resource, it's going to be per volume and it's going to be all manual. Okay. And just in case anybody doesn't know OADP: so, my understanding, and Annette, I'll rely on you to make sure that I'm right here, is that OADP is a downstream of Velero, with the goal of providing an OpenShift API to collect all of the Kubernetes objects that make up an application. Well, yeah.
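Editor's note: for context on the OADP/Velero path being discussed, a Velero Backup resource for an application namespace looks roughly like the dict sketch below. The `velero.io/v1` kind and the `includedNamespaces`/`snapshotVolumes` fields come from Velero's public API; the names and the `openshift-adp` namespace are the editor's assumptions about a typical OADP install, so verify against your version before relying on it.

```python
# Sketch of a Velero Backup resource as OADP would create it.
backup = {
    "apiVersion": "velero.io/v1",
    "kind": "Backup",
    "metadata": {"name": "rbd-loop-backup", "namespace": "openshift-adp"},
    "spec": {
        # Capture the Kubernetes objects in the application namespace...
        "includedNamespaces": ["rbd-loop"],
        # ...and have the CSI plugin snapshot the PVCs at the same time,
        # using the VolumeSnapshotClasses that ODF provides out of the box.
        "snapshotVolumes": True,
    },
}
```

This is the "possible but manual" path Annette describes: OADP handles the Kubernetes objects and snapshots, while the storage-side demote/promote and mirroring would still be driven by hand with Ceph commands.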
And it's partner agnostic, and there is no, and I think you introduced me to this term, there is no data mover associated with it. Well, there will be. Yeah, so a couple of things on OADP. It exposes the Velero APIs, that's totally right, for backup and restore and, I think, scheduling. One of the things it does have that really ties into ODF and our implementation of the CSI features is the Velero CSI plugin. Out of the box when you install ODF, you get volume snapshot classes for both file and block. Those volume snapshot classes can then interact with the Velero CSI plugin. So not only are you backing up the Kubernetes resources, you're also able to create snapshots at exactly the same time, so that you have the persistent data too, because the snapshots live in the storage cluster. And then you can use those snapshots to restore; basically, restore back to a PVC to be able to link up with the Kubernetes resources and recreate the application. Okay, so that's interesting. Johnny, I think you and I have already talked about having a show dedicated to OADP as well, so that'll be a super interesting one. Yeah, and it's a good time; I'm not sure exactly when, but very soon it will go from being a community operator to a Red Hat supported operator in OperatorHub. Very cool. So I see we have one question, and apologies if I butcher your name, my children know I'm terrible with names, Alini: in case of an actual disaster on cluster one, is it possible to trigger the failover from ACM afterwards? Absolutely, yeah. That's why currently the ACM hub is a different cluster at a different location. In fact, all the triggering of the failover would be from that third location.
In the future, you know, the ACM team is looking at ways of backing up that hub, being able to restore that hub, having HA on the hub; there are all kinds of solutions, because you can see how important the hub and the ability to trigger are. So if one site is totally lights out, you can still trigger the failover actions and get the alternate site running with the applications. Now you've just got to have DR for your ACM, and then you've got to have DR for that... Turtles all the way down. Yeah, exactly. Well, the ACM team is hard at work on that, and I think we'll see a viable solution in the next version of ACM. Yeah, and someone actually just asked exactly that: what happens if the ACM location fails? Yeah, well, we're all thinking that; we've all been taught about single points of failure, for sure. So we've only got a couple of minutes left. I think I told you all, well, you, the folks that I'm here on the stream with, that I do have a hard stop today. I'm going into the tower for the first time in two and a half years; I'm going into Red Hat Tower. So I do, unfortunately, have a relatively hard stop. Any questions that you all have, please go ahead and send them in now, and we'll take the last few minutes to address those. And while you're typing those up, I do want to highlight to everyone that next week Johnny will be the center, the star, of the show, and we'll be talking about validated patterns. Eventually I'll stop calling it platforms, Johnny, I swear. That's okay. Yeah, validated patterns. The week after that, which is February 9, we will be talking about, what is on our schedule? Oh, we'll be talking about the Performance Addon Operator. And then the week after that, February 16, our stream will not be on that day. The reason it won't be on is because that's when the What's New in OpenShift 4.10 live stream will be happening. So the next three weeks are really busy, really interesting topics.
If you aren't already subscribed on whatever platform you're watching on, I would definitely encourage you to do that. You can also go to red.ht slash live stream; that will take you to the live streaming landing page, and it has the calendar with the big live streaming schedule on it. It is a Google calendar, so if you click in the bottom right-hand corner there's a thing that says add this calendar. You can add that to your calendar and then get reminders for all of those as well. So thank you, Stephanie, for posting the link in there. And Andrew, I just want to acknowledge that someone had requested that we do a security topic, going over, you know, how to secure clusters, covering FIPS, encrypting comms, secrets, etc. And yeah, we can definitely have one of those coming up. You're right, each one of those is its own rabbit hole, and it can spin off into its own show, it can just get insane, but yeah, we'll do something. Yeah, it's actually probably a pretty good time to do that, because there was just some shuffling of team organization on the product management side, so now we do have a team that is dedicated full time to nothing but security. We can get them on here and have a good conversation around all of that, the compliance operator and all that stuff. Andrew, which tower? Oh, yeah, Red Hat Tower here in downtown Raleigh. Very good session, very good. Yeah. So I was telling Annette before we started that I probably spent an hour digging around for my badge last night, because I just didn't know where it was. I dug through all of my backpacks, my wife makes fun of me because I have too many backpacks, and did finally find it, but... All right, well, I don't see any other questions that came in. Thank you, Johnny, for highlighting the topic request.
If there are any other things, if anything comes to mind for anybody who's watching this after it's live, please don't hesitate to reach out. You can reach me at andrew.sullivan at redhat.com, or on Twitter at practicalAndrew, that's all one word. If you've seen me chatting in Twitch, that's also my Twitter username. Also through Johnny: that's Jonny, J-O-N-N-Y, at Red Hat, no H, and jrocktx1, right? Yep, right. Yeah, thank you, Stephanie, for throwing up our information on the screen there. By all means, please reach out to us at any point in time; we're happy to field those questions. We'll include Annette in those as well; I won't ask her to publish her information publicly. But yeah, by all means, please don't hesitate to reach out to us and we will address those questions. When is the OADP stream? Tiger, Tiger, I feel like you have a vested interest there for some reason, I don't know why. So I don't think we've scheduled it yet. We were waiting for OADP to go GA, which I haven't checked on, so I don't know precisely when that will be, but I do have it penciled into our schedule for, I think, March; but it is penciled in, and that means it's likely to change. Yeah, and someone noted that FIPS has to be implemented at install; yeah, unfortunately it does. And IDDQD asks, should I find it on YouTube? I will rewatch. So yes, you can find it on the Red Hat YouTube channel, the OpenShift YouTube channel, or, confusingly, the Red Hat OpenShift Twitch channel. One of those will have the stream. Do be aware that if you happen to prefer Twitch, Twitch does retire those after, I think, 60 days. We also publish a blog post with each one of these, so if you keep an eye on cloud.redhat.com/blog, there will be a blog post with a link to the stream embedded inside of it, as well as links to specific timestamps for questions and topics and stuff like that. Oh, Stephanie tells me it's 30 days.
All right, and in case anybody doesn't know, Tiger was an intern with our team. Two years ago? Tiger, three years ago? It was before everything that happened out here in the world. And he has since joined Red Hat, so congratulations, by the way. Nice. Nice, congrats, welcome. Yeah, so all right, well, thank you very much, everybody. I hope you have a great rest of your week, and stay safe out there and all of that. Johnny, anything from you? No, just have a great week and we'll see you next week. All right, and Annette, I will give you the last word. Yeah, so I'll say my email: aclewett, that's C-L-E-W-E-T-T, at redhat.com, just in case. Thank you. Thank you.