Good morning, good evening, good afternoon, wherever you're hailing from. Welcome to another episode of Red Hat Advanced Cluster Management Presents. I'm Chris Short, executive producer of OpenShift TV. I'm joined by not the entire RHACM team, but close. Scott Barron likes to bring the heat on these calls, so hold on tight. Scott, I'll let you take it away with all the intros and everything.

Grab your hats, Chris. You know we like to bring the thunder, bring the pain. For those of you in the Northeast under six feet of snow, we're going to try to heat things up today with RHACM and some thrilling lifecycle scenarios around clusters, plus Submariner and disaster recovery. We've really packed in a dream team. You keep setting the bar higher and higher and your content keeps getting better and better, so we're just trying to keep up, and I just keep having to find brighter, smarter, and more talented people to bring in here. So I'm going to move around. Randy, go ahead and jump in and give yourself a quick intro.

Sure, Scott, thanks. My name is Randy Bruno Peaver Jay, and I'm a software engineer on the cluster lifecycle squad here at Red Hat. Cluster lifecycle is one of a few squads working on Advanced Cluster Management, and today I'll be introducing Red Hat Advanced Cluster Management.

Nice. So Randy's going to help set the table in just a minute. Let's introduce Ryan, Ryan Cook. Ryan Cook, with the Office of the CTO. Today I'm actually going to be showing off a new application we're working on called Scribe. What Scribe is going to allow you to do is move data between clusters, do disaster recovery, and do fan-outs to multiple clusters.

Nice. And I've got Rafa here. Go ahead. Yeah, I'm an OpenShift architect in Red Hat Consulting, and today I'm going to show you a disaster recovery scenario with CockroachDB, Submariner, and RHACM. If you've ever read an OpenShift blog that deals with disaster recovery, I'm certain Rafa wrote it, or we basically told someone to write it. A huge brain trust of information there.

We have another individual who hopefully is going to join after he literally puts out a fire, and that's Michael Elder, our senior distinguished engineer on RHACM. He's been here since the start of RHACM, back in its IBM days as MCM, so he's been in this space for a couple of years. Hopefully it's all good on his end, and the fire alarms we were hearing in the pre-call were... yeah, I'm hoping they're false. Yeah, false alarm.

So with that, let's turn the mic over to Randy, who's going to set the table on cluster lifecycle, make sure people understand what we're doing in that space, and bring them back up to speed on RHACM building up clusters. Go ahead.

I've got just that — just sharing the screen. Beautiful, awesome. Got a couple of slides. Don't go to sleep, folks, we've got demos queued up. Very cool.

So as I said before, I'll be talking a little bit about Red Hat Advanced Cluster Management from the perspective of a squad member from cluster lifecycle, and this will involve version 2.2 of our release, which is the most recent, coming out at the beginning of March. He's getting ahead of himself.
We still have about a month till GA. Very good, a little ahead of myself, but let's get it going.

So Advanced Cluster Management is just one part of the larger OpenShift Container Platform ecosystem, but by building on top of and leveraging the OpenShift Container Platform, ACM is able to bridge the gap between high-level container control via orchestration and the next logical step, which is infrastructure-level control: cluster scaling, workload and application management, policy-based governance, and monitoring, all from a single console. That's it, we're the top of the cake. We're the purple icing on top. Yeah, exactly.

So essentially ACM leverages one cluster as a focal point for control, and we call this the hub cluster. That's where ACM is installed, and through a process called importing we then put other clusters under the purview of the hub; we call these our managed clusters. So with Advanced Cluster Management we can import and manage Kubernetes clusters from major cloud providers like IBM, Amazon, Microsoft, and Google. We're also capable of provisioning and installing those clusters from the context of that hub cluster control plane, so from a single place, and you can also provision and create VMware clusters from that control plane — sorry, bare-metal assets or VMware resources, essentially on-premises resources, also from your hub cluster.

And from your hub cluster you can view your managed cluster health, you can deploy applications, you can enforce policy, and you can also target your clusters for upgrade, deletion, or deprovisioning. So a lot is going on from this control plane, and we wanted to give our developers a sensible ability to dive deep, but also a high level of control from the console so they don't have to do too much legwork.

So I'm going to run through the environment right now, our ACM provisioning use case. From OCP, the OpenShift Container Platform, you can access an installed instance of ACM right from the application menu, and for this case we're going to jump to cluster lifecycle. Today we're going to be provisioning a cluster and then upgrading it. Right now I have two clusters imported: my local cluster, which is basically the hub, an abstraction of the hub managing itself, and one that I provisioned earlier. Right now I'm going to create another one.

So you've already got two, right? On AWS, your local and the imported one, and now you run through create. You're quickly getting into the scope where you have multiple clusters that might need to talk to each other, right? They might need some multi-cluster networking. Okay, cool. So show us what's happening here.

Absolutely. So you can see here, this is our creation wizard. Like I mentioned earlier, we wanted to abstract a lot of this process down so that it's very easy to just grab, go, and create a cluster, but we also want to give our developers the ability to access the YAML and further customize how they want the installation to go. So right now I'm going to provision another AWS cluster. I have my provider connection, which is just my credentials for AWS, already set up, and if I wanted to I could go ahead and jump into my install-config YAML and make further edits on that YAML directly, or the cluster YAML. I don't need to, so I'm just going to go ahead and create.

And look at that smooth interface, the PatternFly rewrite that we got here in 2.2. You've got provider icons, all the PatternFly bells and whistles. The console elements just look awesome.
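For anyone following along at home, the install-config YAML Randy mentions is the same one the OpenShift installer uses. A minimal sketch for an AWS cluster, with placeholder values for the name, domain, and region (the provider connection supplies the pull secret and SSH key):

```yaml
# Minimal install-config.yaml sketch for an AWS cluster (values are placeholders)
apiVersion: v1
baseDomain: example.com          # DNS domain the cluster will live under
metadata:
  name: demo-cluster             # cluster name shown in the ACM console
platform:
  aws:
    region: us-east-1            # target AWS region
controlPlane:
  name: master
  replicas: 3
compute:
  - name: worker
    replicas: 3
networking:
  networkType: OpenShiftSDN
  clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
  serviceNetwork:
    - 172.30.0.0/16
pullSecret: ""                   # filled in from the provider connection
sshKey: ""                       # public SSH key for node access
```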
Nice work there, Randy. And that's a good point. This release we spent a lot of time, most of our time, realigning the UI so that we match the PatternFly styling that was already happening in OCP, and create — not synergy, but a connected styling through the products. Yeah, a consistent experience. Right, exactly.

So this is going to take some time to provision. Luckily I already have this nice little demo cluster sitting around, and we can quite easily see when our cluster has an upgrade available. You can jump directly to the OCP console for the cluster, but you can also upgrade from your RHACM console by clicking this upgrade button and selecting your next version. Once I do this, it sends a signal to the cluster itself and then the table will update. This actually takes a moment, but once it gets status back from the cluster that the upgrade is happening, this status will change to upgrade in progress. You'll see it here as well, and that takes just a few seconds.

Nice, so it's all integrated, a tight, finished look and feel. Create, upgrade, destroy, detach, all the lifecycle things that you can do. You're at a point, Randy, where you've demonstrated a new problem — a new opportunity, we call it — where we can take advantage of these clusters and do multi-cluster things.

I see Michael here with video on; hopefully everything's okay at his house, there was a fire alarm going off. Yeah, do you want to jump in here a little bit? Do you want to give yourself a quick introduction and help us understand where we're going next?

Sure. My name is Michael Elder. I work in engineering on ACM, and I wanted to introduce this concept, building on what Randy showed us that ACM can do, managing across many clusters. We're going to see some really neat examples of what you can do with all those clusters, but I wanted to set up this concept of Submariner before we see it in the flow, so that it still makes sense. Scott? Yep, perfect. All right, let me start my screen share.

So yeah, we're delivering a tech preview of Submariner in RHACM 2.2, which is coming out in about a month. So talk us through it: what do we gain there? What are the bells and whistles we're getting with Submariner?

Absolutely. And I still have a habit of pronouncing it one way, and I know a lot of folks like to pronounce it the other, and it wouldn't be a good community project name if there weren't multiple opinions on how to express it. So leave your opinion in the comments on how you think it should be pronounced. But this project, whatever you want to call it, brings a neat capability. It lets us actually think about: okay, I've got many clusters available; clusters are only as important as the workloads they run; how do you make them more valuable for workloads, make them able to connect and communicate more effectively? If you think about a traditional Kubernetes cluster, you've got networking that extends across the cluster: pods have a certain network, services within that cluster have a certain network, and even though pods and containers run on different nodes within the cluster, they have a consistent networking layer.
Submariner just extends that concept out and says: hey, we should be able to do this across any number of clusters. In this picture we've only got two, but clearly, as Randy has already shown you, it's really easy to stand up many clusters and bring them under management, and with Submariner you'll be able to actually bring those clusters into their own consistent networking domain.

Under the covers, the way this works is — and let me go ahead and put this in present mode in case some of these pictures and icons haven't come across clearly. We've got clusters, and it presents this logical view of networking. I might be the only one not seeing a screen share right now. Oh, even better. Let's see. That sets off your alarms? No, apparently. Is this the way to go here? Let me bring this back and let me see if I can... It's the Internet of Things and they're all tied together: share screen, set off alarm, off you go. Yeah, the screen is visible, though. Okay, I think we're seeing a deck now, but Michael had to maybe chase down an actual fire alarm. I've never seen demos fail in real, actual catastrophic ways — mostly just code-related. Talk on this slide if you want, Scott. Yeah, go for it. It's all you.

We're on this one — yeah, so this one. Yes. Submariner gives you a way to create a network tunnel between clusters. In particular, as you know, with OpenShift we create an overlay SDN; that's where the pod network is and where the pods get an IP from. So now with Submariner you can create a network tunnel between these SDNs in different clusters. It's like putting a router, a switch, between these two SDNs, and now magically the pods can talk to each other.

So you get three main features with this. You get the basic IP-to-IP communication, which is your pod IP networking. Then you get service discovery: if you have a service in one cluster, you can discover it from another cluster using a DNS call, so it's very native to what already happens in OpenShift; you just get a different domain for cross-cluster DNS lookups. You also get load balancing: if you open a TCP connection to a service IP, you get load-balanced to the pods behind that service, even though that service and those pods live in another cluster. And finally you get network policy for cluster-to-cluster communication, so you can still define the network policies we are familiar with, but in this case across clusters.

These slides go a little bit more into detail about how the network tunnel is actually implemented. You pick some nodes to be the gateways, and the tunnel is implemented as an IPsec tunnel, so those nodes need to be routable to each other; the rest of the nodes don't have to be routable. Then the Submariner operator changes a little bit of the routing rules on the normal worker nodes, so that when you try to connect to a pod in another cluster, the communication flows through the gateways.

You have a picture because I wouldn't be able to do any of this. This is remarkable. Thank you for having people like Rafa and Ryan Cook in this world who can do it.
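To make the service discovery piece concrete, here is a minimal sketch of how a service gets exposed across the cluster set, assuming the MCS-style ServiceExport API that Submariner's Lighthouse component uses (the exact API group and DNS suffix can vary between Submariner releases, so treat the names as illustrative):

```yaml
# Export an existing Service named "db" in namespace "app" so other clusters can discover it
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: db          # must match the name of the Service being exported
  namespace: app
```

Pods in the other clusters can then resolve a cluster-set DNS name along the lines of db.app.svc.clusterset.local (older releases used a different suffix), and TCP connections to that name get load-balanced across the backing pods even though they live in another cluster.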
So Rafa, do you want to start your environment? You've got a demo with the CockroachDB scenario. Sure, yes. I don't know if you had slides in front of that, but let me check real quick — you should be able to see my screen with my presentation here.

So I want to present this idea of cloud-native disaster recovery. This is an idea I've been reasoning about over the last few months, and I'm coming up with this definition to make a little bit of contrast between traditional disaster recovery and cloud-native disaster recovery. In traditional disaster recovery, in most cases there is a human decision behind triggering the disaster procedure: we see that something is going wrong with a site, okay, it's time to kick off the disaster recovery procedure. It's a human decision. In cloud-native, we would like that to be completely autonomous, so the system detects that one of the data centers is down and reacts to that.

The disaster recovery procedure in traditional disaster recovery can be automated — sometimes it is — but many times what I see is a mix of automation and human actions. We want it to be fully automated.

RTO and RPO are the main KPIs for determining how good your disaster recovery is. RTO stands for the time the service is down, and RPO means how much transaction time I lose — how far back in time my state is when I recover. We want them to be either zero or near zero, so we don't have a discontinuity in service and we are always completely consistent; we never lose state. When I was explaining this to another audience they told me these statements are too bold: can it really be zero? I'm not selling here; this is accurate. We are going to have near-zero downtime, because the global load balancer needs a little bit of time to reconfigure itself, but absolutely zero data loss.

Also, traditionally the process owner for defining the disaster recovery procedure is the operations team. But in the future I think this responsibility will move to the application team, and they will be responsible for deciding how their application needs to deal with disaster recovery.

And then, traditionally, the main technical capabilities that enabled the disaster recovery procedure were found in storage: backups, volume sync, those kinds of capabilities. In the future I think the capabilities will be found in networking. East-west communication is very important for this new wave of applications to establish clustering, so that a database can cluster across different regions, and Submariner is a way to establish this east-west communication capability when you're running on OpenShift. And then a global load balancer that can sense the health of the application it is load balancing — a smart global load balancer that can do this kind of autonomous disaster detection. Those are the capabilities that we're looking for.

So, going ahead with the demo, this is the infrastructure that I prepared. Here, if you can see my pointer, I have a control cluster — I just learned that I should call it a hub cluster, but it's the same concept. It's where Submariner is running, and it's what I used to set up the other clusters, which are running in three different regions in AWS. I also have in this cluster a community global load balancer operator, which will observe what these clusters are doing and program Route 53, which is our global load balancer; it's the DNS in AWS.
And then in this cluster, what I prepared is the tunnel, the Submariner tunnel, so they're all connected; cluster three is also connected with cluster one. So that's where we're starting from.

I'm going to show you — here I have my clusters. The local cluster is the hub cluster, and then I have these three clusters. And I love the feature that Randy was showing a few minutes ago: yesterday these clusters were not up to the latest version, so I selected all of them and then clicked this button here, which is now grayed out, and I kicked off the upgrade for all of them at the same time.

Yeah, thank you. What do I need to do, zoom in? Okay, yeah, Control plus plus a couple of times. Yeah, okay. So I was saying, I selected these three clusters and upgraded them with a single click. It was really cool.

So next, what I did on this cluster: I deployed CockroachDB. CockroachDB is a NewSQL database that allows you to deal with these geographically distributed situations, and it's able to maintain consistency and availability when some of its instances go down. It has other interesting features like linear scalability, and essentially full consistency: when there is an outage, it's able to reorganize the internal data structures that it manages. The way I deployed it here, there are three instances per region for local leader election and local quorum, and there are three regions, so there are a total of nine instances of CockroachDB that form a logical database, a logical instance of a database.

So we can see it here. This is a web application served by the global load balancer — I'm not sure exactly which region is giving me this UI — and it's showing me the state of the CockroachDB database. I can see that it tells me where the regions are, so this database has some awareness of the geography where it's deployed.

Okay, so now I'm going to kick off some workload on this database. I have three clients that are set up in this way... Well, can you do us a favor? Sometimes the audio gets a little loud when the microphone is too close. If you leave it at your chest, Chris can adjust the volume so that it's perfect for the listening audience. Thank you.

So I have a client per region pumping transactions to the database. This is the TPC-C standard benchmark test for transactional databases. You can use it to measure how good a CockroachDB deployment is, but we don't care about that now; we just want clients doing some operations on the database. What we're going to do is take out a region by isolating the VPC on which OpenShift and CockroachDB are installed, and we're going to observe how the CockroachDB cluster reacts.

So this is where the demo may go wrong, so let's hope everything is fine. Okay, so, focusing in. They're like, oh really? Okay, I can't wait. So as you see, we are pumping these transactions. I don't need to explain exactly how the test works, but you can see that we are getting stats on the latency. Now, on this console, I'm going to run the command that isolates the cluster in the third region. Okay — and can you bump up the zoom on that one too, so we can see? One way to do it is Control plus plus here as well. Wow, it just died on me.
Oh, there you go, it came back up. All right, so it's here. No zoom, I think. Let's see. Here we go. No, I didn't see it. Okay, it's going a little slow. Okay, so you should be able to see a little bit better now. So I'm going to run these first two. Okay, just to make sure, let's check that the variables have been set. Yeah, okay.

So these two commands will add deny rules for all ingress and egress traffic on the VPC where one of the three clusters is running — I think it's the one in the West region. So we now expect — see, we're not receiving a stream of logs from this pod anymore. This pod was running inside that cluster, and we lost the connection. It's true, you may see some errors for a while, but they recover pretty quickly. See, this one had one error for a second, and this one also had one error, but they're now continuing to work correctly.

If we go back to the console, we see that CockroachDB has detected that it has a problem with this region here, and in a minute it's going to decide that the region is actually down. But beyond that, CockroachDB is still available and still continues to work. So no intervention on my side, except causing the disaster, right? But the service continued to work, because it autonomously was able to recover: no downtime, no data loss.

So hypothetically, if that one cluster had all the fire alarms in its house go off at the same time, right in the middle of a live stream, then the other two clusters could still pick it up and continue to work. Is that what I heard, correct? Right. Okay, that makes sense.

Now, if you have ever been in a real disaster, you know that it's painful to run the disaster recovery procedure and start your workload in some other data center, but it's almost as painful, once the data center is recovered, to swing the workload back to where it used to be. So instead, here we're going to do the opposite: we're going to re-enable the traffic between regions to the third region, and we're going to see that the cluster just autonomously recovers without us having to do anything. So the operation when things return to normality is also handled automatically.

Let me grab the script here. So again, we are now adding an allow rule for ingress and egress on that VPC. The cluster here should sense this... well, my computer... Now, this one exited because the pod died and it realized it. So this client died, because this test is not resilient to this kind of failure, but if you had a client that was actually receiving traffic from a global load balancer, like a front end, when the front end comes back up it will start serving again. And here CockroachDB should start recovering these under-replicated ranges — I think that's the word CockroachDB uses for its own partitions. As you can see, these other clients have kept working without any issues. I don't know if we have time to wait for these under-replicated ranges to recover fully. Yeah, Scott, if you want we can come back to this later. I think we should let it percolate, and then there's another opportunity; we'll come back to it later.

That's tremendous. I mean, the work you've done to build that out, knowing that RHACM includes the Submariner tech preview code, knowing how that's all coming together with multi-cluster — it opens up a lot of opportunities.
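As background on the geographic awareness Rafa mentioned: each CockroachDB node is started with a locality flag and joins its peers directly, which is exactly why the pod-to-pod mesh over Submariner matters. A purely illustrative fragment, not Rafa's actual manifests — the image version, region, and join address are made up:

```yaml
# Illustrative only: one region's CockroachDB StatefulSet showing the locality/join flags
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cockroachdb
  namespace: db
spec:
  serviceName: cockroachdb            # headless Service for per-pod DNS
  replicas: 3                         # three instances per region, as in the demo
  selector:
    matchLabels:
      app: cockroachdb
  template:
    metadata:
      labels:
        app: cockroachdb
    spec:
      containers:
        - name: cockroachdb
          image: cockroachdb/cockroach:v20.2.4      # example version
          command:
            - /cockroach/cockroach
            - start
            - --certs-dir=/cockroach/certs          # all inter-node traffic over TLS, as in the demo
            - --locality=region=us-west-2           # tells CockroachDB where this node lives
            # peers in the other two regions, reachable over the Submariner tunnel (placeholder address)
            - --join=cockroachdb-public.db.svc.clusterset.local:26257
```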
I've seen Ryan cooking up some other interesting scenarios — sorry, I'm on a dad joke roll this week, so I'm going to keep going. Ryan, do you want to take the screen share and show us what you've been working on with Scribe? And you might be on mute. Definitely on mute. Okay.

Scott, you were not lying. I'm just showing — so right now... Ryan, can you increase the font size, though? Oh, I just built out these clusters; I finished a couple of minutes ago. So let the panic and fun begin. So do we need some more fire alarms, or are we actually cooking with fire here? I think I had my own mental fire alarms going off, but... disaster recovery scenarios with actual, possible disasters going on. Yeah — see, I don't have a fire alarm, I just have my child next door blasting music, so I have my own little disaster going on.

So just to set context for where I'm going to start today: I come from an ops background, and a lot of us have been there. The pager goes off at three in the morning and you have to respond. Very early on, when I first saw ACM, it brought in capabilities that would have made my life a lot easier back in the day: just the ability to go from one data center to another without intervention. So what I'm going to do today is show that, but I'm going to add in this component called Scribe.

So to start things out, let's talk about Scribe. I'll put this in present mode. What Scribe really is, and what I'll be showing today, is the rsync capability within Scribe. We're going to have one site serving as our primary data center, and another site will be our failover data center. What the rsync capability is going to do is replicate our storage, over and over, on whatever schedule we'd like to set. For this demonstration I set a pretty aggressive schedule of every two minutes: send the data from Virginia to Ohio. It's really cool to see how fast your mean time to recovery could be, and that's really dependent on your application as well.

One thing I do want to add — I know Chris Short will be very excited about this — the replication is completely managed by YAML. You just take this YAML, shove it into ACM, ACM takes over, and so it's that GitOps capability of managed storage replication. That's awesome.

And so what are some use cases? I've talked about disaster recovery, but you might have one data center where you're running, say, container storage, and another provider running gp2. Those kind of mismatch when it comes to storage, but with Scribe you can actually set up replication between those two completely different storage classes.

So to get things started, I'm going to move the screen over here and exit slides. All right. Today I'm going to show off this amazing DokuWiki site. It's going to write a quick hello to everybody, and what we're going to do is fail this application from Virginia over to Ohio. As I showed at the beginning of this call, this is a brand new cluster, so we need to add in our failover data center. Right now I only have a primary location. As you see — I'm going to try to zoom it — we have some fun labels here: the purpose is our DokuWiki application, which I just showed, and then site equals primary. We're actually going to use those labels to do the placement of our storage replication.
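Those purpose and site labels are what the placement side keys off of. A minimal sketch of an ACM PlacementRule that selects the failover cluster by those labels (the resource names here are illustrative):

```yaml
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: dokuwiki-failover-placement
  namespace: dokuwiki
spec:
  clusterReplicas: 1            # place on exactly one matching cluster at a time
  clusterSelector:
    matchLabels:
      purpose: dokuwiki         # labels applied when the clusters were imported
      site: failover
```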
So I'm going to go ahead and import our failover cluster — our purpose is dokuwiki, and our site is failover. We'll generate the command to add our cluster in, I'm going to copy it, and I'm going to go back to my terminal. I believe this testing is good enough; we'll just accept it as-is. If it breaks, it makes the stream much more fun. So I'm going to export my kubeconfig, or figure out my contexts currently. Wow, I can't remember how to Kubernetes today.

Okay, so what we're going to do is add our failover cluster in. This is going to take a couple of minutes just to get all of the required components placed, so in the meantime let's take a look at what we're actually going to do.

As I was saying at the beginning, we're going to use RHACM's cluster placement capabilities. As you see here, we have cluster replicas set at one — I'm going to try to zoom this in even further — and what this means is: if my application were to fail on the one cluster on which it's running, please run it somewhere else that is currently running. To me, this is the most powerful extra admin your team could ever have: someone that's going to say, hey, we need to switch over, without you actually needing to wake up at three in the morning and switch over.

So what are we going to do with our data? We're going to take our primary sub-m1 cluster, which is using the gp2 CSI storage class, and we're going to send the data over to our sub-m2 cluster running OCS. It's really cool that we're actually going between two different storage class providers. And then lastly, I'm just going to show you a bit of the YAML, but we'll take a look at that more in depth when we actually deploy it. As you see here, this is the icon you'll see within OperatorHub, and we're going to have RHACM actually place our replication destination onto our sub-m2 cluster, and then our sub-m1 cluster is going to be the source of all of our data.

So at this point, let's take a look and see if our cluster is loaded in. It's in a ready state. It's still starting a couple of the remaining pods, but nothing that would stop us from moving forward. So let's deploy the Scribe components. By default, any OpenShift cluster added into ACM automatically gets the Scribe operator installed, so the CRDs are available any time a cluster is added into ACM. It's really cool that that's completely automated for me. Let me do a better export. There we go.

Okay, so we're on our ACM cluster now, and what we're going to do is apply our replication YAML — so -f and then our ACM replication. Since we're using Scribe with the rsync capabilities, we first start out by deploying our primary — actually, I'm sorry, deploying our failover. The reason we do this is because the failover is going to create our service address for us. And so when we go back to manage applications, you'll see the replication application within ACM that we just deployed, as you see here. It's going to begin populating throughout our cluster; you'll see this little box here turn green momentarily.
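For reference, the failover piece being deployed here is a Scribe ReplicationDestination. A minimal sketch, assuming the scribe.backube/v1alpha1 API group the operator used at the time; the storage and snapshot class names are the usual OCS defaults, and the real YAML lives in the repository Ryan mentions:

```yaml
apiVersion: scribe.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: dokuwiki-destination
  namespace: dokuwiki
spec:
  rsync:
    serviceType: ClusterIP        # service address reachable from the other cluster over Submariner
    copyMethod: Snapshot          # land each completed sync in a volume snapshot
    capacity: 2Gi
    accessModes: ["ReadWriteOnce"]
    storageClassName: ocs-storagecluster-ceph-rbd
    volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
```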
Nice, so you've got it imported as a cluster you just built moments ago, and you applied your YAML to generate this application resource, which is going to represent the failover scenario you're about to drive through. Exactly, and the cool thing about it, to bring everything back together: I'm using Submariner as the pathway to send the data between the two clusters. That's awesome.

And in this case you're also using applications, but not actually deploying pods; your application is just deploying some additional components, in this case that replication destination. It's not a traditional app as we think about deployments, pods, stateful sets, and so on. Right, and this replication destination is going to create a Kubernetes job, which will start up and shut down, start up and shut down, as needed.

So, as you see, our little checkbox turned green, so let's take a look at what was actually created. Going down here — went to the wrong one, the joys of life — let's look at the actual YAML for the resource. So here is our replication destination, and the important thing to see is that at our destination we're going to use OCS as our volume snapshot class, as well as our storage class, to write our application to. And then down here at the very bottom — I hope you can see it; there we go — there is the Submariner cluster IP address that allows it to be reached between the clusters using Submariner. And finally you'll see an SSH key, because we're using rsync; it's SSH replication between the two sites.

So let's add this address and the SSH key to our repository. This repository will be available after the call for everybody to play with, poke at, and see everything in there. So we'll get our replication going — as I said earlier, we are going to be incredibly aggressive and replicate our data every two minutes. This is the fun part about the schedule: depending on your application, you might not need to replicate every two minutes, but it's simple Linux cron syntax to establish that relationship.

And then lastly, we're going to take the SSH key from the failover cluster — that's a secret created by Scribe — and bring it over to our source cluster. You can actually bring your own keys instead of the defaults, but I didn't want to load those into Git and have my repository out there, and then during the demonstration somebody scanning for keys decides to mess with my cluster out in the world.

So let's get secrets for dokuwiki in the context of failover from sub-m2, and you'll see that there's this secret here. I'm going to take that, output it to YAML, and write it to a file under our replication source rsync YAML. Take a look at this: as you see, it is an SSH key, an ssh pub, and here's the secret. I'm going to scrub out a couple of fields before I load it into Git.
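The source side created in the next step is the matching ReplicationSource: it points at the DokuWiki PVC, carries the two-minute cron schedule, and references the address and SSH secret copied over from the failover cluster. Another hedged sketch against the same assumed API group, with illustrative names and a placeholder address:

```yaml
apiVersion: scribe.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: dokuwiki-source
  namespace: dokuwiki
spec:
  sourcePVC: dokuwiki                  # the PVC backing the application on the primary
  trigger:
    schedule: "*/2 * * * *"            # the aggressive two-minute schedule from the demo
  rsync:
    copyMethod: Snapshot
    sshKeys: dokuwiki-ssh-keys         # secret copied over from the failover cluster
    address: 10.132.0.25               # ReplicationDestination address, reachable via Submariner (placeholder)
```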
Okay, now within ACM, let's go ahead and create the source application. What this is going to do is: ACM is going to find the replication for the primary and then deploy all of those components onto our primary cluster. Going back to ACM now, this will update momentarily and we should see a primary cluster, and it's going to get the SSH key and the replication source. The cool thing here is that there's now going to be a volume snapshot created on our failover cluster.

I like how Ryan says "the cool thing" when there have already been like seventeen in a row. Keep doing it, keep wowing us with more cool things. Hey, I'm just absolutely blown away by the ability that ACM gives me to place all of my stuff without me having to place it. I feel like I was somebody's GitOps at one point in my life. You've lived that, you've been that. I've been there.

So, okay, our replication source has been created, and — I'm going to split my terminal just so that hopefully I can see when the pod starts, and then we can actually see the replication taking place. I will increase the font in one second. Okay, so I can do a get pods, and I'll also get the replication source for dokuwiki, and we're going to see that at :48 there is going to be a sync. So that should be right about now; we should see a Scribe pod being started momentarily. Chris, do you have a drumroll? I wish. Throw a zing in there too. You know it. Chances are if I had a drumroll it's probably copyrighted and would get nabbed, and we would get kicked off YouTube or something. So I need to find an open source sound effects thing. Work to do. Yeah.

All right, so at some point it's going to be 1:48 or 2:48... so 2:48, and this should start. Just waiting, waiting, waiting. I feel like the waiting is actually more intense when it's live. Absolutely, right? We embrace failure on the show, but waiting is always harder when it's live. Yeah.

So let's troubleshoot live and see what's going on. Okay: waiting for snapshot to be bound. It happened — it did the work while I was panicking. So let's see: get pvc -n dokuwiki. All right, worthy of the dad jokes: a watched PVC never replicates. I think that's the thing. It's almost like the tree falling in the woods.

So, okay, we have our application here. So while I dance in the background, why don't we — Rafa, if you want to show the populated CockroachDB? Let's do that, yeah, I'll have something in a moment. All right, I'll take the screen again.

Yeah, so a few minutes after we stopped watching this, it actually recognized that these clusters were back alive, and if you remember we had a bunch of under-replicated ranges; very quickly they came back to being fully replicated and healthy. So the CockroachDB cluster — I'm not an expert, so I'm probably not doing it justice describing how it works — but once it recognizes that those nodes that were not reachable are back online, it essentially resyncs its state with those nodes and they are back to serving workload.

So, if you want to be able to replicate this demo, we have all the information here. I'm not sure, Chris, how we can share these links. Give me access to the slides and I can drop them in. Yeah, I'll give you access, so they should be available later. If you want to try to replicate this, anyone should be able to run this demo.
Nice. And I know, Rafa, you've been working closely with the CockroachDB team on some of this, right? Right — thanks for reminding me, because I actually wanted to thank them. They have been very helpful in advising me on how to set up this cluster, how to run the demo, and how to run the TPC-C testing to get good results. It was, overall, a wonderful collaboration.

Maybe, while we're waiting, a point that I maybe didn't make well enough: these pods all talk to each other. They don't talk through a load balancer; each one needs to be able to talk to every other one individually, so they establish a web of connections, a mesh of connections, and this mesh is enabled by the Submariner network tunnel that we talked about before. So it's nice to see how RHACM allows you to create the clusters and deploy the applications on the clusters, and Submariner allows the communication between the applications deployed in the clusters when they need to do east-west communication. And then, if you have a load balancer in front of the clusters, you can now serve workloads in an active-active way.

Yeah, it's been interesting. I've been watching your work from afar, seeing the improvements, understanding the approach, articulating the performance characteristics along the way, troubleshooting some of the latencies, but getting to your original goal — you know, slide two, what you talked about — which is no human intervention, RTO and RPO near zero. You're really pushing that model, which I think is pretty unique, and it's awesome to see it coming together with these tools.

Yeah, thank you. And that's what I want to be able to enable for our customers, right? This RPO and RTO near zero is something that only the big web scalers like Google and Facebook have, or the new unicorn startups. But I think we now have enough technology that is accessible to everyone, so that if you are willing to invest a little bit of time, you can actually set up this kind of architecture in more traditional enterprises. And so I wanted to showcase how that was possible with this demo. There are obviously other products that can do it, but CockroachDB turned out to be very good at handling this kind of scenario. Yeah, almost a perfect fit.

Ryan, should we cue back over to you, or take some questions from the audience? All right. I don't know if my mic's still hot — is my mic hot? I can hear you. Yeah, okay. All right, cool. So it's actually helpful to define a volume snapshot class if you intend to use one; my issues were around the volume snapshot class not being defined. So if you look here: volume snapshot... okay, so we should have that. If I don't get a volume snapshot in a second, then we may have to just take questions.
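Since the snag here was a missing volume snapshot class, this is roughly what one looks like for the Ceph RBD CSI driver that OCS provides. A minimal sketch: the API version was still in beta on some cluster versions at the time, and the driver name and parameters depend on your storage installation (OCS normally creates this class for you):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ocs-storagecluster-rbdplugin-snapclass
driver: openshift-storage.rbd.csi.ceph.com   # CSI driver installed by OCS
deletionPolicy: Delete
# secret/clusterID parameters omitted for brevity
```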
Well, either way, we can definitely handle some questions here. There were some good ones in the chat. Chris, I don't know if anything bubbled up to you. Yeah, let's see here, starting back at the beginning.

There are a couple that I would jump on. One of the first ones that popped out in the chat was about ACM and high availability; the question was basically: is ACM highly available? It is highly available across availability zones. Out of the box, when you deploy it, it will take advantage of anti-affinity: if you're in a hyperscaler cloud it'll put the set of pods across different AZs, and if you don't have AZs it'll at least spread them across different nodes. But really the question was about multiple data centers, high availability across multiple regions. For that pattern, you can have two hubs that are sourcing the same policies from GitHub or from an object store — we saw that pattern with a subscription coming out of a Git repo in prior Twitch streams — so you can definitely do that. What we don't have out of the box today is agent failover behavior: if I have a cluster in region one attached to hub one, and a cluster in region two attached to hub two, they won't automatically fail over to the other hub. We experimented with architectures for that in the past, but the complexity around it made us shy away from putting it in the product. If you're interested, or have real use cases where you're trying to do that, reach out to us and let us know; we'll work with you there.

And then the other question was about service mesh and Submariner and how those two things relate. Really, they're somewhat orthogonal: service mesh brings a set of capabilities around registration and discovery, and Submariner is really about establishing the network bridge. I wonder, Rafa, based on the applications and what you've done with CockroachDB, have you found a place where those two intersect really well, or some examples that show them intersecting well?

I think — well, I see the service mesh mostly working at layer seven, and Submariner working at layer three. It's true that you can also use the service mesh for TCP, but really where it shines is layer seven. Our product roadmap has support for multi-cluster in the service mesh this year, probably in the second part of the year. We haven't yet explored whether Submariner is a requirement for that or not; it may be that it isn't. So if you just need to do layer seven, you may be able to do it without Submariner, but I think, used together, they will allow you to handle both the layer three and four policies — network policies — and the layer seven configurations. So they definitely can work together; I just haven't yet heard of use cases where they are needed together.

Okay. And Chris, were there any other questions, or Scott, anything you noticed that you wanted to draw out?

We got a question in Discord yesterday, and I'd like to ask that if you don't mind. The user is having problems with distributing secrets: "I imported an existing cluster; when I do that, though, there is no ClusterDeployment to reference in my SyncSet or SelectorSyncSet. Am I just missing something?"

Great question. Yeah, so if you're using Hive, SyncSets are a way to push content down — just to make sure we're on the same page with what's going on there.
The ClusterDeployment API object is what we use when we provision a cluster: you create one, and behind the scenes Hive goes off and runs a provisioning job, and you're off to the races. We use a SyncSet to deliver the initial agent payload into that cluster, but once the agent from open-cluster-management connects back to the hub, it's a pull model: it connects back to the hub and asks for state to apply, and we do that partly so that, if there's ever a disconnect, the cluster can continue to enforce policy or other desired state.

So if you simply import a cluster that wasn't created through a Hive ClusterDeployment, you can use a couple of different ways to deliver content to it. In the product, we really focus on that concept of a subscription — you saw Ryan use that to deliver content down. In that model the subscription comes from a source: GitHub, an object store, a Helm repository, and so on. You match it to a placement rule, and ACM will automatically deliver that content into the target clusters. You can also use a policy: if it's something you just want every cluster to have, a sort of foundational bit of config that's not specific to a particular application, a policy works in a very similar way. If you want to go underneath the hood, you can take advantage of an API object called ManifestWork; that's a very low-level primitive that we use to deliver part of the agent mechanism, and ManifestWork is declared in the open-cluster-management API repo. You have to put that specific object in the specific namespace for the specific cluster, so it's not as flexible as policies or apps, which use placement rules to dynamically place and adjust content. But once you've imported a cluster, SyncSets won't work, because they don't have the same information they would have for a Hive ClusterDeployment. What we typically walk users through is using policies and apps. Makes sense.
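To make that subscription model concrete: content delivery in ACM is typically a Channel pointing at a source, a Subscription that pulls from it, and a PlacementRule that picks which managed clusters receive it. A minimal Git-based sketch with illustrative names and repository URL (older releases spell the channel type GitHub rather than Git):

```yaml
apiVersion: apps.open-cluster-management.io/v1
kind: Channel
metadata:
  name: demo-repo
  namespace: demo
spec:
  type: Git                                                # HelmRepo and ObjectBucket are also options
  pathname: https://github.com/example/acm-demo-content    # illustrative repository
---
apiVersion: apps.open-cluster-management.io/v1
kind: Subscription
metadata:
  name: demo-subscription
  namespace: demo
spec:
  channel: demo/demo-repo               # <channel namespace>/<channel name>
  placement:
    placementRef:
      kind: PlacementRule
      name: demo-placement              # selects the managed clusters that receive the content
```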
Can I add a consideration on this? Yeah, sure, go ahead. So secrets and GitOps is still an evolving situation; there aren't always perfect solutions there, but it's evolving. The other thing to keep in mind, at least that I notice, is that when you start having multiple clusters and managing them with RHACM, you need to make sure the same secrets are propagated to all of the clusters where they are needed. For example, an application that needs to connect to a database and is living on different clusters may need the same secret to connect to that database, or I may have to create certificates in all of these clusters from the same PKI so that trust can be established between the pods that live in those clusters. In my CockroachDB example, all the communication is over TLS, and so I had to generate all the certificates from the same PKI. So you need to find a way to have a source of truth for your secrets that all of your clusters can connect to, and from which you can source all of your secrets. To solve this problem, I personally like HashiCorp Vault a lot; I think it's a good tool for it. I'll stop there.

I agree with that. I think Vault's a very handy tool. You can also use an object store bucket and create an ACM channel to it; you can protect the object store bucket, so you're not storing secrets in Git. Another technique I've seen is using the SealedSecret API, which is a community API: you encrypt the payload with the public key of a particular server, and when the sealed secret arrives at that server it's decrypted with the private key, so it's easy to protect it for a particular cluster. We're interested in feedback: one of the things we have kicked around in the labs is a multi-cluster sealed secret. You'd encrypt it once with a key on the hub, and then, based on placement behavior, we would re-encrypt it with the public key of every cluster we distribute it to, so that you don't have to encrypt the same sealed secret yourself for every potential cluster that's going to need that certificate or secret information.

That's a good topic; that's probably its own hour-long session. Yeah, I feel like we could get Christian Hernandez from GitOps Happy Hour and all of us together to talk about secrets in GitOps in general and see how that conversation pans out. That might be a good opportunity in the near future. Yeah.

Hey, that was a fun day. Rafa and Ryan, I appreciate you guys, and Randy, thanks for the demo to kick us off. You guys are phenomenal. Chris, thanks for hosting us. Do we want to — Ryan, is anything else happening on your screen, anything we want to look at, or are we all done there? No, I think this is just going to force me to come back in the near future, so be prepared. Perfect, I'll encourage that.

So yeah, thanks, all. I'll share the link to this in Discord, so the person that asked in Discord — I think they're watching, but I don't know for sure. I think that covers it as far as questions go. Anything else anybody wants to share before we sign off here? Appreciate everyone's time and a good conversation. Yeah. Good luck. Until next time. Yeah, till next time — cluster deploy, fire alarm go.

So thank you all very much for joining us today, and thank you all for watching out there. It's been a very fun day of streaming here on OpenShift TV. Be sure to check us out tomorrow morning, starting with The Level Up Hour at 9 a.m. Eastern, and we'll be rocking and rolling from there. When in doubt, check our calendar; I'll drop a link to that in chat right now. And until next time, everyone, stay safe out there. Cheers, thank you.