Hello everybody and welcome to yet another OpenShift Commons briefing. Today I'm really pleased to have Annette Clewett and Andrew L'Ecuyer. Andrew is with Crunchy Data, and Annette is a principal architect here at Red Hat. They're going to talk about database disaster recovery and how to make it easy; this comes out of the smart cities initiatives. I'm going to let them introduce themselves and their topic, and they've got a great demo, so I'm really interested to see how it all plays out. If you have questions, ask them in the chat wherever you are, on YouTube, BlueJeans, or Twitch, and we'll relay them to the speakers and have some live Q&A at the end. So with that, Annette, take it away. I know you two have put a lot of effort into getting this one going, so thank you very much for being here today.

Yeah, thank you, Diane. First off, if you hear a grandfather clock in the background, that's me. I'm Annette Clewett, I'm with Red Hat on the data foundation business team, and I'll be going over a lot of the things that support this environment, including the smart city demo. Andrew?

Good morning. My name is Andrew L'Ecuyer. I am the Director of Operator Engineering at Crunchy Data. In this role I am responsible for the development and implementation of our Crunchy Postgres for Kubernetes product, also known as PGO, which I'm going to be going over in the next few slides. Thanks, Annette.

Okay, I'll start out with a bit of an overview of Crunchy. Who is Crunchy Data? The way I like to describe it, our business at Crunchy Data is enterprise PostgreSQL. If you currently deploy Postgres and are looking to expand your usage, or you're venturing into Postgres for the first time and looking to make it a big part of your enterprise, we are the partner that can make that happen. We do that in a variety of ways. First and foremost is the expertise we bring to the table for anything and everything Postgres related. As you can see on the slide, we have people within Crunchy who contribute directly to the PostgreSQL source, in addition to a wealth of other knowledge and expertise within the organization. Beyond that knowledge, we also bring the tools and technologies needed to get Postgres deployed in your enterprise environments according to your specific requirements and needs. That includes certified versions of PostgreSQL itself, but also the tooling needed to facilitate deploying PostgreSQL across the variety of environments where you might run it, including cloud environments, which we're going to touch on today. We have a few products along those lines. The slide mentions our Crunchy Bridge solution, which is a fully managed PostgreSQL cloud offering, but today we're going to be talking about Crunchy Postgres for Kubernetes, our Kubernetes-based solution for managing Postgres. So what is Crunchy Postgres for Kubernetes? We like to describe it as declarative Postgres. It's an open source solution.
We also call it PGO. It allows you to streamline the deployment of production-ready PostgreSQL clusters by simply declaring what you want your clusters to look like. Within the solution we have a declarative API that lets you define, via a spec, exactly what you want your production-grade PostgreSQL clusters to look like. Next slide.

So to dig into this a bit, what do we mean by a fully declarative solution? It means that anything you might need to make your database architectures fully production ready, ready to meet your needs and requirements, can be expressed in a convenient, easy-to-use spec. To touch on a couple of topics that are relevant to our conversation today: if you want high availability, or if you need disaster recovery, the solution makes it as simple as defining exactly what you need for those elements within the specification. From there, Crunchy Postgres for Kubernetes, or PGO, automatically takes that spec and configures and deploys those databases exactly how you defined them. Say you want your clients to connect to your database through a connection pooler like PgBouncer: it's as simple as defining that within the spec, and our operator will process the spec and deploy the components in your environment to make it happen. The same applies to the other functional areas. Take disaster recovery: we're talking databases here, stateful applications, so it's critical that your data is safe and protected. You want to be able to easily create backups and restore from them in the event of a failure, and the solution is designed to make that seamless and streamlined; you can easily define backup schedules and get the data redundancy you need. High availability is another big part of the architecture: the ability to easily spin up replicas, so that if your primary database goes down, we can fail over and keep your database available. That's really what the solution provides at the end of the day: a convenient, easy-to-use way of defining your database architecture across all the pertinent functional areas, whether that's high availability, disaster recovery, monitoring, or anything else, and from that point the operator takes control and makes sure those database architectures are deployed and configured according to your specific requirements. Next slide, Annette.
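For reference, here is a minimal sketch of what such a declarative spec can look like, based on the v5 PostgresCluster API; the cluster name `hippo`, the Postgres version, and the storage sizes are illustrative placeholders, not values from the talk:

```yaml
# Minimal sketch of a declarative cluster, assuming the v5 PostgresCluster
# API; the name "hippo", the version, and the sizes are illustrative.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  postgresVersion: 13
  instances:
    - name: instance1
      replicas: 2                      # one primary plus one HA replica
      dataVolumeClaimSpec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1                  # pgBackRest repository for backups
          volume:
            volumeClaimSpec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 1Gi
  proxy:
    pgBouncer:
      replicas: 1                      # clients connect through PgBouncer
```

The operator watches this one resource and creates the workloads, Services, Secrets, and backup jobs needed to realize it.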
So, a big part of this solution: it's one thing for us to take a spec you've declared, where you've defined what you want your database to look like, and get it out there and running. But that's only one part of the process. Once it's running, we need to make sure it continues to be available and that your data is accessible as needed, and that means ensuring your database clusters can heal as needed. To throw an example out there: this solution is constantly monitoring your database environments, the databases it provisions, to make sure they're healthy and configured according to your needs. Say your cluster has a connection pooler, which is what's shown on the slide here, and for some reason, because chaos can exist in any cluster, someone deletes that connection pooler deployment. The operator is going to immediately detect that, recreate the deployment, and make sure your connection pooler is available and ready to use again. And that's a theme within our solution in general: once you declare your cluster and the operator builds and deploys a database architecture according to that specification, it constantly monitors it, makes sure it stays healthy, and makes sure all the Kubernetes resources are in place so that the database continues to serve your needs. Next slide.

In addition, the declarative approach to defining your PostgreSQL clusters also enables GitOps workflows, because at the end of the day you're defining your databases using simple YAML specifications that can easily be stored in any version control system. As we know, database requirements differ across environments: your databases for development versus QA versus production might all be a bit different. With this declarative method, you can store your database configurations in version control, where they can be integrated into your continuous integration and continuous deployment pipelines to provision the databases you need at any stage of your software development life cycle, giving you the databases you require and ensuring the data your users and applications need is accessible (a small kustomize sketch follows at the end of this segment). Next slide.

Another big piece of this solution is ensuring your clusters can be updated without interruption. Again, it's one thing to get a database deployed and running, but that's not going to be the end-all for your database: you're going to need to reconfigure it, you're going to need to tune it. Maybe you deployed it without a connection pooler, and now you've decided you want one. This solution is designed to make that all seamless, so as you evolve your database architectures over the course of a cluster's lifetime, you can do it continuously and without interruption. Through the rolling update strategy we've implemented within the operator, you can safely make changes to your clusters, and we will roll those changes out to all your instances, whether they're PostgreSQL configuration changes or changes to the architecture as a whole, in a way that avoids disruption and ensures your users can still access the data they need. Next slide.
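On the GitOps point, here is a hypothetical kustomize overlay showing how one base cluster spec, kept in version control, can be varied per environment; the directory layout, the cluster name `hippo`, and the replica count are assumptions for illustration:

```yaml
# overlays/production/kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                 # the PostgresCluster spec shared by all environments
patches:
  - target:
      kind: PostgresCluster
      name: hippo
    patch: |-
      - op: replace
        path: /spec/instances/0/replicas
        value: 3               # production runs more replicas than dev/QA
```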
So, summing some of these pieces up: what are we bringing together at the end of the day? When we talk about Crunchy Postgres for Kubernetes, this solution tackles the critical functional areas you need for a production-ready database system in Kubernetes. First and foremost is high availability. We've touched on this a bit, but at the end of the day, you want your data available when you need it and where you need it, and the Postgres operator makes that seamless. The ability to add additional replicas to your clusters, and to fail over to those replicas in the event of a failure so your data is always available, is implemented in a way that is seamless and basically transparent to the end user. You define within our solution the elements you need: I want high availability, I want multiple replicas, I want redundancy. We take the process from there and wire it all up to make it happen. We make sure those replicas are properly replicating from the primary database, and we make sure they're configured to be failed over to when your data needs to remain accessible. That's a big part of our solution, because we're talking data here: first and foremost, your data has to be available when and where you need it, and the high-availability parts of our architecture are what make that happen. Disaster recovery is another important piece, and another area where the operator greatly facilitates a strong solution for your database, because at the end of the day we need to protect against disasters. Chaos can occur, things can happen, and data is paramount: you want your data safe and available, so that if anything goes wrong, you can properly recover and get back to the place you need to be. Crunchy Postgres for Kubernetes facilitates that, whether by making it easy to schedule backups so your database is effectively backed up on proper schedules, or by making sure there's redundancy in the backups themselves, so that if you lose an entire environment, you don't lose your backups along with it. At the end of the day, it's about making sure those backups are properly taken and can effectively be used to recover your cluster in the event of a disaster, which is what it's all about.
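As a sketch of what the disaster-recovery portion of a spec can look like under the v5 API: scheduled backups to an in-cluster repository, plus a second repository in object storage for off-cluster redundancy. The cron schedules, bucket, endpoint, and region are placeholders:

```yaml
# Backup portion of a PostgresCluster spec (a sketch; values are placeholders).
spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1                    # in-cluster backup repository
          schedules:
            full: "0 1 * * 0"            # weekly full backup, Sunday 01:00
            incremental: "0 1 * * 1-6"   # daily incrementals the rest of the week
          volume:
            volumeClaimSpec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 5Gi
        - name: repo2                    # off-cluster copy in object storage
          s3:
            bucket: my-backup-bucket
            endpoint: s3.us-east-1.amazonaws.com
            region: us-east-1
```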
Another important part of the architecture is monitoring, because we want to detect problems before they manifest in a big way, and that takes keeping an eye on the cluster. One of the things Crunchy Postgres for Kubernetes provides is an effective monitoring stack, so that once your database is deployed, you can keep an eye on the health of your cluster and continue to tweak and evolve it according to your needs. That also means identifying problems, like I said, before they manifest, so you can get in front of issues with your database and ensure your data is accessible as needed. Security is another big part of our architecture. Not only is it important that your data is accessible, it's important that access to that data is secure and properly locked down. Our operator solution builds in security from the ground up: across all elements of the architecture, we make sure all access to your data happens in a secure and controlled manner, whether that means enabling TLS by default, using certificate-based authentication, SCRAM passwords, or whatever it might be. Our solution is designed to be secure by default, and when we're looking at deploying production-grade clusters, that is a critical part of the architecture. The last piece I want to mention is convenience, because at the end of the day, we want to make it easy for you to manage your databases in these environments. Say you have a production database that you want to clone into a dev environment so you can do a bit of testing or troubleshooting: that is seamlessly done with Crunchy Postgres for Kubernetes. Or if you want to customize PostgreSQL or your architecture in any way, again, we make that easy to do through this solution. Next slide.

So to sum up: we've touched on a few of these things already, but Crunchy Postgres for Kubernetes is, at the end of the day, a fully declarative and GitOps-ready solution. We make it so you can easily define the database architectures you need in a fully declarative way, and we take on the management of the database from there to create a fully seamless experience for deploying production-grade PostgreSQL clusters. That includes making it easy to get started, too. We want it so that as soon as you spin up a new database, it's ready to be used by applications, consumers, and end users. We do things like automatically provisioning secrets with the credentials you need, so you can wire those right into your applications without ever having to look up a username or a password or create Postgres accounts. Again, the idea is making it as seamless as possible for your end users and applications to get up and running with this solution.
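Roughly the shape of the connection Secret the operator provisions for each user (a sketch; the name follows the `<cluster>-pguser-<user>` pattern and the exact keys can vary by version):

```yaml
# Illustration of an operator-generated connection Secret; the values are
# placeholders, and the real Secret is created for you, not hand-written.
apiVersion: v1
kind: Secret
metadata:
  name: hippo-pguser-hippo
stringData:
  user: hippo
  password: "<generated>"
  host: hippo-primary.postgres-operator.svc
  port: "5432"
  dbname: hippo
  uri: postgresql://hippo:<generated>@hippo-primary.postgres-operator.svc:5432/hippo
```

An application can consume these keys with `secretKeyRef` environment variables rather than hard-coding credentials.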
And easy upgrades are another big part of this. As Kubernetes and OpenShift continue to evolve, we're evolving along with them, and we've baked in the functionality to make that process as seamless as possible for end users. You can easily upgrade your Crunchy Postgres for Kubernetes solution as Kubernetes and OpenShift evolve and get all the great new benefits that come with them, with as little effort as possible. And when we sum all these things together, what does it really give you? Production-grade, enterprise-ready PostgreSQL: it lets you deploy the Postgres clusters that meet the specific needs and requirements of your enterprise, that are secure and locked down, and that keep your data available when you need it. That's really what it's all about at the end of the day. So, I think that pretty much sums it up, Annette.

All right, thanks, Andrew. Thank you. Yeah, so I'll speak a little more later about deploying the latest version of Crunchy Data Postgres, which Andrew was just talking about. Before we do that, though, let's talk about the storage. The group I'm in at Red Hat, and one of our main offerings, is OpenShift Data Foundation, which used to be called OpenShift Container Storage. I want to point out that this is very much built from upstream open source projects. One of the main projects, which does the orchestration and which you may have heard about because it just graduated from the CNCF, is Rook. Rook orchestrates the entire deployment of another very mature upstream project, Ceph. The two of them together can be deployed in a Kubernetes or OpenShift environment and, as Andrew was saying, are totally managed within that environment. We'll take a look at some of the components a little later in the demo. Being very close to OpenShift Data Foundation as a product, I can tell you that everything we do downstream is basically created and tested upstream before we pull it downstream. So it's good use of open source. And then there's the Operator Framework, which has become really prevalent and makes deployment really, really easy. In the case of what I'm going to show you in the demo, deploying Kafka, deploying Crunchy Postgres, deploying OpenShift Data Foundation, all of this is done with the Operator Framework. It has also matured; operators have been discussed for the last three or four years, but now I'd say most applications that are serious about being in Kubernetes do have an operator-based deployment. And just to speak about it for a minute: an operator is not only deploying, it is also reconciling back to the state you want, which is a very powerful concept for managing your applications. So again, looking at the components, we already spoke about Rook and Ceph. The other piece we've included with OpenShift Data Foundation, or what used to be OpenShift Container Storage, is an object gateway with some nice features for bridging two different environments, say Azure and AWS, and mirroring objects between two different clouds. Those three pieces go together to form the storage solution that Postgres is going to use, as well as Kafka, as you'll see in the demo. The way we use the Operator Framework, both for administrators and users, is that there's more than one operator. We have what we call an OCS meta operator, and that operator bootstraps the other two: Rook-Ceph, the upstream effort I spoke about, and NooBaa Core, which is also upstream. The OCS operator does the reconciling and the management, and it's always watching; if the OCS operator is not running and ready, then something has gone astray, and you need to look at the other operators to figure out what has not deployed correctly.
So it's multi-level management, but it works very nicely for getting all of the orchestration done, for maintaining things, and, as Andrew spoke about, for being able to upgrade, which is one of the features of using an operator. Getting to the topic of this particular session: if you look at the disaster recovery continuum, the way we're seeing it at Red Hat, you start at the top right with backup and restore, which, whether inside or outside of OpenShift and Kubernetes, has been a solution for quite a long time. The questions are always: how much do you back up? How often do you back up? And from a restore point of view, how long does it take to restore, and can you restore back to a known good state? In terms of Kubernetes and what OpenShift Data Foundation has done, we've integrated closely with the traditional backup vendors, say Spectrum Protect Plus, NetBackup, Trilio, or Kasten. We've done that via the CSI interface, which allows you to do snapshots and clones, and that is quite mature; we released the capability about a year ago, supporting the CSI standard, and we can provide more information about that. So if I create a volume mounted to an application and I want a crash-consistent snapshot, I can use the snapshot capability, either directly through the UI or via YAML, to back it up, or I can hook it into the traditional backup vendors, which pretty much all now have CSI capability both to initiate the snapshots and to use those snapshots to create clones for restore. And at Red Hat we have a whole list of ISVs we've done particular testing with, and we've created video solution guides showing how you do that.
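A minimal sketch of that CSI snapshot-and-clone flow; the class names shown are the defaults OpenShift Container Storage typically creates, and the PVC names and sizes are illustrative:

```yaml
# Take a crash-consistent snapshot of an application's PVC.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-snap
spec:
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: app-data
---
# Restore: clone a new PVC from the snapshot (the requested size must
# match or exceed the original volume).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-restore
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  dataSource:
    name: app-data-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
```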
As I go down the continuum, we have Regional-DR, which is essentially a multi-cluster solution, and then Metro-DR. Regional-DR is still in development; we have some pieces of it now at Red Hat, but it's really the idea of multi-cluster: the requirement is two clusters, Kubernetes or OpenShift, and we asynchronously replicate the data for the persistent volumes between the two sites. There's a lot of stitching and orchestration that goes into that, because it's not just data replication at the persistent volume level; you also need replication at the Kubernetes object or resource level. One idea for how you could do that upstream is using the Velero backup and restore APIs, but this is a solution we see being available probably by the end of the year, delivered through Advanced Cluster Management, which will do the orchestration for creating the multi-site setup and then also orchestrate the replication of the data and the Kubernetes resources from one cluster to another. Getting to the bottom left, Metro-DR is, from a storage perspective, a synchronous solution; that's why we call it Metro. It has to be within a certain latency or distance, and I usually say no more than a couple hundred miles between the sites. The other main thing is that we have an arbiter or consensus location. The storage of OpenShift Data Foundation, backed by Ceph, needs monitors, which keep track of the cluster, and they need to have consensus. We also know that etcd, which is used for the control plane of OpenShift, needs consensus, and something like ZooKeeper, which is currently used with Kafka, needs consensus. So you need a site where you can put a consensus node and keep quorum even in the case of what I'm calling a data site failure. This is a solution that is available now with the latest versions of OpenShift Data Foundation, and it is a pattern you do find where sites are a couple hundred miles apart; the consensus location can be probably twice as far away, maybe 500 miles. So it is something you can do today to get resiliency. And if we look at the recovery time objective, and at the RPO, the recovery point objective, which is more about whether I lost any data, it is possible to get a recovery point objective near zero, depending on your applications. Really, it's about the applications, because persistent storage is great and it can recover, but if the application doesn't recover, it doesn't matter that I have the storage. These are just some of the ways you can recover. There is one issue, and it's not really a Red Hat issue, it's a community issue: if you have an application pod mounting an RWO volume and the node it's on fails, Kubernetes loses the status of that volume. It thinks the volume is still attached and will not release it, so you get a multi-attach error; that's the case listed there. The workaround right now requires a force delete of that pod to allow it to recreate on an active site. A little more information about how this would lay out: on the left-hand side we have the ODF data replicas and we have the monitors. At a minimum, each site would have, in this case, two monitors, and then you would have your fifth monitor at the third site; this is for the storage. Same thing for etcd: you could have a master at each site and then a third at the arbiter site, and the monitor pod can actually use a toleration to schedule on the master, even if it's unschedulable. The other thing we have here is the usual OpenShift infrastructure services, which would be placed in each data center. So if we lose a data center, we see that we still have two data replicas left; the way the storage is configured for this arbiter mode, every volume has four replicas, as we showed on the prior slide, but every volume only needs two replicas, plus monitor quorum, to continue to serve reads and writes. Assuming the application survives, you're going to be able to continue to take inbound connections with a very rapid recovery. Again, this depends on how your application recovers, but the storage will recover basically immediately. One way to make your application what I call zone aware is a relatively new capability; I think it came out in Kubernetes 1.19, and it's been available since OpenShift 4.6. It's called topology spread constraints, and it does require that you have replicas. If you only have one instance of your application, it wouldn't really matter, but as long as you have two replicas, you can use the topology.kubernetes.io/zone label to spread your application's replicas among the zones. That can be a hard constraint, because in some cases you don't want your replicas to fail over to the surviving zone; otherwise, the next time a zone fails, all your replicas could be in the same zone and you could lose them all at once.
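A sketch of such a topology spread constraint, applied to a generic two-replica Deployment; the app name and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule   # hard constraint: never put both replicas in one zone
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: registry.example.com/my-app:latest
```

With `DoNotSchedule`, a replica from a failed zone stays pending rather than piling into the surviving zone, which is the hard-constraint behavior described above.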
So this is a really powerful concept. In the demo, you'll see where I've applied it to Kafka, and Crunchy Postgres uses a very similar capability. The application we're going to take a look at, to challenge ourselves here, is called Smart City. Some of the colleagues in my group in OpenShift Data Foundation created this demo, and it has a lot of pieces to it, which makes it challenging to recover. What we have here, if it were fully deployed (the demo doesn't have multiple OpenShift clusters), is an edge environment that is collecting whatever data you're trying to collect, whether that's image data or other data. Then you run it through a model, in this case a license plate recognition model, put it onto the edge Kafka bus, move it through Kafka MirrorMaker down to the core Kafka bus, and from there various applications pull from it, and Kafka consumers take the messages and write them into the Crunchy Postgres database. And since I don't really mention it in the demo, I just want to say that I deployed this using the latest Crunchy Postgres version, version 5.0. It has a new custom resource, PostgresCluster, and, not to make more of a pitch than Andrew did, I found it extremely easy to create a replica and to give the replica placement zone anti-affinity; it was a really easy experience. So this is using the latest Crunchy Postgres version 5. The other thing, if you've used Crunchy Postgres before: you no longer have to install a specific PGO client to do the kinds of things Andrew went over; you can use kubectl or OpenShift CLI commands to do everything. Following the stages: we start by getting the data at the edge, move it through MirrorMaker, and maybe do some special things with the data before passing it on. There are some object buckets involved where data is stored and pulled from, and eventually we do some calculations. It's similar to tolling where I live in California: all of the California toll locations as you go in and out of the San Francisco area now do this. They recognize your license plate, I have an account, and they charge me for going past; you no longer stop at any toll booths. For this situation, we're going to challenge a few of our applications. Crunchy Postgres, like I said, has a primary and a replica, and we'll see in the demo how they're made to stay in their particular zones. We're going to fail one and see that the replica switches to primary. The other thing we're going to challenge is Kafka. Kafka is central to the solution: if Kafka is not able to recover, nothing else really matters, because it holds the messages and provides all of the other apps with what they need. The way you keep Kafka available is that ZooKeeper has to have quorum. So again, using a toleration, I placed one of the ZooKeeper pods onto the arbiter or consensus location, on the master, so that ZooKeeper quorum will be kept. That way, if we lose two of the Kafka replicas and two of the ZooKeeper replicas, we've still got quorum, and within the Kafka cluster config you can set attributes that allow Kafka to continue to operate with two replicas.
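A sketch of the relevant Kafka settings, assuming the AMQ Streams (Strimzi) `Kafka` custom resource; the cluster name, storage sizes, and the master toleration used for the arbiter placement are illustrative:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-core
spec:
  kafka:
    replicas: 4
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    config:
      default.replication.factor: 4
      min.insync.replicas: 2            # keep accepting writes with only two in-sync replicas
    storage:
      type: persistent-claim
      size: 10Gi
      class: ocs-storagecluster-ceph-rbd
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 5Gi
      class: ocs-storagecluster-ceph-rbd
    template:
      pod:
        tolerations:                              # allow a ZooKeeper pod to land on
          - key: node-role.kubernetes.io/master   # the arbiter master node for quorum
            operator: Exists
            effect: NoSchedule
```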
All right, I think we're doing good for time, so I'm just going to roll your video. There we go. The demo today is for an application called Smart City, and as you can see, a lot of different apps make it up. The main components we're going to look at for resiliency are the Kafka core cluster, which is in the bottom square, the Postgres database using Crunchy, and the storage from OpenShift Data Foundation, which is not shown here. On the top, we have what would be called edge locations. Those would actually be separate OpenShift clusters, but in this demo we're going to emulate that with what we call a safe node: everything on the top will be on a single node that doesn't fail. The bottom is where we're going to be failing OpenShift nodes to see what happens. If we go to the OpenShift cluster and the console, we can see that we have quite a few operators, starting with Red Hat AMQ Streams; that's going to give us Kafka, and again, we have two Kafka clusters, edge and core. We're also going to use Grafana to look at the data and see that the application is running. The local storage operator is used by OpenShift Container Storage to create the storage cluster using Ceph. The Open Data Hub operator is where we launched Grafana, as well as Superset, which is used for a dashboard that data scientists would use. If we want to look further at the components, we can start with the nodes. We have three masters and five worker nodes; as I said, the one on the bottom is the safe node, so it's not going to be failed. If we look at how they're divided, they're divided with a topology label, and this topology label is also what all of the components use to be zone aware. If we look at the arbiter, this node would be at a third site, so that it can act as an arbiter and reach consensus; etcd, ZooKeeper, as well as the storage monitors all need consensus, so this is our consensus or arbiter node. If we then look at how the data nodes are divided, we have a label for them called data center one; data center one has two worker nodes in it, and the Kafka core pods and the storage pods are here. The last one is data center two, and again, using the topology label, we define a second data center. So we basically have three zones: an arbiter for consensus, and two data centers that would reflect different sites within 100 to 200 miles of each other. We can also take a look at the pods, starting with the edge Kafka. Here we have three pods that form our edge cluster, and in this case, as I said, they're all on the same node, the safe node that is not going to be powered off. This emulates another OpenShift cluster that would be collecting the data, putting it onto the Kafka bus, and getting it over to the core via MirrorMaker. If we now look at the core, each one of these pods is using an OpenShift Data Foundation volume for its storage, and if we look over to the right, the nodes are all different: these are the four nodes that represent data center one and data center two, and there's a Kafka pod spread onto each one. The way we make sure that placement is not arbitrary is with topology spread.
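The stanza being shown looks roughly like this, assuming a Strimzi version whose pod template supports topology spread constraints; the `strimzi.io/cluster` label value is illustrative:

```yaml
spec:
  kafka:
    template:
      pod:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: topology.kubernetes.io/zone    # first: pods in both zones
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                strimzi.io/cluster: kafka-core
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname         # second: spread across hosts within a zone
            whenUnsatisfiable: ScheduleAnyway
            labelSelector:
              matchLabels:
                strimzi.io/cluster: kafka-core
```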
So this was added to the Kafka cluster; it's something you can add, and the operator will use it to schedule. We can see that we're using topology.kubernetes.io/zone at the top; we're using that to make sure there are pods in both zones. The second constraint uses kubernetes.io/hostname, and that is to spread: if there's more than one pod per zone, it will spread the pods among the hosts. And that's exactly what we saw: four Kafka pods on four different hosts. So we know now that Kafka is spread. Let's go ahead and look at the storage. For that, we're going to use a label, and we'll start with the actual storage devices. Again, similar to Kafka, there will be one per node, and this is again using that topology label. The way the storage works is that every volume is created with four replicas, but the minimum number of replicas a volume needs to continue to support reads and writes is two. So we can lose two of the four here, and the storage will still operate totally fine. The last thing I want to look at here is Postgres. We have two replicas here; this third pod is for backup. These two replicas are again using topology-aware placement, and we can see they're on two different nodes. If we inspect how the placement is being done, we can look into the YAML here: first, we're placing each pod on a node that's in either data center one or data center two, and second, we're using pod anti-affinity based on the topology label.
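The Postgres placement described here maps to something like the following in the v5 PostgresCluster instance spec; the zone values and the cluster label value are illustrative:

```yaml
spec:
  instances:
    - name: instance1
      replicas: 2
      affinity:
        nodeAffinity:                  # first: only land on data-center nodes
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values: ["datacenter1", "datacenter2"]
        podAntiAffinity:               # second: keep primary and replica in different zones
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: topology.kubernetes.io/zone
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: hippo
```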
So now let's see what happens when we have a failure. We've got a couple of terminal windows here, and I want to explain what you're looking at. The top terminal window shows a view of the nodes: we have three masters, the next two workers are in data center one by label, and the next two are in data center two. Master-0, the very first one, is the arbiter or consensus node; it would be placed at a third site, which can be at a higher latency or distance, maybe 500 miles away or more, while the two sites holding data center one and data center two need to be no more than a couple hundred miles apart. The last node is my safe node. Right under that is a view of the two Crunchy Postgres pods; because of the topology-aware placement and pod anti-affinity, they are each in a different data center or zone. The one in data center two is currently the master, and the one in data center one is a replica. Down at the left is the view of the storage; this is actually the ceph status output, and it's looking very good right now: all of our monitors are in quorum and all of our storage is up. On the right is kafkacat, which is showing us the messages on the topic called LPR, license plate recognition, and we can see that right now we're definitely continuing to get them. So what does that look like from a dashboard point of view? This is the Grafana dashboard. This is obviously a demo, and it's not really London right now, but if you notice, the car images will be changing; those are cars going by these locations. An image is collected, goes to the model at the edge, is handed over via the Kafka mirror to the core, and is stored in the database. This particular Grafana dashboard is pulling that data from the database.

What we want to do now is create a failure and see how long it takes to recover. To do that, I have vSphere over here, and we're in a cluster called perf one. What we need to do is take down data center two. Our primary Postgres replica is on data center two, so we want to take that down to show sort of the worst case of what could happen. It will also impact Kafka, and it will definitely impact the storage as well. So I'm powering down two nodes; I'm not going to power down the master at that location, just so we continue to have access to the CLI. Now on the bottom right, we can see that kafkacat has stopped. We're also starting to see some things happen in the storage: the health has gone into warning, and we're seeing on the top that we have at least one node that's not ready. Soon we'll have two; it takes about 60 seconds. There we go. We can see both nodes down, and then the opposite starts to happen: Kafka is starting to recover. In the middle, with the Crunchy Postgres replicas, we now see two labeled master, so it has already switched over and made the opposite zone the active one, the master. And we can see that Kafka has slowly recovered now. If we look over to the left, we still have quorum on our mons because of our consensus site. So we've lost half our storage, we've switched replicas on Postgres, and Kafka has totally recovered with two of four replicas. Now, if we go back here, it doesn't look like we're quite... oh, there we are. So we've now recovered, and we are continuing to receive images from the edge, with this data being pulled from the database. By the time we got back here, it had already recovered; the recovery is somewhere in the range of a couple of minutes. It's quite quick. Just to prove that everything really is down, we can look at the pods and see what their status is now. That will be rook-ceph-osd: we see that we have two of them pending, which means they're not able to schedule, because they need to stay on the zone that's currently down; the anti-affinity is a strict anti-affinity for zone. The monitors, again, are used to keep track of the cluster, and we have two pending and three running. Importantly, one of them is on our consensus node in our arbiter zone at a third location, so it is keeping quorum for the storage cluster. Let's go ahead now and look at Postgres, using the label here. We have one pod terminating and one running; the one terminating is on data center two, and the one running is on data center one. We saw in the terminal that the switchover was very, very quick, within seconds. And lastly, if we look at Kafka, we see that we have two running, and those two are on data center one; the two on data center two are in a terminating state. The two that are working are using OpenShift Data Foundation Ceph volumes, what we call RBD volumes, and that is keeping the core Kafka cluster going with its two replicas. So in summary, our smart city application is running: the images are changing, the counts are changing, and we were able to recover. Thank you.

Well, if you want to share back your screen, Annette: we've had a couple of questions in the chat, but mostly I think we've answered them there. Maybe just to reiterate, someone was asking about the Kasten K10 solution. Yeah, sorry, I was on mute. Yeah, I think I gave that to Dan.
I gave him a link in the chat for Kasten K10, which is definitely a vendor we have worked with. I was just going to say, these are some resources. Diane is going to make the slide set available, and there is a README for setting up the smart city demo. It's not currently configured to be highly available the way I did it, but certainly, if you're interested, reach out to me and I can give you the deltas for how to make it highly available. We also have a couple of guides on how to configure OpenShift Container Storage to use an arbiter and how to recover. And then, Andrew, did you want to talk about your resources?

Oh yeah, sure thing. The links there should bring you to the Postgres operator documentation. The second bullet point from the bottom should take you to the various documentation we have out there, and that final link is a repository with a suite of examples demonstrating how to get the PostgreSQL operator up and running across a wide variety of use cases. It's a great place to start for anyone looking to experiment with the operator.

Yeah, that's actually the section I used as well. Again, if you don't see something, and I know right away I didn't see how to make the replica zone aware, feel free to reach out to one of us. Okay, well, I think that's it, Diane.

There's one question that just came in, and it was probably the question I asked you two beforehand, because I love the smart city demo. Everybody's been through a toll booth, everybody's had their license plate scanned somewhere, and it's always interesting to see how it all works in the background. Dan is asking whether Red Hat Consulting, or maybe the data foundation group, has set any of this up as a service yet?

Not yet, but good question. This is one of the demos we're working on, and we're looking at taking these forward, working with the Open Data Hub group and such. So again, reach out; we're going to make these more available. Right now it's just a README, but we're looking to make them available in a more persistent way.

All right, well, I know a lot of work went into getting this demo up and running and making it HA, so I really appreciate the time you two took to do this and walk us through it. And as always, Crunchy, thanks for all the work you've done as one of the earliest users of the Operator Framework, getting Postgres operators out there and helping us debug the early days of the Operator Framework, so you have a place that's near and dear to my heart. We'll definitely have you back, and we'll be doing more of these OpenShift Data Foundation talks in the future, so look for that. We'll post this video up on YouTube with all of these wonderful links and resources for you to get started. Please do reach out to Annette and to the good folks over at Crunchy, and we look forward to hearing about your use cases for these applications. Thanks again everybody, and have a great week. All right, thank you. Take care. Thank you. Thanks, Annette.