Well, hello everybody, and welcome to yet another OpenShift Commons briefing today. I'm really pleased to have Annette Clewett and Andrew L'Ecuyer — Andrew is with Crunchy Data and Annette is a principal architect here at Red Hat — and they're going to talk about database disaster recovery and how to make it easy. This comes out of the Smart Cities initiative. I'm going to let them introduce themselves and their topic; they've got a great demo, so I'm really interested to see how it all plays out. If you have questions, ask them in the chat wherever you are — on YouTube, BlueJeans, or Twitch — and we'll relay them to the speakers and have some live Q&A at the end. So with that, Annette, take it away. I know you two have put a lot of effort into getting this one going, so thank you very much for being here today.

Yeah, thank you, Diane. First off, if you hear a grandfather clock in the background, that's me. I'm Annette Clewett, I'm with Red Hat in the Data Foundation business team, and I'll be going over a lot of things that support this environment, including the Smart City demo. Andrew?

Good morning. My name is Andrew L'Ecuyer, and I'm the director of operator engineering at Crunchy Data. In this role I'm responsible for the development and implementation of our Crunchy Postgres for Kubernetes product, also known as PGO, which I'm going to go over a little bit here this morning in the next few slides. Thanks, Annette.

Okay, I have to start out by giving a bit of an overview of Crunchy. Who is Crunchy Data? At the end of the day, the way I like to describe it, our business here at Crunchy Data is enterprise PostgreSQL.
Whether you currently deploy Postgres and are looking to expand your usage, or you're venturing into Postgres for the first time and looking to make it a big part of your enterprise, we are the partner that can make that happen, and we do that in a variety of ways. First and foremost is the expertise we bring to the table for everything and anything Postgres-related — as you can see on the slide, we have contributors here within Crunchy who contribute directly to the PostgreSQL source, in addition to a wealth of other knowledge and expertise across the organization. But beyond that knowledge, we also bring the tools and technologies needed to get Postgres deployed in your enterprise environments according to your specific requirements. That includes certified versions of PostgreSQL itself, plus the tooling to facilitate deploying PostgreSQL across the variety of environments where you might run it, including cloud environments, which we're going to touch on today. We have a few products along those lines — the slide mentions our Crunchy Bridge solution, which is a fully managed PostgreSQL cloud service — but today we're going to talk about Crunchy Postgres for Kubernetes, our Kubernetes-based solution for managing Postgres.

So what is Crunchy Postgres for Kubernetes? We like to describe it as declarative Postgres. It's an open source solution — we also call it PGO — that streamlines the deployment of production-ready PostgreSQL clusters by simply declaring what you want your clusters to look like. Within the solution we have a declarative API that lets you define, via a spec, exactly what you want your production-grade PostgreSQL clusters to look like. Next slide.

To dig into this a bit: what do we mean by a fully declarative solution? Anything you might need to deploy to ensure your database architectures are fully production-ready — ready to meet your needs and requirements — we give you a convenient, easy-to-use way to define via that spec. For instance, to touch on a couple of topics relevant to today's conversation: if you want high availability, or if you need disaster recovery, the solution makes it as simple as defining exactly what you need for those elements within the specification, and from there Crunchy Postgres for Kubernetes, or PGO, automatically configures and deploys those databases exactly as you defined them. Say you want a connection pooler, so your clients can connect to your database through something like PgBouncer — it's as simple as declaring that within the spec, and from there the operator will process the spec and deploy the components within your environment to make it happen. The same applies to other functional areas as well. Take disaster recovery: we're talking databases here — stateful applications — so it's critical that your database is safe and protected. You want to be able to easily create backups and restore from them in the event of a failure, and again, that's what this solution is designed to
make those things seamless and streamlined: you can easily define backup schedules and get the data redundancy you need. High availability is another big part of the architecture — facilitating the ability to easily spin up replicas, so that if your primary database goes down we can fail over and keep your database available. That's really what the solution provides at the end of the day: a convenient, easy-to-use way of defining your database architectures across all the pertinent functional areas — high availability, disaster recovery, monitoring, or whatever it might be — with the operator taking control from that point and making sure those architectures are deployed and configured according to your specific requirements. Next slide.

So a big part of this solution: it's one thing for us to take a spec you've declared and get the database out there and running, but that's only one part of the process. Once it's running, we need to make sure it continues to be available and that your data stays accessible, and that means ensuring your database clusters can heal as needed. This solution constantly monitors the databases it provisions to make sure they stay healthy and configured according to your needs. To give an example — and this is what's shown on the slide — say your cluster has a connection pooler, and for some reason someone deletes that connection pooler Deployment; chaos can exist in any cluster. The operator will immediately detect that, recreate the Deployment, and make sure your connection pooler is available and ready to use again right after it occurred. That's a theme within our solution in general: once you declare your cluster and the operator deploys a database architecture according to that specification, it constantly monitors it, keeps it healthy, and makes sure all the Kubernetes resources are in place so the database continues to serve your needs. Next slide.

In addition, that declarative approach to defining your PostgreSQL clusters also enables GitOps workflows, because at the end of the day you're defining your databases using simple YAML specifications that can easily be stored in any version control system. As we know, database requirements differ across environments — your databases for development versus QA versus production might all be a bit different — but with this declarative method you can store your database configurations in version control, where they can then be integrated into your continuous integration and continuous delivery pipelines to provision the databases you need at any stage of your software development lifecycle, and ensure the data your users and applications require is accessible. Next slide.

Another big piece of this solution is ensuring your clusters can be updated without interruption. Again, it's one thing to get a database deployed and running, but that's not going to be the end of the story: you're going to need to reconfigure it, you're going to need to tune it — maybe you deployed it without a connection pooler and now you decide you want
one. This solution is designed to make all of that seamless too. As you need to evolve your database architectures and deployments over the course of a cluster's lifetime, you can do so continuously, without interruption. Through a rolling-update strategy we've implemented within the operator, you can safely make changes to your clusters — whether PostgreSQL configuration changes or changes to the architecture as a whole — and we'll roll those changes out across all your instances in a way that avoids disruption and ensures your users can still access the data they need. Next slide.

So to sum these pieces up: what are we bringing together here at the end of the day with Crunchy Postgres for Kubernetes? This solution tackles the critical functional areas you need for a production-ready database system in Kubernetes. First and foremost is high availability. We've touched on this a bit, but you want your data available when and where you need it, and the Postgres operator makes that seamless: the ability to add additional replicas to your clusters, and the ability to fail over to a replica in the event of a failure, are implemented in a way that is basically transparent to the end user. You define the elements you need — "I want high availability, I want multiple replicas, I want redundancy" — and we take the process from there and wire it all up. We make sure those replicas are properly replicating from the primary database, and we make sure they're configured so we can fail over to them if there is a failure and your data needs to remain accessible. That's a big part of our solution, because we're talking data here: first and foremost we need your data available when and where you need it, and the high-availability parts of the architecture are what make that happen.

Disaster recovery is another important piece, and it's another area where the operator greatly facilitates a strong solution, because at the end of the day we need to protect against disasters. Chaos can occur, things can happen, and data is paramount — you want your data safe and available so that if anything goes wrong you can properly recover and get back to where you need to be. Crunchy Postgres for Kubernetes facilitates that: making it easy to schedule backups so your database is effectively backed up on a proper schedule; making sure there's redundancy in the backups themselves, so that if you lose an entire environment you don't lose your backups along with it; and, at the end of the day, making sure those backups are properly taken and can effectively be used to recover your cluster in the event of a disaster, which is what it's all about.

Another important part of the architecture is monitoring, because we want to detect problems before they manifest in a big way, and that takes keeping an eye on the cluster. One of the things Crunchy Postgres for Kubernetes provides is an effective monitoring solution.
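As a concrete sketch of how that monitoring hookup is declared: in the v5 PostgresCluster API, enabling the pgMonitor-based metrics exporter is just another stanza in the spec. The stanza below follows the PGO v5 field names, but the cluster name is illustrative and, depending on your installation, the exporter image may need to be set explicitly — treat this as an abbreviated example, not the full manifest:

```yaml
# Abbreviated sketch: only the monitoring stanza of a v5 PostgresCluster
# is shown here; a real spec also needs postgresVersion, instances, etc.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo          # illustrative cluster name
spec:
  monitoring:
    pgmonitor:
      exporter: {}     # adds the metrics-exporter sidecar to each instance
```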
It's a monitoring stack, so that once your database is deployed you can keep an eye on the health of your cluster and continue to tweak and evolve it according to your needs. That also means identifying problems, like I said, before they manifest, so you can get in front of issues with your database and ensure your data remains accessible.

Security is another big part of our architecture. Not only is it important that your data is accessible, it's important that access to that data is secure and properly locked down, and our operator builds in security from the ground up. Across all elements of the architecture, we make sure access to your data happens in a secure and controlled manner — whether that means enabling TLS by default, using certificate-based authentication, SCRAM passwords, or whatever it might be. Our solution is designed to be secure by default, and when we're looking at deploying production-grade clusters, that's a critical part of the architecture.

The last piece I want to mention is convenience, because at the end of the day we want to make it easy for you to manage your databases in these environments. Say you have a production database you want to clone into a dev environment so you can do a bit of testing or troubleshooting — that's seamlessly done with Crunchy Postgres for Kubernetes. Or if you want to customize PostgreSQL or your architecture in any way, again, we make that easy through this solution. Next slide.

So to sum up — we've touched on a few of these things already — Crunchy Postgres for Kubernetes is, at the end of the day, a fully declarative and GitOps-ready solution. You can easily define the database architectures you need in a fully declarative way, and we take on the management from there to create a seamless experience for deploying production-grade PostgreSQL clusters. That includes making it easy to get started, too. We want it so that as soon as you spin up a new database, it's not only ready to use but ready to be used — by applications, consumers, and end users. So we do things like automatically provisioning Secrets with the credentials you need, so you can wire those right into your applications without ever having to look up a username or a password or create Postgres accounts; the idea is to make it as seamless as possible for your end users and applications to get up and running. Easy upgrades are another big part of this. As Kubernetes and OpenShift continue to evolve, we're evolving along with them, and we've baked in the functionality to make that process as painless as possible for end users: you can easily upgrade your Crunchy Postgres for Kubernetes solution as Kubernetes and OpenShift evolve, and get all the new benefits that come with them, with as little effort as possible. And when we sum all these things together, what does it really give you? Production-grade, enterprise-ready PostgreSQL: the ability to deploy Postgres clusters that meet the specific needs and requirements of your enterprise, that are secure and locked down, and that keep your data available when you need it. That's really what it's all about at the end of the day. So, yeah, I think that pretty much sums it up.

All right, thanks Andrew. Thank you.
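To make the declarative model described above concrete, here is a minimal sketch of a v5 PostgresCluster spec pulling together several of the features mentioned — replicas for HA, scheduled pgBackRest backups, and a PgBouncer pooler. Field names follow the PGO v5 API, but the cluster name, Postgres version, storage sizes, and cron schedules are all illustrative assumptions:

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo                      # illustrative name
spec:
  postgresVersion: 13
  instances:
    - name: instance1
      replicas: 2                  # primary + replica for HA/failover
      dataVolumeClaimSpec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"          # weekly full backup (cron syntax)
            incremental: "0 1 * * 1-6" # daily incrementals
          volume:
            volumeClaimSpec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 1Gi
  proxy:
    pgBouncer:
      replicas: 1                  # connection pooler in front of the cluster
```

Applying a manifest like this with kubectl is the whole workflow; the operator reconciles everything else, which is also why deleting, say, the PgBouncer Deployment by hand simply causes it to be recreated.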
So I'll speak a little bit more about what Andrew was talking about in terms of deploying the latest version of Crunchy Postgres, but before we do that, let's talk about the storage. In the group I'm in at Red Hat, one of our main offerings is OpenShift Data Foundation — it used to be called OpenShift Container Storage — and I want to point out that it is very much built from upstream open source projects. One of the main projects, which does the orchestration and which you may have heard about, just graduated from CNCF: Rook. Rook orchestrates the entire deployment of the other very mature upstream project, Ceph, and the two of them together can be deployed in a Kubernetes or OpenShift environment and are, as Andrew was saying, managed entirely within that environment. We'll take a look a little later in the demo at some of the components, but being very close to OpenShift Data Foundation as a product, I can tell you that everything we do downstream is basically created and tested upstream before we pull it downstream — a good use of open source. Then there's the Operator Framework, which has become really prevalent and makes deployment really easy — in the case of what I'm going to show you in the demo, deploying Kafka, deploying Crunchy Postgres, deploying OpenShift Data Foundation — all of this is operator-based, and it has matured. Operators have been discussed for the last three or four years, but now I'd say most applications that are serious about being on Kubernetes have an operator-based deployment. And just to speak about it for a minute: not only does an operator deploy, an operator also reconciles back to the state that you want, which is a very powerful concept for managing your applications.

So again, just to look at the components: we already spoke about Rook and Ceph. The other one included with OpenShift Data Foundation (formerly OpenShift Container Storage) is an object gateway, which has some nice features for bridging two different environments — say Azure and AWS — and mirroring objects between two different clouds. All three of those go together to form the storage solution that Postgres is going to use, and, as you'll see in the demo, we also have Kafka. The way we use the Operator Framework, for both administrators and users, is that there is more than one operator. We have what we call an OCS meta-operator, and that operator bootstraps the other two: the Rook-Ceph operator, which is the upstream effort I spoke about, and NooBaa core, which is also upstream. The OCS operator again does the reconciling and the management, and it's always watching: if the OCS operator is not running and ready, then something has gone astray, and you need to look at the other operators to figure out what has not deployed correctly. So it's multi-level management, but it works very nicely for getting all of the orchestration done, for maintenance, and also — as Andrew spoke about — for upgrades, which is one of the benefits of using an operator.

Getting to the topic of this particular session: if you look at the disaster recovery continuum, the way we see it at Red Hat is that, starting at the top right, backup and restore — whether inside or outside of OpenShift and Kubernetes — has been a solution for quite a long time, and the questions are always: how much do you back up, how often do you back up, and then, from a restore point of view, how long does it take to restore, and can you restore back to a known good state?
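As a concrete illustration of that backup-and-restore tier: with CSI, a crash-consistent snapshot of a persistent volume, and a clone restored from it, are both plain YAML requests. This sketch uses the standard Kubernetes CSI snapshot API; the PVC names, sizes, and the ODF-style class names are assumptions, and older clusters may need the `v1beta1` snapshot API instead of `v1`:

```yaml
# Sketch: a CSI VolumeSnapshot of an RBD-backed PVC...
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-data-snap
spec:
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: postgres-data    # the PVC to snapshot
---
# ...and a new PVC cloned from that snapshot for restore.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-restore
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  dataSource:
    name: postgres-data-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
```

The same snapshot objects are what the traditional backup products drive through the CSI interface.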
In terms of Kubernetes, what OpenShift Data Foundation has done is integrate closely with traditional backup vendors — say Spectrum Protect Plus, NetBackup, Trilio, or Kasten — and we've done that via the CSI interface, which allows you to do snapshots and clones. That is quite mature: we released the capability, supporting the CSI standard, about a year ago, and we can provide more information about it. As I go down the continuum, we have regional DR — essentially a multi-cluster solution — and then metro DR. For a little more context: as I said, backup and restore is done using OpenShift Data Foundation, the storage solution, with snapshots and clones. If I create a volume, mount it to an application, and then want a crash-consistent snapshot, I can use the snapshot capability — either directly through the UI or via YAML — to back it up, or I can hook it into the traditional backup vendors, which now pretty much all have CSI capability both to initiate the snapshots and to use those snapshots to create clones for restore. At Red Hat we have a whole list of ISVs we've done specific testing with and created solution guides and videos for, showing how you do that.

Regional DR is a solution still in development — we have some of the pieces now at Red Hat — but it's really the idea of multi-cluster. The requirement is two Kubernetes or OpenShift clusters, and we asynchronously replicate the data for the persistent volumes between the two sites. There's a lot of stitching and orchestration that goes into that, because it isn't just data replication at the persistent volume level: you also need replication at the Kubernetes object, or resource, level. One way you could do that with upstream pieces is the Velero backup and restore APIs, but this is a solution we see being available probably by the end of the year, delivered through Advanced Cluster Management, which will do the orchestration for creating the multi-site setup and then orchestrate both the data replication and the replication of Kubernetes resources from one cluster to the other.

Getting to the lower left: metro DR is, from a storage perspective, a synchronous solution — that's why we call it "metro." It has to be within a certain latency or distance; I usually say no more than a couple hundred miles between the sites. The other main thing here is that we have an arbiter, or consensus, location. The storage of OpenShift Data Foundation, backed by Ceph, needs monitors, which keep track of the cluster, and they need consensus. We also know that etcd, which is used for the control plane of OpenShift, needs consensus, and something like ZooKeeper, which is currently used with Kafka, needs consensus. So you need a site where you can place a consensus node and keep quorum even in the case of what I'm calling a data-site failure. This is a solution available now with the latest versions of OpenShift Data Foundation, and it is a pattern you do find: sites a couple hundred miles apart, with the consensus location perhaps twice as far away, maybe 500 miles. It is something you can do today to get resiliency. And if we look at the recovery time objective, and the RPO — the recovery point objective, which is more about "did I lose any data" — it's possible to get a recovery point objective of near zero, depending on your applications. Really, it's about the applications: persistent storage is great and it can recover, but if the application doesn't recover, it doesn't matter that I have the storage. So these are just some of the
ways you can recover. There is one known issue — not really a Red Hat issue, it's a community issue — which is that if you have an application pod mounting an RWO volume and you fail the node, the kubelet loses the status of that volume: it thinks it's still attached and will not release it, so you get a multi-attach error. The workaround right now requires force-deleting that pod to allow it to recreate on an active site.

A little more about how this lays out: on the left-hand side we have the ODF data replicas and the monitors. At a minimum, each data site would have — in this case — two monitors, and then at the third site you'd have your fifth monitor; that's for the storage. Same thing for etcd: you can have a master at each site plus one at the third site, and the monitor pod can actually use a toleration to schedule on the master even when it's unschedulable. The other thing we have going here are the usual OpenShift infrastructure services, which would be placed in each data center. If we lose a data center, we still have two data replicas left — the way the storage is configured for this arbiter mode, every volume has four replicas, as we showed on the prior slide, but a volume only needs two replicas plus monitor quorum to continue serving reads and writes — so, assuming the application survives, you can continue to take inbound connections with very rapid recovery. Again, this depends on how your application recovers, but the storage will recover almost immediately.

One way to make your application what I call zone-aware is a relatively new capability — I think it came out in Kubernetes 1.19, and it's been available since OpenShift 4.6 — called topology spread constraints. It does require that you have replicas: if you only had one instance of your application it wouldn't really matter, but as long as you have two replicas, you can use the topology.kubernetes.io/zone label to spread your application's replicas across the zones. That can be a hard affinity, because in some cases you don't want your replicas to fail over to the surviving active zone — otherwise, the next time you lose a zone, all your replicas would be in the same zone. It's a really powerful concept; in the demo you'll see where I've applied it to Kafka, and Crunchy Postgres uses a very similar capability.

The application we're going to look at, to challenge ourselves here, is called Smart City (or Green City). Some of the colleagues in my group created this demo, and it has a lot of pieces, which makes it challenging to recover. What we have here — the demo doesn't actually deploy multiple OpenShift clusters — is an edge environment collecting whatever data you're trying to collect, whether image data or other data. You run it through a model, in this case a license plate recognition model, put the results onto the edge Kafka bus, go through Kafka MirrorMaker down to the core Kafka bus, and then various applications pull from that, with Kafka consumers taking the messages and writing them into the Crunchy Postgres database. Since I don't really mention it in the demo, I just want to say that I deployed this using the latest Crunchy Postgres version, 5.0. It has a new custom resource, PostgresCluster, and I found it — I'm not making a pitch any more than Andrew did — but I did find it
extremely easy to create a replica, as well as to make the replica placement have a zone anti-affinity — a really pretty easy experience. So this is using the latest Crunchy Postgres, version 5.0. The other thing, if you've used Crunchy Postgres before: you no longer have to install a specific pgo client to do the kinds of things Andrew went over; you can use kubectl or the OpenShift CLI (oc) to do everything.

Following the stages: we start by getting the data at the edge, move it through MirrorMaker — maybe doing some special things with the data before passing it on — there are some object buckets involved where data is stored and pulled from, and eventually we do some calculations. It's similar to — I live in California, and all of the toll locations as you go in and out of the San Francisco area now do this: they recognize your license plate, I have an account, and they charge me as I go past; you no longer stop at any toll booth.

For this situation, we're going to challenge a few of our applications. Crunchy Postgres, like I said, has a primary and a replica, and we'll see in the demo how they're made to stay in their particular zones; we're going to fail one and see that the replica switches to primary. The other thing we're going to do is Kafka. Kafka is part of the solution, and if Kafka is not able to recover, nothing else would really matter, because it holds the messages and provides all of the other apps with what they need. The way you make Kafka resilient currently: ZooKeeper has to have quorum, so again, using a toleration, I placed one of the ZooKeeper pods onto the arbiter, or consensus, location on the master, so that ZooKeeper quorum will be kept even if we lose Kafka replicas and ZooKeeper replicas along with a data site. And within the Kafka cluster config you can set an attribute that allows Kafka to continue operating with two replicas.

All right, I think it's demo time. The demo today is for the application called Smart City, and as you can see, it has a lot of different apps that make it up. The main components we're going to look at for resiliency are the Kafka core cluster, which is in the bottom square; the Postgres database, using Crunchy; and the storage — not shown here — from OpenShift Data Foundation. On the top we have what would be called edge locations. Those would actually be separate OpenShift clusters, but in this demo we emulate that with what we call a "safe" node: everything on the top runs on a single node that doesn't fail, while on the bottom is where we'll be failing OpenShift nodes to see what happens.

If we go to the OpenShift cluster and the console, we can see quite a few operators, starting with Red Hat AMQ Streams, which gives us Kafka — and again we have two Kafka clusters, edge and core. We're also going to use Grafana to look at the data and see that the application is running. Local Storage is used by OpenShift Container Storage to create the storage cluster using Ceph, and the Open Data Hub operator is where we launch Grafana as well as Superset, which provides a dashboard that data scientists would use. If we want to look further at the components, we can start with the nodes: we have three masters and five worker nodes, and as I said, the one at the bottom is the safe node, so it's not going to be failed. If we look at how they're divided, they're divided with a topology label, and this topology label is also used with all of the components to be able to
be zone aware and if we look at the the arbiter this would be this node would be at a location a third site so that it could actually act as an arbiter and reach consensus for both at cd we'll see zookeeper as well as the the storage the monitors need consensus so this is our consensus or arbiter node if we then look at how the data nodes are divided we have a label for them called data center one data center one has two worker nodes in it the kafka core pods are here the storage pods are here and then the last one would be the data center two and again it's going to be the using the topology label we define a second data center so we basically have three zones an arbiter for consensus and then two data centers that would reflect different sites that were within 100 to 200 miles apart we also can take a look at the pods and for the pods let's start with the edge kafka so here we have three that are going to be our edge cluster and in this case as i said they're all on the same node this is a safe node that is not going to be powered off and this is you know emulating another open shift cluster that would be collecting essentially the the data putting it on to the kafka bus and then getting it via mirror maker over to the core if we now look at the core so these are the core each one of these is using a open shift data foundation volume for its storage and if we were to look over here to the right again the nodes are all different so these are the four nodes that represent data center one and data center two and there's a kafka pod spread on each one the way that that is done to make sure that it's just not you know arbitrary how they're done it is done with topology spread so this was added to the kafka cluster this is something you can add um and the operator will use that to to schedule so we can see that we're using the topology kubernetes io zone and the top we're using it to make sure that that there are pods on both zones and then the second where it uses the 
kubernetes io hostname that is to spread if there's more than one pod per zone then it will spread the pods among the host so that's exactly what we saw we saw four kafka pods on four different hosts so we know now that the kafka is spread there let's go ahead and look at the storage so for that we're going to use label and and we'll start with the actual storage devices so there's going to again similar to kafka there'll be one per uh node and this is again using that topology label and the way that the storage works is it is um configured to for every volume to every volume is created with four replicas but the minimum size that a volume can continue to um support reads and writes is two so we can lose two of of the four here and the storage will still operate totally fine the last thing i want to look at here is postgres so we have two replicas here this third one is for backup but these two replicas are again using topology spread and we can see here they're on two different nodes if we inspect the how the the placement is being done we can look into the animal here here's uh how the topology spread is done um first we're we're placing it on a node that's in either data center one or data center two and then second we're using the um the pod anti affinity based on the topology label so now let's see what happens when we have a failure so we've got a couple of terminal windows here and i want to explain what what you're looking at um top terminal window is just showing a view of the nodes right now um we have three masters the next two workers are in data center one by label and the next two are in data center two again um and master zero the very first one is the arbiter or consensus node and it would be placed at a third site um it can be at a higher um what we'd say latency or distance so it could be maybe you know 500 miles away maybe more and then the two um sites that have the the data center one data center two they need to be you know not more than maybe 
a couple hundred miles apart and then the last node is my safe node uh right under that is a view of the two um crunchy post crust pods and they are uh because of topology spread our pod anti affinity they are in each a different uh data center or zone and one currently is master the one that is in data center two and the one in data center one is a replica down at the left is a view of the storage um this is actually what we call stuff status uh it's looking very good right now all of our monitors are in quorum and all of our storage is up on the right is a Kafka cat which is um showing us the message on the topic called lpr license plate retrieval and then we can see that right now um we're we're definitely continuing to um get that so what what does that look like from a dashboard point of view this is what the Grafana dashboard is looking like and this is obviously a demo and it's not really London right now but um if you notice uh the car images will be changing that would be you know cars going by these locations an image is collected and then that goes to the model um through the edge handed over via the Kafka mirror and to the core and then um stored in the database and then this um this particular Grafana dashboard is pulling this data from the database so what we want to do now is we want to create a failure and see how long it takes to recover so to do that um I have over here a vSphere and we're in a cluster called perf one and what we need to do is take down data center two our primary postgres replica is on data center two so we want to go ahead and take that down to show you know sort of the worst case of what could happen it'll also impact Kafka and it will definitely impact the storage as well so I'm powering down two nodes not going to power down the master at that location just so that we continue to have access to CLI so now on the bottom right we can see that Kafka cat has stopped um we're also starting to see some things happen in the storage 
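To make the placement rules described above more concrete, here is a rough YAML sketch of zone-aware scheduling: topology spread constraints on the Kafka broker pods, and node affinity plus strict zone anti-affinity for the Postgres replicas. This is an illustration, not the demo's actual manifests: the cluster names (`core-kafka`, `smartcity`), the zone values, and the exact field paths are assumptions and will vary by Strimzi/AMQ Streams release and Crunchy operator version.

```yaml
# Hypothetical sketch: zone-aware spread for the core Kafka cluster,
# expressed through the Strimzi/AMQ Streams pod template.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: core-kafka
spec:
  kafka:
    replicas: 4
    template:
      pod:
        topologySpreadConstraints:
        # First: keep broker pods balanced across the two data-center zones.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              strimzi.io/cluster: core-kafka
        # Second: within a zone, spread brokers across distinct hosts.
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              strimzi.io/cluster: core-kafka
---
# Hypothetical sketch: the equivalent pod-level rules for the Postgres
# replicas, pinned to the two data-center zones, with the two replicas
# forced into different zones by strict anti-affinity.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["data-center-1", "data-center-2"]
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: topology.kubernetes.io/zone
      labelSelector:
        matchLabels:
          postgres-operator.crunchydata.com/cluster: smartcity
```

How the affinity stanza is surfaced depends on the Crunchy operator version (for example, under `spec.instances[].affinity` in newer releases), so treat the paths above as a template rather than a drop-in manifest.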
Here the storage has gone into HEALTH_WARN, and we're seeing at the top that we have at least one node that's not ready; soon we'll have two. It takes about 60 seconds... there we go, we can see both are down. And the opposite is happening: Kafka is starting to recover. In the middle there, with the Crunchy Postgres replicas, we now see two masters, so it has already switched over and made the opposite zone the active master. And we can see that Kafka has totally recovered. If we look over to the left, we still have quorum on our mons because of our consensus site. So we've lost half our storage, we've switched replicas on Postgres, and Kafka has totally recovered, now with two of four replicas. Now if we go back to the dashboard... it doesn't look like we're quite up... there we are. We've now recovered, we are continuing to receive images from the edge, and this data is being pulled from the database. By the time we got back here it had already recovered; the recovery is somewhere in the range of a couple of minutes. It's quite quick.

Just to prove, though, that everything really is down, we can look at the pods and see what their status is. That'll be rook-ceph-osd, and we see that two of them are in Pending, which means they're not able to schedule, because they need to stay in the zone that's currently down; the anti-affinity is a strict anti-affinity for zone. The monitors, again, are used to keep track of the cluster, and again we have two that are Pending and three that are Running. Importantly, one of them is on our consensus node, the arbiter zone at the third location, so it is keeping quorum for the storage cluster. Let's go ahead now and look at Postgres. We'll look at the label here, and we have one pod in Terminating and one Running; the one in Terminating is on data center two and the one Running is on data center one. As we saw in the terminal, the switchover was very, very quick, within seconds. Lastly, if we look at Kafka, we see two Running, and those two are on data center one; the two on data center two are in a Terminating state. The two that are working are using OpenShift Data Foundation Ceph volumes, what we call RBD volumes, and that is keeping the core Kafka cluster going with two replicas. So in summary, our Smart City application is running: the images are changing, the counts are changing, and we were able to recover.

Thank you. I was just going to say, these are some resources. Diane is going to make the slide set available, and there is a README for setting up the Smart City demo. It's not currently configured to be highly available as I did it here, but if you're interested, reach out to me and I can give you the deltas for how to make it highly available. We also have a couple of guides on how to configure OpenShift Container Storage using an arbiter, and how to recover. And then, Andrew, did you want to talk about your resources?

Yeah, sure thing. The links there should bring you to the Postgres Operator documentation. The bullet point second from the bottom should take you to our various documentation, and that final link is a repository where we have a suite of examples that demonstrate different use cases of how to get the PostgreSQL operator up and running, so it's a great place to start for anyone looking to experiment with the operator.

Yeah, that's actually the section I used as well. Again, if you don't see something, and I know right away I didn't see how to make the replica zone aware, feel free to reach out to one of us if you don't see something in those examples. Okay,
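As a companion to those arbiter guides, here is a hedged sketch of what a stretch storage cluster with an arbiter zone can look like. The names, namespace, sizes, and zone value are illustrative; the field paths follow the OpenShift Container Storage/ODF documentation for arbiter mode and may differ by release, so consult the linked guides for the authoritative form.

```yaml
# Hypothetical sketch of a stretch StorageCluster with an arbiter zone:
# data is kept as 4 replicas (2 per data-center zone), and a monitor in
# the arbiter zone preserves quorum when one data center is lost.
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  arbiter:
    enable: true
  nodeTopologies:
    arbiterLocation: arbiter        # zone label value of the third site
  storageDeviceSets:
  - name: ocs-deviceset
    count: 1
    replica: 4                      # stretch mode keeps 4 data replicas
    dataPVCTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 512Gi
        storageClassName: localblock
        volumeMode: Block
```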
Well, I think that's it, Diane. There's one question that just came in, and it was probably the question I asked you guys beforehand, because I love the Smart City demo: everybody's been through a toll booth, everybody's had their license plate scanned somewhere, and it's always interesting to see how it all works in the background. Dan is asking whether Red Hat Consulting, or maybe the Data Foundation group, has set any of this up as a service yet. Not yet, but good question. This is one of the demos that we're working on, and we're looking at taking these forward, working with the Open Data Hub group and such. So yeah, again, reach out; right now it's just a README, but we're looking to make these available in a more persistent way. All right, well, I know a lot of work went into getting this up and running, making this demo work, and making it HA, so I really appreciate the time you guys took to do this and walk us through it. And as always, Crunchy, thanks for all the work that you've done: you were one of the earliest users of the Operator Framework, getting the Postgres operators going and helping us debug the early days of the Operator Framework, so you have a place that's near and dear to my heart. We'll definitely have you back, and we'll be doing more of these OpenShift Data Foundation talks in the upcoming weeks, so look for that. We'll post this video up on YouTube with all of these wonderful links and resources for you to get started, so please do reach out to Annette and to the good folks over at Crunchy, and we look forward to hearing about your use cases for these applications. Thanks again, everybody, and have a great week. All right, thank you, take care. Thank you. Thanks, Annette.