My name is Babak Mozaffari. I'm a Distinguished Software Engineer at Red Hat, and I lead a team within Ecosystem Engineering at Red Hat. Generally speaking, the scope of what we do in Ecosystem Engineering is working with partners, and my team is specifically responsible for database partners. As a result of that, the RHODA work that Mike talked about in the keynote, the engineering aspects of it, falls within my team, and we have a team that's been working on that. We've also had a couple of other efforts, and I'm going to share a couple of slides on those.

So outside of RHODA, we've done some other work, and our focus has been the point of view of the application developer who needs a database: somebody who's deploying a workload on Kubernetes, on OpenShift, and wants to use a database. RHODA addresses one angle, which is that you're using a database but you're choosing a hosted, managed database, a cloud database. That's one story. The other story is: what if you actually want to run that database on OpenShift? And there, there are quite a few scenarios.

One scenario is that, from a somewhat naive Kubernetes perspective, you come in and say: well, this is Kubernetes, right? I'm just going to have one pod running, and if the node it's on goes down for any reason at all, the pod will just come back up on a different machine. I only need one pod and nothing more than that. One of the challenges we have is a longstanding Kubernetes limitation that prevents that from happening, because of the ReadWriteOnce access mode that is typically, though not always, used for storage. That causes some problems, and we've looked at what those problems are; a solution is available with Red Hat to solve them. So that's high availability with a single node.

The solution to that problem, really, is node remediation. What happens is, if you have one pod and that pod goes down, Kubernetes can't just bring that pod up on a different machine, because of the access mode being ReadWriteOnce: it has to verify first that the pod actually went down, since the "Once" in ReadWriteOnce means only one machine can mount the storage claim for that pod at a time (there's a short sketch of such a claim below). There are a couple of solutions there. There's a Machine Health Check operator and a Poison Pill operator within OpenShift that work together to identify this scenario and do the recovery; in the case of the Poison Pill operator, by rebooting the machine and taking it out of commission, thereby letting Kubernetes know that it's okay to redeploy the pod. And there's the Medik8s project, which extends that support, as a community effort, to scenarios that aren't otherwise supported, such as outside of IPI OpenShift.

But in reality, most people who are running production databases are not going to go with a single-pod deployment. They're typically going to use database replication, depending on the database they're using. If somebody is using MongoDB, they're going to have a replica set. If they're using, let's say, CockroachDB or Crunchy or anything else, they're going to use, for example, the Postgres replication that's available. What that means is their starting scenario is typically going to be multiple pods, usually running on different OpenShift nodes.
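To make that access mode concrete, here's a minimal sketch of a ReadWriteOnce claim created with the official Kubernetes Python client. The claim name, namespace, and size are hypothetical, and depending on the client version, the resources type may be named V1ResourceRequirements or V1VolumeResourceRequirements.

```python
# Sketch: a PersistentVolumeClaim with the ReadWriteOnce access mode
# discussed above. Names and sizes are made up for illustration.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="db-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        # RWO: the volume can be mounted read-write by a single node only,
        # which is what blocks rescheduling until the old node is known
        # to be down (fenced/remediated).
        access_modes=["ReadWriteOnce"],
        resources=client.V1ResourceRequirements(
            requests={"storage": "10Gi"}
        ),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```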
Those pods are going to talk to each other to replicate the data among themselves. But even in those scenarios, what you're going to see most of the time is that only a single one of those pods is a primary that's writable at any given time, and if that fails, one of the other pods takes over. And that's great, seemingly problem solved, except: how many failures can you sustain? If you have a typical cluster of three nodes and one of them fails, that's fine. If two of them fail, you're in trouble, because all of a sudden you don't have quorum; the cluster doesn't really know which nodes have failed and which haven't. To keep data consistent and avoid data corruption, you need to maintain a quorum, with more than half of your nodes up. So if you have three nodes, you need two of them to be up. So while this scenario gives you HA without downtime, because you have multiple pods running and everything is fine if one of them fails, you need that failed pod to recover at some point so that you can sustain another failure. And again, that's what doesn't happen out of the box, and that's why you'd look to some of the solutions we talked about, like the Machine Health Check and Poison Pill operators.

So this is some of the work we've been doing around high availability, and we've done some validation of it. We've tested it with CockroachDB, which is a cloud-native database, and with MongoDB replica sets on OpenShift. In both of these cases, we reproduced the recovery issue and validated the fix with the solutions I talked about.

Another thing we've looked at is disaster recovery. Again, this is all work in progress, but we've been looking at disaster recovery with databases, and specifically one of the scenarios we've been working on is CockroachDB, which has the ability to set up a cross-region mesh of CockroachDB nodes. Part of the work we've done is simplifying the installation and configuration using ACM and Submariner, so you can deploy a multicluster CockroachDB deployment across OpenShift clusters on AWS, on GCP, or on-prem, or a combination of all of these. When you do this, in terms of disaster recovery, what you get is the ideal active-active scenario, where you don't have to rely on backup and restore if anything goes down: the replication behind the scenes has already happened, without any data loss.

Another thing we've been doing some work on is horizontal scaling. A lot of what Kubernetes gives you is bringing the advantages of the cloud environment either to your own data center or, if you're using it in a public cloud, giving you a lot of things out of the box, like clustering capability and so on. But sometimes, if you compare it to, let's say, a simple VM-based deployment, you have additional infrastructure and additional latency, and you want to make up for that somehow. One of the ways you make up for it is with horizontal scaling. One of the scenarios we've been looking at here is MongoDB. MongoDB has replica sets, which give you multiple pods; that means if you want to read data, you have multiple pods to go to, and you can get additional read throughput from those multiple pods.
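As a rough illustration of that read scaling, here's a minimal pymongo sketch against a replica set; the pod hostnames, replica set name, database, and collection are all hypothetical.

```python
# Sketch: spreading reads across replica set members with pymongo.
from pymongo import MongoClient

# Hypothetical pod hostnames for a three-member replica set named rs0.
client = MongoClient(
    "mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo/"
    "?replicaSet=rs0&readPreference=secondaryPreferred"
)

# Writes always go to the single writable primary.
client.appdb.orders.insert_one({"sku": "abc-123", "qty": 2})

# Reads, with secondaryPreferred, are served by secondaries when
# available, which is where the extra read throughput comes from.
# (Secondaries can lag the primary slightly, so a read may not see
# the very latest write.)
print(client.appdb.orders.count_documents({}))
```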
But again, remember I mentioned this earlier: there's only one primary node that you can write to. Because of that, if you have a write-heavy use case, where you're writing all the time rather than writing 20% of the time and reading 80% of the time, replica sets alone don't give you more write throughput. For a case like that, MongoDB has a sharding capability, which lets you divide your data among multiple shards based on a deterministic algorithm. Then you're always going to one shard or another, depending on the data you're dealing with, and based on that you can effectively get multiple write nodes. That's another validation and configuration exercise we've done, to see what kind of throughput benefit you can get: the benefit typically shows up once you have three shards up, and after that you're getting horizontal scaling and can add shards to get higher throughput.
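For a sense of what enabling that looks like, here's a minimal sketch using MongoDB's admin commands from pymongo, driven through a mongos query router; the router address, database, collection, and shard key are assumptions for illustration.

```python
# Sketch: sharding a collection on a hashed key so writes are spread
# deterministically across shards, each with its own primary.
from pymongo import MongoClient

# Connect to a mongos query router, not to a shard directly.
client = MongoClient("mongodb://mongos.example.svc:27017")

# Make the database eligible for sharding...
client.admin.command("enableSharding", "appdb")

# ...then shard the collection. A hashed key distributes documents
# evenly, which is what yields multiple writable primaries.
client.admin.command(
    "shardCollection", "appdb.orders", key={"customer_id": "hashed"}
)
```

With a hashed key like this, each write is routed to exactly one shard based on the key's hash, which is the deterministic division of data described above.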