Okay, so as a PostgreSQL user, have you ever wondered how to make the world the single point of failure of your infrastructure? Whether it's private, public, hybrid, or multi-cloud, you're probably wondering which one to use, and then how you can mitigate the risk of vendor lock-in. So hi everyone, I'm Gabriele Bartolini, and today I'll try to help you answer these questions by sharing with you the unparalleled range of possibilities and freedom that the open source trio of Kubernetes, Postgres, and CloudNativePG offers. Before we dive in, allow me to introduce myself. I'm a seasoned open source programmer and former entrepreneur, and I'm deeply passionate about databases and data warehousing. My journey with Postgres started in the early 2000s, and now I'm Vice President of Cloud Native at EDB, one of the major contributors to the open source Postgres project. I'm also a Data on Kubernetes ambassador. So I'm really happy to be here today, and I'm proud to advocate for the seamless integration of stateful workloads in Kubernetes. My mission is to spread the message that running Postgres in Kubernetes is not only efficient, but often superior to traditional VM or bare metal setups. And then I'm all about lean and DevOps: I've been practicing these two disciplines for many, many years, and they're actually the reason why I'm into Kubernetes. Who knows, they might be the reason to move away from Kubernetes one day. I'm also one of the people behind two open source projects in the Postgres ecosystem: one is Barman, which is a popular open source project, and the other is CloudNativePG, the topic for today. If you want to talk with me, I'm available until Friday evening. We've got a booth, so come and join me and the other developers of CloudNativePG if you have questions. I'm also speaking about vertical scalability of Postgres databases in Kubernetes with Gary Singh from Google Cloud on Thursday afternoon. 
So today's agenda includes exploring the potential of CloudNativePG and Kubernetes when managing Postgres high availability and disaster recovery. Then we'll delve into recommended architectures and strategies, followed by the conclusions of this presentation. Before we begin, I don't want to waste too much time, and I think everyone knows what Postgres is. Postgres was also voted database of the year very recently; despite its age, it keeps improving and keeps reinventing itself. pgvector, for example, and other extensions around vector databases are one of the areas that are growing more and more in Postgres these days. If you want to know more, let me suggest two blog articles. The first is about the microservice database pattern; if you're interested, it also explains why, and the whole journey that led us here with this project. The other is about the recommended architectures for Postgres in Kubernetes; it's a blog post I wrote for the CNCF website. So let's start with Postgres and high availability. CloudNativePG is basically a level five Kubernetes operator that seamlessly manages Postgres clusters, primarily for high availability, throughout the entire operational life cycle from day zero to day two. It's production ready and it's widely embraced by top tier database-as-a-service solutions like BigAnimal from EDB, IBM Cloud Paks, Google Cloud, and Tembo. As an open source project, it's available under the Apache license, and it originated in 2019 with my team when I was part of 2ndQuadrant, which was later acquired by EDB. CloudNativePG made a significant leap in May 2022, before KubeCon Valencia, when EDB contributed the project to an open source, vendor-neutral, openly governed community: the CloudNativePG community, which you are more than welcome to join. And CloudNativePG was acknowledged as the most popular operator for Postgres in 2023, according to a Timescale survey. 
And it's rapidly gaining traction, with over 3,000 stars on GitHub in less than two years. Anyway, given our time constraints, I'll refrain from covering basic instructions and commands today; for deeper insights, please find the documentation and all the information available on the website. And again, if you want, stop me. I think it's more important that we use these twenty-some minutes to talk, to trigger questions, and to talk about possibilities, okay? So we will talk about building blocks that will give you unprecedented, in my opinion, possibilities around Postgres. I will briefly mention these four pillars; they were actually defined by Patrick McFadin and Jeff Carpenter in their book about cloud native databases. First, CloudNativePG leverages the Kubernetes API: we essentially extend the Kubernetes controller, teaching it how to manage a Postgres cluster through the operator pattern, okay? The second one is declarative configuration: through it, we can deploy, scale, and maintain databases that self-heal, and also implement infrastructure-as-code practices. The third one is about observability: we have a native Prometheus exporter, we log to standard output in JSON natively so you can pretty much integrate it with everything, we have a Grafana dashboard, and so on. Finally, the security-by-default paradigm. Again, read the article about the microservice database, in which I explain this security-by-default concept where the database is owned by the application developer, not the administrators. So developers own the database and they can put it in their pipelines. With security by default, we start from the actual code writing, then how we build the containers, how we scan the images, and then also in the container itself, with security measures at both the Postgres and the Kubernetes level. For example, we have mTLS by default, and we actually advocate for certificate-based authentication, okay? 
Everything I'll show you today is achieved declaratively unless I say otherwise. There's only one thing that at the moment needs to be done manually, and we'll see it. This is how we implement a Postgres cluster in Kubernetes with CloudNativePG. I think the simplicity is what stands out here. This pretty much highlights the convention over configuration paradigm that we implement with our declarative configuration approach: you don't have to specify all the parameters, but you can actually modify all of them, because we make opinionated decisions about the defaults. So, for example, in this YAML file here, we request CloudNativePG to create one primary and two replicas, so three instances, and one of them to be synchronous. This means that every time the application starts a transaction and commits, the primary doesn't return to the application until the commit is written to disk on at least one other standby. We can also change this and ensure that the commit is not only written, but also replayed on the standby, so that we can perform a read-only query later and it's consistent across the entire cluster. That means you slow down the entire write process, but you have a much lower probability of data loss and a faster RTO. Anyway, all of this is configurable, and all of this is Postgres, okay? That's the beauty, I think, of open source in general, whether it's Kubernetes or Postgres: it's all about you, all about us. So this is what happens under the hood. Suppose you've got a Kubernetes cluster with three availability zones and some worker nodes. I forgot to say that here, I request to place the instances on nodes that have the workload: postgres label, okay? So basically here we have three worker nodes with the Postgres label, and the operator places the instances only on those workers. We can choose different affinity settings; it's all there. Then we start by creating the volumes; that's where we start. 
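The YAML file described above might look roughly like this. This is a minimal sketch, assuming the CloudNativePG `Cluster` custom resource; the cluster name, storage size, and label value are illustrative:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-cluster              # hypothetical name
spec:
  instances: 3                  # one primary, two replicas
  minSyncReplicas: 1            # commit returns only after flush on >= 1 standby
  maxSyncReplicas: 1
  storage:
    size: 20Gi                  # illustrative size
  affinity:
    nodeSelector:
      workload: postgres        # schedule only on labeled worker nodes
```

To require that a commit is also replayed on the standby, not just flushed, so read-only queries see consistent data cluster-wide, one option is setting `synchronous_commit: remote_apply` under `spec.postgresql.parameters`, at the cost of write latency.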
The volumes are the most important part in Postgres. We start with PGDATA, the Postgres data directory where all the database files are stored, and the WAL files, which are the transaction log files. That's how we achieve data durability with Postgres. Then the primary is started, and the operator automatically creates a read-write service, so that your applications or your AI workloads can connect directly to it. Then it automatically clones the primary to create the first replica, which is synchronous, and then the second one, all via streaming replication. We don't use storage-level replication; we just rely on Postgres replication, which can be controlled at the transaction level. And then we create the read-only service, so if you want to perform read operations, you can use the standbys. Let's see what happens in case of failover. Suppose, for example, the worker node where the primary runs has a failure. Kubernetes immediately detects that, and the operator stops the read-write service, so we've got downtime. This operation is very fast, normally a few seconds; you can try it yourself. The synchronous standby is promoted and the service is updated. And then, when the worker node comes back again, our instance manager actually stops the old primary. That's how we prevent split brain from happening: it says, you think you are the primary, but you're not. So it demotes itself, re-synchronizes as a standby, and the service is updated. Let's talk about backup and recovery. By the way, if you want to hear the whole story, there's my talk with Michelao from Google in Chicago, in which we cover disaster recovery of very large databases with Postgres; we basically recovered a 4.5 terabyte database in two minutes, thanks to volume snapshots. That's the whole story, but briefly, continuous backup is achieved in two ways. The first is the WAL archive: we copy the WAL files to another location. At the moment, we only support object stores. 
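Enabling the WAL archive and continuous backup is, again, declarative. A sketch of the relevant part of the cluster spec, assuming an S3-compatible object store; the bucket path, Secret name, and retention value are hypothetical:

```yaml
spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backups/pg-cluster   # hypothetical bucket
      s3Credentials:
        accessKeyId:
          name: s3-creds                            # hypothetical Secret
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: ACCESS_SECRET_KEY
    retentionPolicy: "30d"                          # prune older backups
```

With this in place, the operator ships WAL files to the object store continuously, which is what bounds the RPO discussed next.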
And by default, WAL files are shipped to the WAL archive at least every five minutes. Okay, so that means that your RPO is at most five minutes by default, if you've got backups in place. The other part is physical base backups, which is a Postgres technology. You can take them from either the primary or a standby (you can choose the target; by default it's the standby), they can be scheduled or on demand, and they can go either to object stores, where they're only hot, so online, backups, or to Kubernetes volume snapshots, where they can be both hot or cold. A cold backup is consistent on its own and doesn't require WAL files to be restored. And you can exploit the storage class capabilities in terms of transparent incremental and differential backups. We are also working on an interface through which you can pretty much write your own backup scripts and backup tools and extend it. This is how continuous backup works: you've got a cluster (it's done at cluster level, not at instance level), we copy the WAL files into the WAL archive, and then we take base backups to build a catalog of backups. That's how we have continuous backup. Recovery is essentially a bootstrap method: we copy the physical base backup somewhere and then we start reapplying the redo logs from the WAL files, until we reach a target. The target can be full recovery, so until the end; if we do that, that's also the foundation of continuous recovery, through which you can create what we call replica clusters, which can be pretty much aligned or even delayed. Or, if you selected a target, you get point-in-time recovery. When the target is reached, by default, Postgres promotes itself, so it becomes another cluster. You can also use this cluster for reporting: for example, every day you can recreate a Postgres cluster just for development or reporting and destroy it at the end of the day. On the same technology, we build the replica cluster. 
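As a sketch of the recovery bootstrap just described, here is what a point-in-time recovery into a new cluster can look like; the names, the bucket path, and the target timestamp are all illustrative, and the exact fields follow the CloudNativePG `Cluster` CRD:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-cluster-restore
spec:
  instances: 3
  storage:
    size: 20Gi
  bootstrap:
    recovery:
      source: pg-cluster                  # external cluster defined below
      recoveryTarget:
        targetTime: "2024-03-19 08:00:00+00"   # omit for full recovery
  externalClusters:
    - name: pg-cluster
      barmanObjectStore:                  # where the base backups and WALs live
        destinationPath: s3://my-backups/pg-cluster
        s3Credentials:
          accessKeyId:
            name: s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-creds
            key: ACCESS_SECRET_KEY
```

The operator restores the base backup, replays WAL files up to the target, and promotes the new cluster, exactly the flow you could script daily for a throwaway reporting or development cluster.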
The replica cluster is primarily used for DR, but also for recovery and reporting. It's essentially the same technology, but instead of promoting the Postgres cluster, we keep it in continuous recovery, and we can perform read-only queries on those servers. So think about that: you've got this replica cluster in another region that is continuously replicating, and you can promote it if needed. So far we've seen a single Kubernetes cluster; let's go beyond the single Kubernetes cluster, which is our single point of failure. We've got Postgres with its backup, and in a different cluster we can simply use the WAL files from the WAL archive. So think about that: we create a replica cluster in another Kubernetes cluster from a backup, and then we start replaying the WAL files from the object store. We don't even need a connection between the two servers, just the WAL files, and it's lagging at most five minutes by default without doing anything. The other interesting thing is that, if you want, you can set up backup in the other region too, and you have two architecturally independent backups in place. And if you want to reduce the RPO, you create a streaming replication connection between the two clusters. Simple, right? They're all building blocks; you build on top, okay? You don't have to get there immediately, but you can get there. Starting from this: this is normally the development cluster. You can do it like you do in production with three instances, or you can even use one single instance, but remember to disable pod disruption budgets, otherwise the node where this instance is placed cannot be drained, okay? And then, if you want, you can create your continuous backup infrastructure. So this is how it looks: a production cluster in a Kubernetes cluster with at least three availability zones, which gives you a very low RTO. In case of failure of the primary, in a few seconds you're up, without doing anything. 
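The replica cluster in the second Kubernetes cluster can be sketched like this, assuming it bootstraps from the backup and replays WALs from the same object store; names and paths are the same hypothetical ones as before:

```yaml
spec:
  bootstrap:
    recovery:
      source: pg-cluster        # bootstrap from the primary cluster's backup
  replica:
    enabled: true               # stay in continuous recovery, don't promote
    source: pg-cluster
  externalClusters:
    - name: pg-cluster
      barmanObjectStore:        # shared WAL archive; no direct link needed
        destinationPath: s3://my-backups/pg-cluster
        s3Credentials:
          accessKeyId:
            name: s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-creds
            key: ACCESS_SECRET_KEY
```

To cut the RPO below the WAL-shipping lag, the external cluster can additionally carry connection parameters for streaming replication between the two clusters.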
And then at the same time it gives you an RPO of five minutes, which, with the CloudNativePG interface, we are aiming to reduce to zero. All you have to do is have the WAL archive. The WAL archive can also be used as a fallback in case the streaming replication between the primary and the standby goes down temporarily, so you have a dual channel for resilience. And then you build your catalog of base backups. So we've seen one single cluster; all good, you don't have to do anything. Kubernetes can do self-healing and HA with CloudNativePG without you doing anything except monitoring and receiving alerts. Let's extend the architecture to two Kubernetes clusters. We can also think about them in terms of regions: we have a pretty much identical Kubernetes cluster in another region, let's say. We use the base backup to create the PVCs, and the WAL archive to build what we call the designated primary: basically a standby that is ready to be promoted in case the first data center, the first Kubernetes cluster, goes down. Then you create the local WAL archive and the local catalog of backups, so you are already ready, in case there's a disaster, to assume the role of primary cluster. And then, if you want, you can create the replicas, either immediately or later when it's promoted; that's entirely up to you. And if you want to reduce the RPO, you can set up the streaming connection. Promotion across clusters is the only manual thing you need to do at the moment, or rather, let's call it controlled rather than manual, because what we are aiming to do is to define a declarative way to perform this operation across clusters, okay? Say the whole region is down. This should be treated as a rare event, okay? But if that happens, you have probably lost at most five minutes of data, but we're talking about a massive disaster, and you can quickly make that cluster become the primary, okay? You can even go beyond that. 
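When that rare disaster happens, the controlled promotion boils down to flipping the replica flag on the DR cluster's spec (a sketch, using the same hypothetical replica cluster as above):

```yaml
spec:
  replica:
    enabled: false   # was true: the designated primary exits recovery and promotes
```

Once applied, the designated primary leaves continuous recovery and starts accepting writes as the new primary cluster.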
You can use cascading replication and go with three, four, five regions; it's really up to you, okay? These are all building blocks. The only problem is when you have a single availability zone Kubernetes cluster, which is very typical from what I see in on-premise setups where the lift-and-shift mindset is still prevalent: you're pretty much mapping one data center to one Kubernetes cluster. All you can do there is spread the database across nodes and storage, so divide as much as you can, but your data center is still the single point of failure, and you're missing out on a lot of Kubernetes. What that generates is more complex business continuity procedures, like we used to have, which you could stop doing with Kubernetes if you had three availability zones, at least for Postgres. You have to do them, and you have to do more of them the more applications you have: for each application you need business continuity procedures that you could avoid. So my advice is, if you have two data centers, try to push for a stretched Kubernetes cluster: at least extend the Kubernetes control plane over three data centers somehow, and plan for the third data center. So this is the end. I want to thank the whole CloudNativePG community, starting from some of the developers who are here, but really the adopters and everyone who has contributed to this growing community. What we're working on: the image catalog. For example, there are people who want to use TimescaleDB with our operator, or other extensions; you can write your own image catalog and just point to that without specifying the image name. The generic interface that I was talking about before will allow us to pretty much simplify the work of the operator through external plugins; primarily it will manage backup, metrics, and logging. We want to control the switchover across Kubernetes clusters with the replica cluster switchover. 
We also want to introduce synchronous replica clusters for those that have two data centers in the same city and two Kubernetes clusters; the only way for them to talk is through replica clusters, and we can add a synchronous one. Declarative management of databases: databases are the only global object that we are still missing in CloudNativePG. And then logical replication, publications and subscriptions. By the way, we already have an imperative way to set up logical replication publications and subscriptions, and also to update the sequences; we released it last week. So theoretically you can already move from any Postgres database in the world into CloudNativePG, following three steps with the plugin. If you want, we can talk more. Foreign servers, and also storage autoscaling with DoK: we are working on that. So, ultimately, choose whatever works for your organization. Every organization is unique; don't believe those who tell you that all organizations are the same. Define your goals, RTO, RPO, TPS, and let yourself be guided by them. On Thursday I'll show you some amazing results in terms of TPS that you can achieve; we're working with storage companies on this new frontier. Mitigate the risk of vendor lock-in at all levels, from the cloud down to the infrastructure level, whether it's on-prem, private, public, hybrid, or multi-cloud. You can do pretty much everything with vanilla Kubernetes, third-party Kubernetes distributions, bare metal or VMs, and choose the right storage for you. Our advice is to go with shared-nothing architectures and to take advantage of nodes and availability zones that, as I said, come for free with Kubernetes. So I hope that I gave you an idea of how you can make the world be the single point of failure for your Postgres databases, thanks to Kubernetes. I've been using Postgres for many years, and I really believe that the best way to run Postgres is in Kubernetes. Okay. Finally, this is my last slide. 
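For reference, the three imperative steps with the kubectl cnpg plugin look roughly like this. This is a sketch from memory: the subcommand names are to the best of my knowledge, and the flags are deliberately elided because they vary by plugin version, so check the plugin's help output before using them.

```
# 1. Create a publication on the source Postgres database
kubectl cnpg publication create ...

# 2. Create a subscription on the target CloudNativePG cluster
kubectl cnpg subscription create ...

# 3. Synchronize the sequences once the data has been copied over
kubectl cnpg subscription sync-sequences ...
```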
You know, it's completely off topic, but this is a great book, and I don't know if you know it. Has anyone read this book by Gene Kim and Steven Spear? In my opinion, this book is a tremendous opportunity for Kubernetes to shine as a way to move from the danger zone to the winning zone through slowification and simplification. So if you have time, I suggest you read that book. And come and meet me at booth G30 and around here. I'm done. Thank you.