Welcome everyone, my name is Chris Milsted, I work for a company called Ondat, which is a container storage solution that runs inside Kubernetes, so it turns your Kubernetes cluster into kind of a storage array, and I'm here presenting with...

Hi, I'm Gabriele Bartolini, as you may have guessed, from Italy. I work for EDB, which is one of the major contributors to the PostgreSQL open source project. I'm Vice President of Cloud Native, and my primary goal in the organization is to enhance the PostgreSQL experience in the Kubernetes space. I'm an active member of the PostgreSQL community and I've been working with PostgreSQL for over two decades. I'm an early adopter of the DevOps culture, and today I will briefly cover the CloudNativePG open source operator, while Chris will primarily talk about the storage part.

Between us, hopefully you'll find it interesting. We're going to do a bit of an intro to set the scene, then we're going to spend most of the time on the Postgres patterns, and then we're going to do a little demo at the end, which I have recorded, because anyone in Valencia knows what the Wi-Fi is like, and trying to do a live demo was absolutely impossible, so we've gone for the backup saved video. But it is running on my laptop, so if anyone does want to see it in real life, come and find me either here afterwards or at the booth and we'll go through it and do it again. As you can see from the picture, EDB had a party last night, there might have been some alcohol served, and we don't look like that normally in real life.

The first thing to say is that there was a talk just before this one, "Data on Kubernetes: bring it on". We're not going to try to convince you to put data on Kubernetes; we're going to assume that everyone in this room takes that as a given. But if you are interested and want to know more, both of our companies are very active members and proud supporters of the Data on Kubernetes Community, and they've got these great reports. So if you, or your boss, or your teams, or anyone else needs convincing to run data on Kubernetes, please go and download the report; there's a QR code in the bottom right-hand corner of the slides which you can click on to get access to it. Anything you want to add about that one? Yeah, really? That's fine. No, no, you said everything, thank you. It's all good.

Boring words on the slide: the takeaway is that by putting your data into Kubernetes, none of your resiliency problems change. You still have to think about exactly the same problems, but it's a lot easier, and there's a lot of nice automation that happens, either from the Kubernetes side or from the CloudNativePG operator side, to make your job a lot easier. So it is much easier to automate and get all of this working without spending months and sacrificing whatever animals your country traditionally sacrifices.

The main thing I'll leave you with is the bottom three acronyms. First, the maximum tolerable downtime. The first thing you do when you start looking at your resilience is ask what the business is willing to put up with and what the business will pay to protect against; this is a business and application level question. That then leads you to the two technical objectives: your recovery time objective and your recovery point objective. How much data can I afford to lose in the event of an outage, and how long have I got to get the systems back up and running?
So we're going to try to frame the different scenarios and patterns we talk about in those terms. But the key thing for me is resiliency: nothing changes when you're running in Kubernetes, you still have to do exactly the same kind of thinking. And with that, I think we're going to jump into some patterns, so I'm going to hand the clicker back to you.

Yeah, thank you. Thank you, Chris. So this is tough, because I'm trying to put PostgreSQL in one slide, and with a former PostgreSQL core team member in the room, it's going to be tough. But anyway, I want to ask: who in this room is already using PostgreSQL in production? Wow, OK. Now keep your hand up, please: who's using it in Kubernetes? I see many hands going down. OK, so hopefully after this talk you'll be convinced that this is not only possible but, as I see it, the best overall experience of PostgreSQL; I think PostgreSQL and Kubernetes work very well together.

For those of you who are not familiar with Postgres, or PostgreSQL, it's one of the most successful and innovative open source projects ever. It's one of the most popular database management systems, as we can also see from this room today, especially in virtualized and bare metal environments. And according to a recent Stack Overflow survey, it's the most loved database among developers. However, I feel it's important to restate some of the extraordinary capabilities that come out of the box with PostgreSQL: many enterprise-level features have been consistently introduced into the project year after year, one major release after the other.

PostgreSQL implements a primary/standby architecture, with a single primary and an optional number of read-only replicas, suitable for high availability and read scalability purposes. This is possible through its native streaming replication protocol, which is available in different flavors: physical, logical, asynchronous, and synchronous (which, by the way, you can even set at the transaction level), and also cascading. PostgreSQL also supports file-based log shipping, which is extremely useful in multi-region setups, especially when used in conjunction with object stores in the cloud. Then we've got continuous backup and point-in-time recovery, which complete the business continuity requirements, enabling you to achieve RPO=0 and very low RTOs in several disaster scenarios.

My list of favorite features also includes declarative partitioning for horizontal table partitioning, parallel queries for vertical scalability, extensibility (think about extensions like PostGIS, for example, for geographical databases), JSON support for multi-model hybrid databases, ACID transactions, transactional DDL, and last but not least, SQL standard compliance. Obviously there are many, many more, if you consider that the project has been steadily innovating for at least 25-plus years, and the community just released version 15 with, and I'm really proud to say this, many contributions from EDB.

Now please raise your hand if you've ever heard of CloudNativePG. Yeah, a few people. OK, so hopefully by the end of today you'll know more. CloudNativePG is technically a level 5 Kubernetes operator that manages PostgreSQL clusters, and it is production ready. There are a few operators for PostgreSQL out there, but CloudNativePG is fundamentally different from the rest.
OK, it's open source, distributed under the Apache 2.0 license, but in May this year its intellectual property was donated by EDB to a vendor-neutral, openly governed community, with the long-term aspiration to pursue CNCF graduation. We've applied to the CNCF Sandbox and we are currently in the review process.

On the technical side, a differentiator is that CloudNativePG directly extends the Kubernetes controller by defining a custom resource called Cluster, which manages the status of the cluster and relies on a component called the instance manager to control the underlying PostgreSQL instance. This includes failover management, which is our usual nightmare when we think about these things. As a result, because we extend the controller, we don't use a tool like Patroni or repmgr (to which, for example, I contributed in the past, in its first version), or Stolon, and so on. In order to have fine-grained control over the PostgreSQL cluster, we've made the decision not to rely on StatefulSets, but to directly manage PVCs. It goes without saying that without our multi-year experience and active contributions to PostgreSQL, as well as a deep understanding of Kubernetes, this wouldn't have been possible.

CloudNativePG is fully declarative and relies on Kubernetes resources to facilitate integration with applications: services, mutual TLS via secrets, affinity control, and so on. Out of the box it provides observability through an endpoint for the native Prometheus exporter, as well as direct JSON logging to standard output. Other important features are backups, continuous backup and point-in-time recovery, rolling updates, scale up, scale down, and much more.

Let's go to storage. Storage is the most critical component of a database; that has always been the case, on bare metal, on VMs, and in Kubernetes as well, and you must plan for it from day zero when you run database workloads. As I was mentioning before, CloudNativePG doesn't make use of StatefulSets, and you can learn more about why by scanning this QR code. Instead, we directly manage PVCs, which hold the most important asset of a PostgreSQL database: what we call PGDATA. Our internal motto at CloudNativePG, indeed, is that a PGDATA is worth a thousand pods. This is exactly what we say every time: we can't lose data, our primary directive is not to lose data. We're storage agnostic, and although we recommend shared-nothing architectures, you're actually free to choose between local storage, network storage, and hybrid solutions. We support dynamic provisioning, and we use storage classes and PVC templates.

One of the amazing things about Kubernetes is that it enables us to build a virtual data center using declarative configuration, so infrastructure as code. I will go through some example architectures here, from the most basic one to a disaster recovery example that spans multiple regions and Kubernetes clusters. The first, basic architecture relies on shared storage over the network, with nodes sharing both the database and application workloads. The second example relies on taints (but you can also use node selectors if you want) to separate applications and dedicate some nodes to Postgres workloads, while still using shared network storage, as you can see. In the third example, we dedicate some storage to Postgres workloads, and in this one, we directly attach local storage to the nodes that are reserved for Postgres.
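As a hedged sketch (not taken from the slides) of how these "dedicated node" patterns can be expressed declaratively in a Cluster resource: it assumes the reserved nodes carry a workload=postgres label and a matching NoSchedule taint, both hypothetical names, and the affinity fields follow CloudNativePG's Cluster API, so verify them against the operator documentation for your version.

```yaml
# Hypothetical example: dedicate tainted, labelled nodes to Postgres.
# The label/taint names (workload=postgres) are assumptions for illustration.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: dedicated-db
spec:
  instances: 3
  affinity:
    nodeSelector:
      workload: postgres            # schedule only on the dedicated nodes
    tolerations:
      - key: workload
        operator: Equal
        value: postgres
        effect: NoSchedule          # tolerate the taint that keeps apps away
  storage:
    size: 50Gi                      # shared network or local storage class
```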
Here we go even further by dedicating a single node to a single Postgres instance, again with local storage. This next example describes a three-node Postgres cluster with a primary and two standbys, where each node has local storage and sits in a different availability zone, thanks to the topology key; we can do this in a declarative way. The final example uses a feature we call a replica cluster, where we create another Postgres cluster in a different Kubernetes cluster, normally in a different region, and use streaming replication with a direct connection (of course you need to sort out security and so on to make that happen). Otherwise, if you prefer, you can simply use asynchronous replication with file shipping through an object store, which lets you transfer data across regions. Or you can use both, because that's handled directly by Postgres: Postgres falls back to the archive in case the network goes down. So, as you can see, just by using Kubernetes you can easily open up multi-cloud or hybrid cloud environments; it's your choice.

So, which one to pick? Really, any of them, whichever suits your use case. The amazing thing, as I was saying before, is that thanks to Kubernetes all of this can be done in a declarative way. That's what makes it special, in my opinion; that's the most important differentiator.

I want to share an example: this is how you configure a Postgres cluster with three nodes. We use convention over configuration. We call this cluster myAppDB, and we have a primary and two standbys. We set the affinity to prefer scheduling instances on different nodes, and we set the storage for PGDATA and the storage for the WALs. And this is what happens under the hood: we initialize the first PVC and run initdb, the process that creates the PGDATA of the primary. Once the PVC is initialized, CloudNativePG starts the pod, and when the pod is up, we define the Kubernetes services that will be used by applications; there are three services, one for read-write, one for read-only replicas, and one for reads from any instance. We also set up mutual TLS from the start for application connections; you can use your own TLS certificates and integrate with cert-manager if you want, all out of the box. Then we use pg_basebackup, the native tool that allows us to clone a standby. We start the pod on the standby, again using mutual TLS to connect in a safe and secure way and stream data, and we do the same with the third node.

Let's look now at the automated failover capability. What happens when the readiness probe on the primary starts to fail? Kubernetes detects that immediately and lets CloudNativePG elect a new primary from the available replicas. Again, this is done directly by the operator. The instance manager, which I think is the real differentiator of our operator, is the process that controls PID 1 in the pod: it detects that it is the new primary (sorry, I'm not able to see the screen very well) and updates the read-write service accordingly. Then, when the former primary comes back, the instance manager detects that and prevents a split-brain situation, and once the pod is ready, it reassigns it to the read-only service.

So that's all from me today, and now I'm very happy to hand over. You've got to stay around for questions. You've got to stay around for questions, Gabriele.
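A hedged reconstruction of the manifest just described follows; the exact YAML from the slides is not in the transcript, so the name casing, sizes, and defaults are illustrative.

```yaml
# Hedged reconstruction of the three-instance example described above.
# Sizes and the topology key are illustrative, not copied from the slides.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: myappdb
spec:
  instances: 3                       # one primary, two standbys
  affinity:
    enablePodAntiAffinity: true      # prefer spreading instances across nodes
    topologyKey: kubernetes.io/hostname
  storage:                           # PGDATA volume
    size: 50Gi
  walStorage:                        # separate volume for the WALs
    size: 10Gi
```

Applying a manifest like this triggers the sequence described above: the operator provisions the first PVC, runs initdb, starts the primary pod, creates the read-write, read-only and generic services, and then clones each standby with pg_basebackup over mutual TLS.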
You can't run away. No, no, I'm staying here. He's good. So we're going to flip back a bit and then try to get into the demo that I've built. I think I'm going to point it like that; can you just move me on a slide? Oh, sorry. What did I do? I don't know what you did. I know, we're still on there, it's fine. Where's the clicker? The clicker's there, I'm just going to use this. I think I've destroyed your computer. You've destroyed it, it's fine. Bear with us, I'll shift out of the way. Do a little dance, tell a joke, Gabriele, just while I fix whatever you've done.

Yeah, of course. As you can see, this is how it works: they fail over. Are there any questions in the meantime? OK, maybe we can take one. No? OK, that's it. OK, the question is: does it fail back to the original primary? My question is why? Why should it? Does it matter? The idea here is to think in terms of the cluster. OK, we're back on track. That's also a differentiator between, for example, StatefulSets and our direct control of PVCs, because to us it doesn't matter; the important thing is that there's only one primary at a time.

The other thing is, do go and read that article about why there are no StatefulSets. As someone who's been using Kubernetes for about seven or eight years now, I did not understand what was going on when I first started using the operator, because there were no StatefulSets, no ReplicaSets, no Deployments, and I went, what is going on? So do go and read the article, it's good. It explains why it works this way, and why there is a deliberate, manual decision about when to fail back, because you don't want to throw away PGDATA and then have to re-synchronize it. It makes a lot of sense once you read the operator's documentation.

So I'm going to go really quickly through these bits and then go through the demo, which I'm going to try to narrate at the speed of light, which is going to be very entertaining. The first thing to say is that the storage, as Gabriele said, is the most critical part, and I think picking storage fast enough to actually run your workload is really critical as well. The demo we're going to show is on a cloud provider, but it's actually running on i3en instances in an EKS cluster, so we're going to be running the demo on local NVMe drives. Everything I'm showing you will work on any cloud provider, on-premise, or in any execution venue; the demo is deliberately built to show that this will run anywhere and you can do it the same way.

As I said, at the storage layer I can do replication, encryption, and fault tolerance. So what we're going to do is build a demo where we complement the CloudNativePG components. Things like at-rest encryption, which CloudNativePG does not do, we're going to turn on at the storage layer, so that we make sure everything is encrypted. We've got a contract. That's the storage class, and we're allowed to speak to each other from now on. That's the contract, yeah, okay. But the other thing to say is that you can very easily turn features on and off in the different layers, so don't just go ahead and use a storage class without understanding what that storage class does under the covers.
If you've got a storage array that's replicating data to a second site, and you turn on storage replication and you turn on Postgres-level replication, you can end up with 18 copies of the data written to disk, and your database is only as fast as the slowest write. So please do ask your Kubernetes team what the storage class actually does and what it looks like, because it really does matter. That's all I want to say there.

In this case we're going to use a couple of storage classes. It's a Kubernetes talk, so there's YAML; we're going to see it in the video, so I'm not going to go into too much detail, other than to say that at that layer you can just add these annotations, so you can turn features on and off in different storage classes dynamically. We will map them accordingly: the one on the left has storageos.com/replicas set to two, the one on the right does not. So we're going to do replication at the storage layer with the class on the left, and replication at the Postgres layer with the class on the right-hand side.

The only other thing to say is that we talked about RTO and RPO and we talked about the patterns; the really important thing is to match the right component to the right RTO and RPO. If you replicate at the storage layer and you only have a single database pod, you will have an outage when we kill the pod or kill a node in our Kubernetes cluster. If you're using Postgres replication, it will just promote a standby to be a primary, and you will not have an impact on your application running on the Kubernetes cluster when we kill a node. So think about what your application and your business design need, and turn the right bits on at the right level.

The other important piece is backups and recovery. Again, there are two different ways to do it. The right-hand side is this thing called a VolumeSnapshotClass; this is the Kubernetes-level way of doing it. If you've got a thousand pods and you need to orchestrate the backup of all of them at the same time to recover an application, you'd want to use a backup manager and some orchestration; we've got partnerships with people like Kasten and CloudCasa, for example. You'd want to use one of those orchestration engines to back everything up at the same time, because it's pointless backing up one thing and then the next and having your whole application out of sync. Having said that, that backup and that snapshot will need a pause, because you'll need consistency, so there will be an application impact to using that kind of backup mechanism. And then, as Gabriele said, there's using the write-ahead logs and this kind of continuous backup. Yeah, because it is WAL archiving. So you'll have a different recovery point objective and recovery time objective with these two approaches, one faster and one slower.

So we're going to try to do the demo. Is that playing? Right, I'm going to move to this side and try to narrate it. Oh, that is not 1080p in any way, shape or form, is it? Hold on a second. Yeah, thank you, Google. 360p? Honestly. But what Chris was saying is important: the good thing here is that you have the choice between doing it at the file system level or at the database level. Anyway, we're back on track. It's going to be a bit small, isn't it? I did try this at home and it did work okay on a flat-screen monitor, so I did test it, but okay.
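To make the two storage classes concrete, here is a minimal sketch; the provisioner name and the parameter keys are assumptions based on the Ondat (StorageOS) settings mentioned above, so verify them against the Ondat documentation before use.

```yaml
# Hypothetical sketch of the two storage classes contrasted above.
# Provisioner and parameter names are assumptions, not copied from the slides.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ondat-replicated             # for the single-instance (standalone) database
provisioner: csi.storageos.com
parameters:
  storageos.com/replicas: "2"        # replication handled by the storage layer
  storageos.com/encryption: "true"   # encryption at rest
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ondat-unreplicated           # for the three-instance (Postgres-replicated) database
provisioner: csi.storageos.com
parameters:
  storageos.com/encryption: "true"   # still encrypted, but no storage-level replicas
```

The pairing is what avoids double replication: the standalone cluster (instances set to one) points its storage and walStorage at the replicated class, while the three-instance cluster handles replication itself and uses the unreplicated class.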
I'll post the video on my GitHub and put a link in the notes at the end so you can go through it, and it'll be in the recording as well. The main thing is that on this bottom side I'm using eksctl, and at the very bottom there was a set of steps to prime the NVMe drive, because we're using local NVMe; so this will work, as I said, in any execution venue. I've got some YAML files up there: two storage classes, and also two database files.

What I'm going to do in a second is show you the storage classes. In this one, we've got a storage class which doesn't have replication turned on, but it does have one of these annotations: storageos.com/encryption set to true. So for every persistent volume, a per-volume encryption key will be created and stored; by default it's a Kubernetes secret, so it will go into the Kubernetes etcd. But there are also other projects: there's an open source project which we've contributed to called Trousseau, so you can integrate that into something like HashiCorp Vault and have at-rest encryption integrated with a key management service. The second storage class has replicas set to two, so that's the one where we've turned storage replication on.

To make sure we don't end up with those 18 copies of data in the wrong place, we're going to match the storage classes with the right database. This is our standalone database; I called it standalone because there's one of it, a brilliant naming convention. The instances field at the top is set to one, so in this case I've only got a single PGDATA volume and a single WAL volume; at the bottom there's a storage class for the WAL storage, and storage is the PGDATA storage. So we're going to take the standalone database and run it on the replicated storage, and then we're going to have another database, the replicated database, and run it on the non-replicated storage.

Now, there's a really nice tool I found: how do I know that my databases are healthy and what's going on in the Kubernetes cluster? CloudNativePG has a kubectl plugin, so you can run kubectl cnpg status standalone-db, and it'll tell you what's going on. The things you're looking for are the OK labels in there, and you can also see things like which write-ahead log you're on and how far through your transaction processing you are. So we've set up basically two databases, they're running nicely at the right speed, and we're using local NVMes.

The great thing about these local NVMes is their sub-millisecond disk latency for writes. So even with three-way replication across three availability zones, and this demo was recorded in Dublin, so eu-west-1a, 1b and 1c, even with three-way synchronous replication between three different zones, we can get about 30,000 IOPS out of the smallest instances you can get in that class. We can push more and more workload through just by beefing up the size of the instances and the size of the network links, for example.

There's also a command line interface to the Ondat storage layer, so I've got some commands here where I'm just going to pick a persistent volume claim.
So this PVC is the most critical thing, the thing that CloudNativePG keys off. And just to show you, look, there's this master line, and if we pick another persistent volume claim, from the standalone database where we've got storage replication on, what you'll see is that there's a master and a replica and a replica. I'm just showing you in the video that we're looking at everything under the covers; there's no smoke, there are no mirrors, it's all there. And there's a nice thing about CloudNativePG and Ondat: both of them just key off the topology.kubernetes.io/zone key. So if your cluster has topology zones set, this stuff will just work and give you distribution across the availability zones, and it will give you fault tolerance against an availability zone outage out of the box, either the storage way or the CloudNativePG way.

Now, of course, everything's happily running, so, like any good demo, we're going to go into the AWS console and terminate an instance. We're going to pull the power out of the back of one of the machines on purpose, because that's the worst-case scenario we have to survive. So we go in, terminate instance, click terminate, and boom, it's gone. Pod not available, pod not available. So the standalone database has gone: someone's killed the node my pod was running on. I killed the node, and it was actually one of the standby instances that was running on the same node as the standalone database. I could set the demo up so the primary and the standalone database run on the same node; I've run through lots of scenarios and it doesn't really matter, all I'm trying to show is the different recovery mechanisms that go on under the covers.

What I found really interesting is that there are about 300-second timeouts in Kubernetes, and the talk two sessions before this one in this room was about the new graceful and non-graceful shutdown modes; I need to go and watch that talk to see how it's changed in 1.24, because I think there have been some improvements, and I'm getting nods from people who were at the talk. So you've got to wait about five minutes for Kubernetes to notice and start recovering and doing things, depending on which timer you hit. But if you look at the AWS console, there are already four nodes there, so AWS has not waited; there's already another node in the 1b availability zone that's been spun up. Then we go through and look at what's going on: NotReady, SchedulingDisabled. Kubernetes hasn't even noticed the machine has died yet; Kubernetes just says it's not ready, it's not ready. EKS has actually already created a new instance and is trying to join it to the cluster in the background. So there are some really interesting timing effects, and the one thing I took away from this is that if you're going to do failure and disaster recovery testing, you need to test the timing issues, because I found things can go wrong all over the place when I was building the demos. So look, it still doesn't know the node's gone; it still thinks the three nodes are there.
What will happen, and we can speed the demo up a little bit and skip towards the end, is that once Kubernetes notices and the timers have expired, for the standalone database Kubernetes just says, oh, I need to start that pod again. So Kubernetes spins the pod up, and because at the storage layer we were replicating the storage between 1a, 1b and 1c, the three availability zones, the storage layer will promote one of the storage replicas to be the master, that will connect to the pod, and the pod will start running. I think I do a bit where I scroll up and look at the write-ahead log, just to prove that the WAL is where we left it; well, actually one further on, because I think when it does its recovery it starts a new WAL as part of that.

On the other one, CloudNativePG never recovers the lost instance automatically as part of the process, and this is deliberate, and this is the safe option. Go back and read the design of the operator; it's deliberate and it's the way it's meant to work, because for it to recover automatically you would have to basically say: I want to delete PGDATA and delete the WAL from that node, I'm willing to throw that node away, and now, at the database level, I want to start a re-synchronization of PGDATA and the WAL. There's a command for that, which I found is nicely automated in the plugin: kubectl cnpg destroy. It is as disruptive and destructive as it sounds; you give it the database and the instance number. So if you watch the top left, I almost missed it there, and here we go: kubectl cnpg destroy on the replicated database, instance two.

What you should have seen in the top left is the pod back up, happy and running, and in the bottom left are the two cnpg status boxes for the standalone database and the replicated database. What you would have seen is that both of those are now back up; the standalone database with the storage replication has just recovered and is back up and running. There was an outage to operations, because until Kubernetes and the timers had all kicked in and the pod had been rescheduled, the database was not there, not alive, not healthy, and not responding to requests. Whereas the replicated DB, the CloudNativePG-replicated database, was happy and running all the way through.

So we've gone through this, we've got OK, OK, and we've done a destroy, so that pod will be being restarted. We've had a database running on a node, we've unplugged that node from the Kubernetes cluster in a very forceful, terminated fashion, and while you've been watching this, and this is real time, within I think about six minutes total elapsed, both of those databases have recovered. The one thing I would say, or took away from it, is that I'm not a Postgres person by background, but the simplicity, the ease, the automation of this CloudNativePG stuff: it was like 15 lines of YAML to get a database up and running, and it just made my life so much easier. You can see that you can build very resilient patterns, so if you want an RTO and RPO of zero, you can use this stuff and build production workloads on Kubernetes now.

I'm going to skip past the last bit at the end; we're pretty much out of time. Gabriele, do you want to do the honors with the conclusions?
Oh, thank you. Yeah. Ultimately it's all about freedom; I like to use this word because it's up to us, and that's what Kubernetes, and open source in general, gives us. We have the possibility to run, for example, a fully open source stack. Another thing that is important, especially for me as a European, with the GDPR for example, is to own our data. We now have the possibility to retain full control of the data, and to decide whether we want the data on-premise, on multi-cloud, hybrid cloud, whatever, using the same infrastructure and the same configuration. Then cost optimization; maybe you want to add more about this?

Yeah, so if you want to run a database at the lowest price point, and it's a very data-intensive workload, you will not find a cheaper way than running it on local NVMes. Hardly anyone does that, maybe because no one thinks to do it, but using something like Ondat, a CSI plugin that can orchestrate local NVMes, you get that sub-millisecond response, so you can push massive volumes of transactions through your databases.

Yeah, correct. And again, this is something I'm really happy about, because we learned a lot working with Chris and his team: we are database people working with storage people, we actually understood each other's point of view, and I think we are much better now, with a better understanding of everything. And then finally, DevOps, because if we are here, in my opinion, it's because we have been on a journey that brings developers, database administrators and infrastructure administrators closer together. The good thing about CloudNativePG is that it is designed to work with applications; a database by itself doesn't serve any goal, it's through applications that it delivers value.

It's been a great lot of fun. I think we need to wrap up now, we're at the end of our time. We'll hang around for questions. We've also got a draw for a massive Lego Batwing at the Ondat booth, so if anyone wants to come and try to win it, only about 40 or 50 people have entered, so there's a really good chance. We're at the back right-hand corner of the exhibition hall, and we're going to do the draw at half past five, so do come along. Business cards are here if anyone wants to get in touch afterwards. But do we do questions live? We'll do questions... No. Thanks, Gabriele. Thanks, Chris. We are running out of time; if you have questions, you can still come and talk to them. We are stopping the recording. Thank you very much.