Hey, thanks everybody for joining us at 5:25 on a Thursday. I guess it's good to see you're still awake. We've got a pretty exciting session for you today. You can read the title here. We'll introduce ourselves in a second, but we just wanted to start out with a few questions. I have my new manual clicker. How many people here are running databases on Kubernetes today? Awesome. And on the next one, how many of you are actually running Postgres on Kubernetes today? I guess that's why you're here, because that's what this talk is about. All right. So what are we going to cover today? There are a couple of interesting ways of deploying Postgres. We want to talk about how we actually scale Postgres and some of the cool things we can do in Kubernetes. How do I deal with a single-primary deployment? How do I scale that? People may think: just add more replicas. That's going to be the key thing. And then the second part is, in general, how do we scale these sorts of databases? Now, who are we? My name's Gary Singh. I don't describe myself very much, because I'm very secretive. I'm just a product manager at Google. And I'm going to pass it over to Gabriele, who is much cooler than me. Thank you, Gary. Let's see if this works. I'll stay here. So I'm Gabriele Bartolini. I'm Vice President of Cloud Native and Kubernetes at EDB. And this is kind of a dream for me. I've been using Postgres for many, many years, since the early 2000s. I'm a Postgres contributor, and I'm also a Data on Kubernetes Community ambassador. DevOps is actually what led me to Kubernetes. My first KubeCon was in 2019, and I was with Marco, who's here with me; he's one of the maintainers. And we started to think about this operator for Postgres using local storage. 
Like we had done for many years outside Kubernetes. And people thought we were crazy. So I'm really happy to be here today. After this journey, that's when CloudNativePG was basically born. That was August 2019. So I'm a proud co-founder and maintainer of CloudNativePG. And previously, I don't know if you're familiar with it: how many of you know Barman? OK, I'm the one that came up with the name, and again, with Marco, I'm one of the creators of that project. And we put basically all the experience we gained with Barman into CloudNativePG, as well as the experience we gained with repmgr, of which I was one of the early developers. So the agenda for today: we'll introduce vertical scalability with Postgres first, then how to manage Postgres in Kubernetes with CloudNativePG, and then show some techniques to vertically scale Postgres through storage. Then with Gary, we'll show some benchmark results, and we'll finish with the takeaways. So Postgres has recently received significant recognition: it's been named database of the year by DB-Engines, and it's also holding the top spot as the most popular database management system, according to Stack Overflow's latest survey. In my opinion, a key factor in this Postgres success is a foundational feature that Postgres has had from day one: extensibility. With extensibility, we can basically tailor our database using data types that we create, or functions that use those data types. What I've seen in real life over the years is that Postgres has continuously evolved, and it's learned from the technologies from different domains that were rising throughout these last two or three decades. And the constant has been SQL. I've seen, for example, XML coming up, then JSON. You can actually mix structured and unstructured data in Postgres. 
And extensions like PostGIS, Timescale, and the latest addition, pgvector, keep Postgres at the forefront of innovation. So given the increasing demand for AI and analytics workloads (our focus today, and I had to put AI in, sorry), our focus today is to offer insights on how to enhance Postgres databases to cover these critical use cases. The idea for us today is to bring as much data as we can to AI and analytics workloads. So let's start with vertical scalability in the context of Postgres. Imagine this scenario. You are managing a Kubernetes node. It could be a virtual machine or physical, it doesn't matter. This node comes equipped with its own set of resources: CPU, RAM, and most importantly, storage. Storage is the most critical component for a database. And I'm also talking about directly attached storage. Don't think that in Kubernetes storage needs to be shared. You can run bare metal with locally attached disks. You can do everything. It's pure freedom. So our objective here is to fully maximize the potential of this single node within a database framework and, if necessary, upscale the resources. This concept, in database technology and in computer science in general, is known as vertical scalability. However, when we think about Kubernetes, the prevailing notion suggests that scaling a database across multiple nodes is actually simpler. But that means making compromises, compromises in terms of consistency, availability, or performance, for example. Do we really need that? That's kind of the question. And that approach is referred to as horizontal scalability. In any case, today we focus on vertical scalability. You can scale vertically using all the resources of a single node, CPU and RAM, but today we'll focus on storage. So before we delve into the specifics, I want to quickly recap what CloudNativePG does and some reference architectures for running Postgres in Kubernetes. If you want to know more, there's a QR code there. 
You can scan it and get redirected to a blog article that I wrote on the CNCF blog about the recommended architectures for Postgres in Kubernetes. So today we'll mention this architecture for a single cluster. We're talking about a Kubernetes cluster spanning three or more availability zones. It means that your single point of failure, with just this setup, is a region, which is huge for a database. And Kubernetes simplifies all the business continuity plans for you, thanks to this self-healing and high availability approach. And what I'm going to show you provides very good results in terms of recovery time objective and recovery point objective, RTO and RPO. So we have three availability zones, with one worker node dedicated to Postgres in each availability zone. We position the Postgres primary on one worker node, and it comes with its PGDATA persistent volume. PGDATA is where, by default, all Postgres files are located. If we want, we can add another volume to separate the transactional logs, and then also use tablespaces, which, as we'll see, are a way to add more space to Postgres. We can use Postgres native streaming replication with synchronous replication, so you have a synchronous standby and potentially an asynchronous one. Then we provide a read-write service, and a read-only service to access the standbys. This is, by default, the architecture you get out of a very basic cluster in CloudNativePG. In any case, today we will focus on this. What we'll try to do today is adopt a scientific approach. You have to understand that your organization is unique. You have your unique people, your unique systems, your unique data. The idea is that you can only choose through a scientific approach, and basically let data drive your decisions. So the idea here is to benchmark, benchmark, and benchmark, and not only the database but the storage. 
So at the end, we will provide some results. So now it's your turn, Gary. Back to me. Yeah, so now let's talk a little bit about CloudNativePG. How many people here are familiar with CloudNativePG? Man, that's a good audience. All right, cool. So just for those who may not be: CloudNativePG is a level five, production-ready Kubernetes operator, which is fantastic. It's already in use in a number of places, from EDB's BigAnimal to IBM's Cloud Paks; we have it in the Google Cloud Marketplace; Tembo uses it. You can read the whole chart, but it's fully open source and vendor neutral, created by EDB. There are multiple deployment options: straight from a manifest, for those who love to use whatever your GitOps tool might be, as well as from OperatorHub. And you already saw the results on this: super popularity in 2023. How many stars do we have? 3,000 stars already. So this is great. You can check out the link down at the bottom if you want to capture that. The really nice thing, and I love Kubernetes, is that this is about the simplest version of the cluster resource. I'm sure you've all seen a Kubernetes manifest, but this is super simple to get up and running. How many people have tried to do StatefulSet stuff themselves and configure everything by hand? With an operator model, this is pretty nice. We basically just say: I want a cluster, here's its name, how many replicas do I want, and the storage. Obviously we'll talk about some more configuration parameters, but that's about as easy as it gets to get up and running. On the next chart, let me ask this question. If I had a beard, I would be a graybeard; that's why I shave it. How many folks actually used to work with databases in the days before we had Kubernetes and VMs? 
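To give a feel for how little it takes, a minimal Cluster manifest along those lines might look like this (the cluster name and size here are made up for illustration, not from the slide):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example   # hypothetical name
spec:
  instances: 3            # one primary plus two replicas
  storage:
    size: 10Gi            # PGDATA volume, dynamically provisioned
```

Applying this with `kubectl apply -f` is enough to get a replicated Postgres cluster running; everything else has sensible defaults.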
Yeah, do you guys remember creating RAID disk arrays, mounting raw volumes, trying to figure out how to map everything for optimal writes and so on? That's a lot of what you might see from your DBAs. So how do we make that easier? Gabriele said we can mount whatever we want on Kubernetes nodes, but can we make it much easier for you? Obviously, I'm sure everybody's familiar with dynamic provisioning. This is great. With storage classes, we can have separate volumes. And with this operator, we're actually doing direct management of the storage itself, without having to deal with the instances themselves, without necessarily having to deal with the StatefulSets. Obviously, there's going to be some mandatory volume that must be created. And you'll see when we get to some of the performance testing that, just like you used to optimize where you want your logs written and where you want your different tablespaces, you can do this for the write-ahead logs, as Gabriele mentioned. And you can, of course, divide things up into a number of tablespaces. The beauty of this is that under the covers we're leveraging all of what I call the Kubernetes magic, but you can configure it easily, without having to do it yourself, through just the manifest definition, the operator manifest, as we cover here. I think the other main thing is that the only thing you have to learn is a little bit about what your CSI provider does and what your actual backing storage is, right? Is it an SSD, is it fast, whatever it might be? You're going to have to use those characteristics. And you'll see that that's why testing becomes important, because you may need to look at the trade-offs between performance, cost, and efficiency. 
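For example, dedicating a separate, faster volume to the write-ahead log is just a couple of extra fields in the manifest; a sketch might look like this (the storage class names are assumptions, they depend on your CSI provider):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-separate-wal   # hypothetical name
spec:
  instances: 3
  storage:
    size: 50Gi
    storageClass: standard     # assumed class for PGDATA
  walStorage:
    size: 10Gi
    storageClass: fast-ssd     # assumed faster class for the WAL volume
```

The operator provisions one PersistentVolumeClaim per volume per instance, so each instance ends up with its own dedicated disks.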
And of course, with some of the great work that's been done in Kubernetes lately, volume snapshots, you get a lot of stuff, I'll call it for free, which makes it easy, right? Because now we're just using native Kubernetes volumes, and we can just use the snapshot capabilities. And then there are obviously other backup and recovery technologies that work in there as well. It makes it super simple. Yeah, that's actually a very good point, Gary. It's about leveraging what Kubernetes already provides. This is one of the pillars of CloudNativePG. Thanks. Yeah, I mean, we won't go too much into it, but there have been many times where people tried to figure out how to map, I guess, normal constructs to Kubernetes constructs, right? I think Kubernetes has evolved enough to actually support these workloads, and we have that great mapping. And before Gabriele goes into the details, we thought we'd introduce some concepts in case some of you weren't familiar with them: the concept of tablespaces. These are global objects that you can have in Postgres. They're typically used for how you might want to divide up volumes; in the simplest version, they'll map to either a directory or an actual raw disk. The typical use cases are to store temporary files, divide up logs, and use separate disks. And as we said, dynamic provisioning storage classes are typically going to create another dedicated disk in Kubernetes. Hopefully that's the way you set things up, on-prem or in the cloud; that's typically how it works. You create a new storage volume. 
It's typically going to map to a new solid state disk or persistent disk or whatever it may be, but those are going to be dedicated to you, with multiple volumes mounted on your actual underlying nodes. And this is beautiful, and we'll talk about how to configure it pretty simply with just the tablespaces stanza, right? You just add as many of those as you want. And I think you're next. Yeah, thanks, Gary. So now we'll start exploring some techniques. If you've been using Postgres outside Kubernetes, this is all stuff you know. This is stuff we've been doing for many, many years. And I love the fact that now there's a new wave of us explaining this stuff. This section, by the way, is the cornerstone of this presentation. In the previous slides, Gary told you about how CloudNativePG offers a seamless approach to scaling at the storage level. We can create additional volumes for WALs and also tablespaces. So you've got flexibility here. You can customize storage classes and optimize cost efficiency and bandwidth for specific volume purposes. Also, the fact that Kubernetes works through annotations to control bandwidth and optimize specific volumes is great, because for us it's just adding one annotation in the configuration. Volumes can be added to live clusters, and they can also be resized if the storage class supports it. This gives us tremendous adaptability and scalability in case we need to grow. Just to recap, the primary advantages of scaling with volumes are not just performance isolation, but also predictability of performance. It's very important to know what you can expect from your storage. You can also distribute queries across multiple volumes, and make database operations like vacuum, indexing, or re-indexing simpler and more efficient. And the lovely way of doing this in CloudNativePG is that you just need to add two lines. 
And basically here, we say to CloudNativePG: create a new volume for WALs, using the default storage class. And here is how you add a temporary tablespace, a temporary tablespace called tmptbs. We're telling Postgres to add this volume, this tablespace, to the temp_tablespaces configuration option of Postgres. The operator does that transparently for you, and you can add more. So it's really interesting. And a widely used technique, which is particularly effective even if your database is simple, is to separate I/O operations for tables and indexes. In this very simple example, we create two tablespaces, one called data and one called idx. And with the SQL statements on the right side, we can create, for example, a table and say this table needs to be in the tablespace data. And we can also create indexes, or constraints in general, through the USING INDEX TABLESPACE clause. We'll see the same technique in larger databases, but for simple databases, this is already a performance improvement. So let's try a very simple example. We've probably all dealt with web access logs, and this is an example of an access log. I will use the timestamp as our most important dimension here. And this isn't just theory. This is actually something that Jonathan Gonzalez and I have done in the past. He's a Fluent Bit maintainer as well, and we actually used Fluent Bit to parse these logs and store this table in Postgres. So as time passes, the fact table that I showed before expands. It accumulates new data every month. So think about how frequently you access old versus new data. Think about this access pattern: is newer data typically accessed more often? These are the kinds of questions we need to ask ourselves. 
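Putting those pieces together, a sketch of the relevant stanzas might look like the following (the sizes are made up; the tmptbs, data, and idx names follow the examples above, and the exact fields may vary between operator versions):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-tablespaces   # hypothetical name
spec:
  instances: 3
  storage:
    size: 50Gi
  walStorage:                 # dedicated volume for the write-ahead log
    size: 10Gi
  tablespaces:
    - name: tmptbs            # added to temp_tablespaces by the operator
      temporary: true
      storage:
        size: 20Gi
    - name: data              # tablespace for tables
      storage:
        size: 100Gi
    - name: idx               # tablespace for indexes
      storage:
        size: 50Gi
```

Each entry in the tablespaces list gets its own persistent volume, and the operator creates the corresponding Postgres tablespaces for you.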
Also consider the scenario where the Postgres planner decides that, to retrieve a specific month, it's actually faster to do a sequential scan, a full table scan. And think about what happens every time you update or delete a record: what happens to the index? You are sharing the same index with all the records of the table. Or what happens when you need to remove an entire month? This is pretty much the main cause of bloat in your Postgres database, when you remove a lot of data like that. So the database becomes less and less efficient over time. This cannot scale. The solution to this common problem is known in the database industry as horizontal table partitioning. It's very common in data warehousing, which is the world I come from, and in very large database environments in general. Essentially, this technique involves slicing table records horizontally and spreading them across different tables. These tables are known as partitions. What we do is create a kind of abstract table, called the partitioned table, from which we derive the concrete tables that are the partitions. So each month resides in its own table, with its own indexes. Over time, these tables become pretty much read-only, and maybe they're less frequently accessed, so their indexes don't need any more updates. The cool thing is that partitioning can also be combined with tablespaces, allowing older data to be moved to cheaper storage. Essentially, the partitions act as a kind of first-level index, so that routing of inserts and queries is more efficient, and retrieving, for example, the data of a whole month is much faster than before. If you want to remove a whole month of data, you simply drop the table, so you don't have to update any indexes. And the cool thing is that, out of the box, open source Postgres comes with all this: you can achieve partitioning by range, list, or hash, and you can also have subpartitioning. 
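The pattern just described can be sketched in SQL like this (the table, column, and tablespace names are made up for illustration):

```sql
-- Partitioned fact table for web access logs, sliced by month
CREATE TABLE access_logs (
    log_time  timestamptz NOT NULL,
    path      text,
    status    integer
) PARTITION BY RANGE (log_time);

-- Current month on a fast tablespace
CREATE TABLE access_logs_2024_03 PARTITION OF access_logs
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01')
    TABLESPACE fast_tbs;

-- An older month on cheaper storage
CREATE TABLE access_logs_2024_01 PARTITION OF access_logs
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
    TABLESPACE cheap_tbs;

-- Later, demote a partition to cheaper storage...
ALTER TABLE access_logs_2024_03 SET TABLESPACE cheap_tbs;

-- ...or drop a whole month instantly, with no index maintenance and no bloat
DROP TABLE access_logs_2024_01;
```

Inserts into access_logs are routed to the right partition automatically, and queries filtered on log_time only touch the relevant partitions.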
Anyway, declarative partitioning is a complex topic; you can study more by yourself. I gave you an example here of how to partition by range using the timestamp: this is a partitioned table, and this is how you create partitions, all through SQL. And this is how you can, for example, put the current data in a fast tablespace, a fast volume, and progressively move older data to cheaper storage, basically optimizing costs and performance. This is how you set the tablespace, by the way, as I showed before. And don't worry, you can alter the table at a later time and move the data to another tablespace. All the cool stuff. Yeah, so this is pretty cool. Gabriele mentioned that we really have to think about benchmarking for your specific workloads. Before we start talking about doing these repeatable benchmark tests, I'll just go back and highlight, and again, I love Kubernetes, so I'll always throw the value back in there: I think it's super simple, that's my technical term, to iterate on this stuff. Because as we showed, to change how you configure things is simply a matter of stanzas or fields within your CRD. If I want to try putting the WAL on a separate volume, or try tablespaces, I can continue to iterate, and especially in cloud environments it's much easier: you don't have to worry about whether that storage is mounted. So it makes it very easy to iteratively test these different scenarios. Obviously, the key thing here is to start small, with a single-instance cluster, which makes sense. And there's a tool, pgbench. The link down there talks more about running it yourself and how to use the tool. It makes it easy to reproduce even the results that we'll show next. 
The other key thing in your testing is making sure you don't skew your results. The recommendation was a database 4x the size of memory, so that you're actually checking your disk performance and not your memory and caching performance. And the beauty of this is that, with the link down below, anybody can set up and rerun these tests in their own Kubernetes environment, anywhere, in any cloud. So on the next slide, we'll talk about the simple base specifications here. We're going small. Note, this is nothing big, I just wanted to call that out. I know Gabriele will get mad at me, but don't say it. Yeah, Postgres can do whatever it needs to do. We're going to run this OLTP benchmark here. Here is the size: a scale factor of 4,500, about 66 gigs, with 16 clients and simple round trips. And then on the next one, we talk about the scenarios we tested. We wanted to try out a few of the various techniques. Do we just use a single volume? What's our performance like there? Call that your baseline. Do we dedicate a volume for the write-ahead logs? Do we use tablespaces for data? Do we also do it for the indexes? And then do we do the last one that was shown, which was to partition the data and use tablespaces there? The results in this particular case are pretty interesting; maybe they're not that surprising. Scenario two, which you can see highlighted, worked well in, I guess, three environments really: a bare metal scenario, and two on Google. And that's just because, between us and Amazon and the others, everybody has different storage, different storage classes, and different backing. Scenario two, as you remember, was separating out the write-ahead log, which typically makes sense, right? 
And the performance difference is pretty significant, right? As a small thing, we even ran two tests in Google, using standard PD versus SSD. Maybe the difference between standard PD and SSD at this scale wasn't significant enough, so maybe you'll just say, hey, standard is good enough for me and I'm not going to pay the extra cost; but the results are still pretty good. Yeah, and we also have to remember that we're only using 1.5 cores, OK? So if you scale up the CPU, the results could be better. Yeah, we're not going to have parallel writes to disk and things like that; you're still context-sharing the same CPU. But then the other interesting result was that in the EKS case, for example, scenario three was actually better in terms of improvement; that was the biggest improvement there. So I guess the TL;DR is: test your own environment, wherever you are. But again, it's fairly simple: take the same tests, spin up the same clusters in your environments, take your configs, deploy, and you're ready to go. There are people that have done tests on a Raspberry Pi as well, at 250 transactions per second. I like that. The key outcome here, and I always get ahead of myself, but that's fine: in this particular case, separating PGDATA and the write-ahead logs gave the improvements you can see, also depending on which disk type we used. Again, Gabriele highlighted that this is only 1.5 cores, so maybe it won't stress things as much when we separate out and partition by tablespaces. There was still improvement over the baseline; it just wasn't as significant as the improvement from separating the WAL in this particular case. And again, storage capabilities are important, right? But I'll just leave it at this: it's really nice to be able to run these tests quickly. It doesn't take much to set this up once a Kubernetes cluster is up and running. 
You can pick your default storage classes, you can specify your own storage classes, and then you can specify how you want to divide things up. I'll pass it over. Yeah, and I want to thank people. The good story about this is that, from now on, everyone can test this stuff. And I want to thank the people here for having helped me produce these benchmarks. As I was saying, we're just scratching the surface now. This is unexplored territory for everyone. This is, for example, a slide that Sagi from Lightbits did using NVMe over TCP, and this is just basic performance, just a starting point: we're talking about 15,000 transactions per second to start with. So, conclusions. We covered these four primary sections. The lesson learned today is that storage, I hope you understand, is probably the most critical part for a database in vertical scalability. But do your benchmarks. Know your goals, in terms of RTO and RPO. Don't forget that you have to back up and restore, and you have to ensure high availability; all of this is included. So Postgres, I hope you saw today, can scale up through volumes. My recommendation is to use shared-nothing architectures, so maybe consider placing Postgres on nodes separated from applications, but running in the same Kubernetes clusters. And there's no one-size-fits-all, but that's also the good part: the work is on you, because, again, your organization is unique. You have an amazing set of technologies, in my opinion: you've got Kubernetes, you've got Postgres, and I like to say you've now also got CloudNativePG. And you're free to run it everywhere: private, public, hybrid, multi-cloud, bare metal, VMs, and using local or network disks. So, last thing: join our Data on Kubernetes community if you want to know more about stateful workloads, and also the CloudNativePG community. So thank you. Are there any questions? 
I think they learned everything they needed to know today. And it's 6 o'clock. Hello. I have a question about backup and the fact that you split the data into several volumes. If you do a snapshot of each disk, don't you lose consistency across the snapshots? Thank you for the question. OK, so we are pretty much one of the first operators in the database space to support volume snapshot backups and recovery. If you go back to KubeCon in Chicago, you can watch the video of the talk that I gave with Michelle from Google about disaster recovery of very large Postgres databases. I showed how to restore a 4.5 terabyte database in two minutes. Two minutes. So the consistency is guaranteed by the WAL files. Essentially, when you start the backup procedure, you take a snapshot of all the volumes, and we make sure we also copy the WAL files from the start of the backup to the end of the backup. The other way is what we call cold backups, where you can actually take a backup from a standby: we shut it down temporarily, take a cold snapshot, which is consistent by default, and then we spin it up again. It's all done automatically by the operator. What we're working on, and I would like Leonardo to stand up, please: Leonardo is actually working with SIG Storage to implement the first operator supporting volume group snapshots in Kubernetes. Kubernetes is working on ensuring consistency of multiple volumes at the same time, and we are the first pioneers of this technology. So we're really happy. Actually, there's already a patch for that. But this is how it's achieved. Postgres allows you to exploit all of that; this technology has been in Postgres for almost 20 years, so it's very stable. Thank you for the question. No more questions? OK, thank you. Thanks, everybody.