And so our first speaker is Craig. He works for Citus and Microsoft, and he's going to talk about Postgres extensions. Thank you. Morning, everyone. Thanks for being here. I know the first talk of the day is always the hardest one to wake up and get to. So I'm going to spend some time on kind of a tour of extensions. I don't know if this resonates over here in Europe like it does in the US. Has anyone seen Carmen Sandiego, or is familiar with it? A few hands. OK. It was this computer game that taught us geography growing up in elementary school. There would be clues and mysteries and all that sort of thing. I'm actually really excited because now it's on Netflix, so I can introduce my daughter to it. Similar to DuckTales. Is anyone excited that's back? OK, just me. So Postgres has been around for over 20 years now, and it keeps getting better and better. But I'm of the strong opinion that some of the most exciting things for Postgres in the future are not in the core of Postgres. The core of Postgres moves along as a really solid, stable foundation. It used to do one thing right, which was not lose your data. That was kind of the goal of Postgres. In the last five years or so it's become, I would say, a sexier database, if you can call a database sexy. We got things like rich geospatial support. We have JSONB. We have full text search. Now we have native partitioning. So it's less and less of just a stodgy relational database. But extensions are a really interesting approach to extending Postgres without having to change the core code base, really taking it to be more of a data platform than just a relational database. Before I get into it too much, a little bit more about me. If you see me online, that's what I look like. I work at Citus Data. We turn Postgres into a horizontally distributed scale-out database. Actually, I guess I used to work for Citus. We were acquired about two weeks ago by Microsoft.
So happy to chat and answer any questions about that afterwards as well. Previously, I ran Heroku Postgres. There we ran around 1.5 million Postgres databases for customers, so a pretty reasonable scale. And I curate Postgres Weekly. So if you're not subscribed, I encourage you to, if you care about Postgres at all. It's focused less on DBA content and more on an app-dev rundown of here's what happened this week, and here are interesting articles and content. All right, so what are extensions? Per the Postgres manual, extensions are basically low-level hooks that allow you to change or extend the behavior of Postgres. They can be written in C or other languages — you can write them in pure SQL, you can write them in Python in some cases, that sort of thing. But they allow you to basically create and change the behavior of Postgres. You can have new data types. You can have new index types. All sorts of things directly within your database, without changing the core underlying code. You probably use some extensions without realizing it. Postgres itself ships with some native extensions already, known as the contrib ones. Others have to be built and installed. If you're running somewhere like Amazon RDS, they already have a whitelist of extensions — some that are contrib, some outside of contrib — that you can install. So a few examples of the contrib ones, and we'll dive deeper into a few of these, not all. If you've ever used the UUID data type, you've probably used the extension for it. citext: if you're migrating from MySQL, which doesn't preserve case sensitivity, to Postgres, which thinks that's important, but you maybe want to preserve that behavior, you can have a case-insensitive text data type. HStore was, I think, one of the first NoSQL data types within Postgres. It's a key-value store directly in Postgres. I say one of the first, because I think the first technically was XML, which came about 15 years ago.
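To make the citext idea concrete, here's a rough sketch of what it looks like in use — the table and email address are made up for illustration:

```sql
CREATE EXTENSION IF NOT EXISTS citext;

-- comparisons ignore case, but the stored value keeps its original casing
CREATE TABLE users (
    email citext PRIMARY KEY
);

INSERT INTO users VALUES ('Craig@Example.com');

-- matches despite the different casing
SELECT email FROM users WHERE email = 'craig@example.com';
```

The value comes back exactly as it was inserted, which is the MySQL-migration behavior the talk describes.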
Now the non-contrib ones get really interesting. And I think we'll see less in contrib over time. For a while, that was the primary way you would find extensions. Now there's more and more of an ecosystem starting to develop on its own. These include things like PostGIS — if you're doing anything geospatial, PostGIS is well regarded as one of the most advanced geospatial databases that exists. Things like Citus, which turns it into a sharded, distributed database. HyperLogLog, which I think is just fun to say. And there's a list of probably north of 100 extensions. I'll get into it more later — where you can go and explore and find new ones. So today, what we're gonna do is take a tour, just a quick drive-by of a bunch of these, to give you a sampling of the various things you can do. I don't expect this to be exhaustive and teach you everything you need to know, but hopefully you get a taste of, hey, this one caught my attention, now I wanna go use it or learn more. So the first one I would consider the most critical extension for anyone running a database. pg_stat_statements has existed for a while and then got an update in version 9.2, where it became immensely useful. So what's it do? It records all the queries that were run against your database and basically normalizes or parameterizes them, so that you can see things like how long a query ran, how many times it ran, how many IO blocks were read, how many pages were dirtied — all these sorts of things under the covers. So what's it actually look like? I don't know if you can see that perfectly. Can we kill the lights any more? Dead room? All right, we'll see if we can dim it a little bit more. But basically, if you can squint and see that — oh, that's much better. Thanks. Oh, success. Well, you can't see me now. That's fine. All right, so you've got all these things, right? Like the query of select star from users where email is my email address at citusdata.com.
You've got all these different shared blocks that were dirtied and read. A lot of this you actually don't need to know. This is what you're gonna get if you query the pg_stat_statements view. What you can do from this, though, is write a really, really simple query that'll aggregate the total amount of time a query's been consuming resources from my database and how long it takes on average. So I can get this really nice high-level view of: this query has run for a total of 295 seconds against my database, and on average takes 10 milliseconds. Similarly, I've got another one that's consumed 219 seconds and on average takes 80 milliseconds. Now, just by a rule of thumb of what I know on performance, I can probably get most queries down to about a millisecond or so. So if I wanted to go and optimize things without having to look at any of my application code, I can just hop in here, say, where's it slow? Let me go add an index on this second query and give a lot more performance back to my database. At Citus, we actually took and extended this as well. So we have something called citus_stat_statements, which extends pg_stat_statements. If you're doing something that's very multi-tenant based — think a B2B application, like a Salesforce CRM, where one customer's data is their own — you wanna know which one of your customers is using the most resources from your database. What we do is preserve that tenant ID — we preserve who your shard key is, that sort of thing. So you can see who's noisiest, who's consuming the most resources, maybe who's underutilized, that sort of thing. All right, PostGIS. PostGIS has a wealth of information out there. It is probably the largest extension that exists for Postgres, and it has its own complete ecosystem that runs parallel to the Postgres ecosystem. It is well regarded as the most advanced geospatial database.
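A sketch of the kind of aggregate query being described — note that on Postgres 13 and later the column is total_exec_time rather than total_time:

```sql
-- requires shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query,
       calls,
       round(total_time::numeric / 1000, 1) AS total_seconds,
       round((total_time / calls)::numeric, 2) AS avg_ms
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```

Sorting by total time surfaces the queries consuming the most resources overall, and the average tells you whether an index would help.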
When you install PostGIS, what you get are all these new geospatial data types. So you have things like points and polygons, so you can start to draw maps. You can put those directly in the database and then find points that overlap within these polygons, or what is the shortest distance from one point to another based on this map that exists. Now, when you install PostGIS, you get a bunch of new built-in indexing and operators you'll want to start to use. So Postgres itself has a number of different index types. You've got the standard B-tree index. You've got GiST, GIN, SP-GiST, and — I'm missing one — BRIN. BRIN and SP-GiST, I know, are used for really, really large data sets that can naturally cluster together. GIN — the easiest way to describe it is when you have multiple values within the same column. So if you think about text sentences, or arrays, or JSON is an obvious one, right? You've got a large document in a single column. GiST is most commonly used on things like full text search, where you've got values that can overlap between rows, and then very, very heavily on the geospatial side. So here you can have a GiST index and you can say, hey, select the distance between these two points directly within the database, and Postgres is gonna do all the heavy lifting for you. I'm gonna just fly by PostGIS, because otherwise I would spend 45 minutes on it. There's a number of other extensions that you tend to use in coordination with it as well. pgRouting, which is useful for routing, right? How do I get from point A to point B based on these roads and otherwise. There's a number of others around connecting to remote geospatial data sources. That's really common in the geospatial space: hey, I've got this other data source that's over here — open maps and that sort of thing — and you can connect directly from within Postgres to those externally. You don't have to pull all those in. All right, gonna shift a bit to HyperLogLog.
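A minimal sketch of the point-and-distance idea with a GiST index — the table, column names, and coordinates are illustrative:

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE places (
    id       serial PRIMARY KEY,
    name     text,
    location geometry(Point, 4326)
);

-- GiST is the index type doing the heavy lifting for geospatial lookups
CREATE INDEX places_location_idx ON places USING GIST (location);

-- distance in meters between two points, using the geography type
SELECT ST_Distance(
    ST_SetSRID(ST_MakePoint(-122.33, 47.61), 4326)::geography,
    ST_SetSRID(ST_MakePoint(-0.12, 51.50), 4326)::geography
);
```

Casting to geography makes ST_Distance return meters on a sphere rather than planar degrees.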
This is one of my favorite extensions, just because I think it's really fun to say, and I think I sound way more intelligent when I start to talk about it as well. So if you read the paper on HyperLogLog — it's a paper out of Google — it has all of these things in there: things like minimum value, bit observable patterns, stochastic and harmonic averaging. Do all these things make perfect sense to everyone in here? Cool, I'm not alone. I read the paper, and there's all sorts of math in it that makes a lot of sense for all sorts of reasons that I don't understand at all. To simplify it, it's basically probabilistic uniques with a small footprint — or, I think of it as, close-enough uniques. So what happens is it's doing some sampling of data as it comes in — basically how many zeros are at the front of the hashed value — sampling that down, combining it with other sets, so it can do really interesting things. So taking a look at it, we've got the extension hll we're gonna create, and then we've got this new data type. So we can see that I'm gonna call it set, of type hll. Now when I wanna insert into it, I'm not just gonna insert into it — I'm gonna use a function that says, hey, hash this value. And we can hash pretty much anything: we can hash some integer, we can hash text. It's gonna take a hash of the value and store it directly in this data type. And what I'm gonna do is create this table. So in this case, I'm taking all of the raw site visits I have every single day — I'm just gonna record an impression to my website, save that, and then I'm gonna roll this up into a uniques table at the end of the day. So I'm gonna do insert into daily uniques, and then I'm gonna select all of my values and hash them together. Now when I query this table, I'm gonna get this. So it's super intelligible, right? What I can do with this, though, is I've got this daily uniques table, and I can query and get a record back that doesn't make any sense on its own.
But what I can do is basically say, how many uniques are there, and extract that from this. I can also start to combine these. So I've got, like — I saw 100 unique people on Monday, I saw 100 unique people on Tuesday, right? But how many did I see on the combination of Monday and Tuesday? And HLL is really interesting in that it can combine those two. So I can see how many people I saw both days, how many people I saw on just one of the days, those sorts of things. So a few best practices for it. It uses update — so you're not gonna insert directly into it, you're gonna take data from somewhere else and update it into that data type. And you do wanna tweak the config. So I said it's close-enough uniques — it's usually quite accurate, even right out of the box. But you can tune a lot of things, like how big the data structure itself is, how accurate it is, how sparse your data is. So you can actually come in here and tune a lot of these config settings for it. So is it better than just having the raw data? With it, you can store an estimated count of around tens of billions in a little over 1,000 bytes. So I'd say that's some pretty good compression, with a few percent of error. And for most cases — if you're an ad network, if you're in advertising — that can work really well versus storing all the raw data. All right, so TopN. TopN is another approximation extension as well. So HLL is a great one if your data is too big to count uniques on, or it's too costly. TopN — also often known as top-k — gives you the top set of people that have done X or Y. So if you wanna see what your top 10 pages on your website are by visits, it's a great one. So for TopN, we're gonna create the extension, and then we're gonna have a TopN column. But instead of storing the TopN as its own data type, we're actually gonna store it as a JSONB. Now when we insert into it, we're gonna use this topn_add_agg. So we're gonna insert in a similar fashion to how we did with HLL.
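Pulling the HLL pieces together, here's a sketch of the daily rollup and the Monday-plus-Tuesday union — the site_visits table and dates are made up:

```sql
CREATE EXTENSION IF NOT EXISTS hll;

CREATE TABLE daily_uniques (
    day   date PRIMARY KEY,
    users hll
);

-- roll the raw visits up into one hll value per day
INSERT INTO daily_uniques
SELECT visited_at::date, hll_add_agg(hll_hash_integer(user_id))
FROM site_visits
GROUP BY 1;

-- close-enough unique count for each day
SELECT day, hll_cardinality(users) FROM daily_uniques;

-- uniques across Monday and Tuesday combined
SELECT hll_cardinality(hll_union_agg(users))
FROM daily_uniques
WHERE day IN ('2019-02-04', '2019-02-05');
```

Because the union is on the sketches rather than the raw data, the combined count correctly avoids double-counting people seen on both days.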
But our data actually appears as a JSONB value, which is pretty nice. So instead of — when we queried the HLL table before and got that unintelligible set of bytes — here what we see is a JSONB value that's pretty understandable. So this is my top thousand users. This could be my top pages on my website. Now to query this, we've also got a very, very similar aggregate to what we had with HLL. So to parse this out — hey, if I actually wanna know what my top pages are for a set of days, or top GitHub repositories — here I'm gonna use the topn_union_agg aggregate and feed that into the topn function itself, which expects this data type. And I get a pretty intelligible, nice output here. Cool. So shifting a bit: there's a lot of interesting ones like HyperLogLog and TopN that are approximations, that allow us to do kind of a new operation. There are also extensions that fundamentally change what Postgres can do. So Timescale is a company that builds a time series extension, TimescaleDB. If you're looking at Timescale, there are a few requirements, generally. You wanna have data records that always have a timestamp — sensor data is probably the most common — and you're looking at append-only data. So sensor data, again, is a good fit: it's append-only, I'm not going back. The sensor itself isn't getting updated, edited, re-saving values — it's just saying, here, I read this, send it off to the database. And this is really key, I think, for a lot of time series databases, where they often get overused. Recency is really, really important. If you're always recording all of the data and always querying on all of it, a time series database isn't gonna help you as much, because you're having to scan all the data anyway. So you're really looking for no updates to your data and a recency bias when you're querying it.
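A sketch of the TopN flow described above — the column is plain jsonb, and topn_add_agg, topn_union_agg, and topn are the extension's functions (the page_visits table is made up):

```sql
CREATE EXTENSION IF NOT EXISTS topn;

CREATE TABLE daily_top_pages (
    day   date PRIMARY KEY,
    pages jsonb   -- TopN stores its state as ordinary JSONB
);

INSERT INTO daily_top_pages
SELECT visited_at::date, topn_add_agg(page)
FROM page_visits
GROUP BY 1;

-- top 10 pages across a range of days
SELECT (topn(topn_union_agg(pages), 10)).*
FROM daily_top_pages
WHERE day BETWEEN '2019-02-04' AND '2019-02-10';
```

The topn function returns (item, frequency) rows, which is the intelligible output the talk mentions.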
So taking a look at Timescale, we're gonna go ahead and create a table that has taxi rides. It has the pickup, the from, the to, things like the rate, and all that sort of data. Now this is just a standard Postgres table. To start to put Timescale in place, what we're gonna do is take this table and run a function on it called create_hypertable. Under the covers, this is gonna create all sorts of automatic partitions. So it's gonna create, say, a one-minute partition for every set of new time series data that comes in. At this point, I don't really have to think about it or worry about it. It's just doing it for me under the covers. And then when I start to query it, we're not gonna do anything special, really, again. So here we have a pretty standard query that's finding the average fare amounts grouped by day. Under the covers, what this is gonna do is split it up and say, hey, five minutes of this data is stored here, five minutes of this data is stored there — and aggregate it together. So it actually knows how to go to those underlying partitions without you having to think about it. And you can go to really, really granular buckets as well. So you can query buckets that are really broad, or more granular than the partitions themselves — it knows how to span partitions appropriately. The other nice thing is you can go in here and tune the config to start to roll off the old ones. This is generally pretty key for time series databases, where, hey, you have a lot of data, but if you're using recent data, you wanna get rid of the old stuff. So you can move it either to colder storage, do pre-aggregates, save those, and then delete the raw data. That's generally a pretty common workflow. Within the time series space, there's another one, pg_partman, as well. So Postgres itself got native time partitioning a couple of releases ago. At the time it had some rough edges. It's gotten better.
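A sketch of the hypertable setup with made-up taxi-ride columns; time_bucket is Timescale's helper for grouping across partitions:

```sql
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- just a standard Postgres table to start
CREATE TABLE rides (
    pickup_at timestamptz NOT NULL,
    pickup    text,
    dropoff   text,
    fare      numeric
);

-- turn it into a hypertable; partitions get managed under the covers
SELECT create_hypertable('rides', 'pickup_at');

-- average fare grouped by day, spanning partitions automatically
SELECT time_bucket('1 day', pickup_at) AS day, avg(fare)
FROM rides
GROUP BY 1
ORDER BY 1;
```

The query itself is plain SQL; the extension handles routing it to the right underlying chunks.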
I still say generally don't use the native Postgres partitioning without some extra utility. pg_partman is really nice to smooth out the rough edges. I think by Postgres 12, knock on wood, it'll be where we don't need external extensions — it'll have gotten to the point where it's super solid and robust on its own. But for now, you generally wanna rely on something like pg_partman or Timescale. So, pg_partman. Timescale kind of does its own thing in regards to partitioning; pg_partman builds on Postgres itself. So it takes the native Postgres functionality and gives you basically some helper utilities and maintenance pieces, so that you don't have to do some of the manual overhead you otherwise would. So for pg_partman, we're gonna create our table and we're gonna use the native partitioning here and say partition by created_at. So a little different from Timescale, where we created the table first, then came in and created the hypertable. Here, up front, we're gonna specify this table is gonna be partitioned by something. Then I'm gonna have pg_partman come in here and create my parent, and I'm gonna go ahead and update the config right away. So this is a thing I personally like to do. You don't have to — it's up to you and your partition setup — but basically I'm gonna say, hey, keep creating partitions forever, as long as there's new data coming in. Don't just stop and run out. This is a common thing I see with a lot of people that manually set up time partitioning. They say, hey, I'm gonna create five years' worth of tables — that's more than I will ever need. And five years later, you're not around, no one that knows how that system works is around, and you're getting errors in the log that you can't insert into a partition that doesn't exist. So please set up automation around most of your partitioning items. Now, under the covers, what this is gonna do is create all these events tables.
So we're gonna create events_2018 for 9:00, 9:05, 9:10, et cetera. So pg_partman has a pretty robust config. You can come in here and tune a lot of things. Now the config itself just lives within a Postgres table — so for things like versioning of it, you maybe wanna be a little careful, that sort of thing. But there's a number of things you wanna come in here and probably set. Like, how many partitions do you wanna pre-make? When data comes in right now, I wanna have four partitions already created into the future, so that when new data arrives, I'm not creating partitions on the fly. How often do you want your maintenance jobs to run? How long do you wanna retain things — do you wanna retain things for 30 days, for one day? And pg_partman will then go in and automatically take care of dropping those old partitions, creating new ones, that sort of thing. So pg_partman basically builds on that native partitioning. I would say if you really wanna use Postgres native partitioning — which you should, it's gonna keep getting better — you should also consider pg_partman alongside it. If for some reason you don't want the native partitioning, take a look at Timescale. All right, so Citus. I mentioned I work at Citus, but Citus is also an open source extension as well. So you can take it, like all these others — download it, run it, install it. What Citus does is turn Postgres into a horizontally scalable distributed database. So to your application, it still looks like a single-node Postgres database. It looks like Postgres because it is. Under the covers, what we've done is spread that out across multiple physical nodes. So basically it turns it into a sharded database without you having to worry about it. This generally comes up when you're outgrowing the limits of a single node. I've seen that happen often in the terabytes; I've seen it happen as early as 100 gigs. If you're never gonna approach that level, you don't need sharding under the covers.
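A sketch of that pg_partman setup, including the keep-creating-partitions-forever config tweak described above — table name and retention values are illustrative:

```sql
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman SCHEMA partman;

-- declare the partitioning up front, unlike Timescale
CREATE TABLE events (
    id         bigint,
    payload    jsonb,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- have pg_partman create the parent with daily native partitions
SELECT partman.create_parent('public.events', 'created_at', 'native', 'daily');

-- keep creating partitions as long as data keeps arriving,
-- and automatically drop anything older than 30 days
UPDATE partman.part_config
SET infinite_time_partitions = true,
    retention = '30 days'
WHERE parent_table = 'public.events';
```

The part_config table is the plain-Postgres-table config the talk mentions; premake and the retention settings live there too.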
So really quickly, for those that may not be familiar: what is sharding? It's basically the process of splitting your database into a bunch of smaller bits. So here, I've got one database and a bunch of smaller tables, similar to what we had with partitioning — maybe that was 16 partitions. Here what I'm gonna do is take this — in this case I've got 32 — and spread them out across two physical nodes. So a couple of important things to note here. I think a lot of people get a little mixed up when they first start on sharding and say, hey, I'm gonna create two shards because I have two physical nodes. And what that means is — great, you split it up across two nodes. But if you wanna split that up across four nodes later, how are you gonna split up those tables? So it's really important to create a larger number of logical shards than you have physical nodes up front. You don't have to use powers of two, but usually it works pretty nicely. So here we've got two physical nodes, but 32 actual shards under the covers. What you're gonna do up front, usually, is hash based on some ID. So hash on your primary key. It can be an integer, it can be text, whatever. There's actually a lot of talk on the internet of, hey, what's the right, appropriate hashing function? It's not gonna determine whether you succeed or fail with sharding. The Postgres internal ones work great — Postgres has its own internal hashing functions, you can just use those. And again, you're gonna define a large number of shards up front. Two is generally bad. You also don't wanna go overkill — two million is probably on the extreme. A lot of people in production I see with 128 or 256, and they'll be fine for the next 10, 15, 20 years, probably. So generally you don't wanna just route values directly. This is the other mistake I see: people say, hey, I wanna set up sharding early on.
I'm gonna put my first 10 customers on this node, my next 10 customers on this node, my next 10 customers on the other node. The problem with that is you're gonna have some big hotspots. My first 10 customers are my oldest — they're the ones that have been on my platform the longest, and the ones with the most data. So now I've got all of those together, competing for resources, and my new node with my last 10 customers is a completely unoccupied box. So what you actually wanna do is take a range of hashes right up front. You're gonna say, okay, if I'm gonna have 32 shards, look at the resulting hash range and split that up. So if I've got a hash of one, that's 46,000; I've got a hash of two, that's 27,000. And if I divide up the entire space, shard 13 out of 32 shards would have the hash values from 26,000 to 28,000. So I can see that user number two would go into bucket number 13 right away. This is gonna create a nice distribution of my data without me having to worry about those hotspots. It's not perfect, but it covers 90% of cases pretty well. And then what I'm gonna do is have something like an events table. So here's how it works. We've got something like GitHub events and the users. We're gonna shard both of these by the same column here — I believe, user ID. Yep. And so with Citus, we've gone ahead and already created those tables, just like we did with Timescale. Just a standard Postgres table, nothing special about it. Then I'm gonna call this other function, create_distributed_table, which under the covers is gonna go create all those shards. So I didn't have to go and create all those different ones and figure out how to route them. I just said create_distributed_table, and now I've got github_events_001, 002, 003, et cetera. And then when I want to insert data, I just insert into github_events like I normally would, and it'll rewrite and route that.
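A sketch of the Citus side of this — shard count and column names are illustrative:

```sql
CREATE EXTENSION IF NOT EXISTS citus;

-- a standard Postgres table, nothing special
CREATE TABLE github_events (
    user_id    bigint,
    event_type text,
    payload    jsonb,
    created_at timestamptz
);

SET citus.shard_count = 32;

-- creates the 32 shards and their hash ranges under the covers
SELECT create_distributed_table('github_events', 'user_id');

-- inserts and queries look like plain Postgres; Citus routes them
INSERT INTO github_events VALUES (2, 'PushEvent', '{}', now());
SELECT count(*) FROM github_events;              -- fans out to all shards
SELECT * FROM github_events WHERE user_id = 2;   -- routed to one shard
```

Defining the shard count up front, larger than the node count, is what makes later rebalancing possible.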
Then for querying, we just execute normal SQL. We do a select count star. And what this is actually gonna do is rewrite the execution under the covers. Because I'm doing a count star from github_events, and all that data is in 32 different tables, now I've got to get a count from events_1, events_2, 3, et cetera. And we can see right here that we've got different executors under the covers. This is gonna take this and rewrite it and say, hey, this is the explain plan — one of 32 — and I'm actually executing 32 of these different queries, pulling it together back on the coordinator, and giving you the value back. So it's rewriting the query in the executor under the covers, distributing it across multiple nodes, getting the results back, and returning them. To you, this just looked like a standard SQL query. If you're routing to a single node, we don't wanna hit all 32 shards, right? So if you're saying, hey, I just want the events for user ID two — well, I know that lives on shard number 13. I'm gonna rewrite that query under the covers and basically say, okay, send this from github_events down to github_events_13, just that one shard, get the data and bring it back, and not wait for everything else. And then I have to put this in here, because this is, like, the one graphic I have where — database people, we don't do a lot of UI — but this reminds me of the Windows 95 disk defrag. Everyone remember that? I never knew if it was actually doing anything, but I always felt like it did. And so for us, basically, when you're rebalancing shards, we're gonna take and move these in an online fashion. So you've got 32 shards on one node, 32 on another — if you wanted to scale that to four nodes, we're gonna take 16 from one, move them over, and 16 from the other. All right, FDWs. So FDWs, I put up here — they're almost like their own special class of extension.
They allow you to connect from within Postgres to another system and query it directly. So if you have a bunch of disparate data that you wanna pull in for, like, ETL or one-off reporting, that sort of thing, they're really, really useful. Now, I say they're like their own class because writing a foreign data wrapper is like writing an extension, but a little bit different as well — you've got certain different things you can do with them. And there's an entire ecosystem and collection of foreign data wrappers. So while we've got 100 extensions or so, we've also got, I would estimate, 30, 40, 50 foreign data wrappers. I could be off on that number, but it's probably roughly in that ballpark. And there are some really crazy foreign data wrappers as well. There are some obvious ones, like Redis, Mongo, cstore, which is a columnar store. There are ones like the dev_null foreign data wrapper, that just lets you write data to nowhere. There is a Twitter foreign data wrapper, so you can query Twitter from directly within Postgres. There's an email one. So there's a lot of foreign data wrappers that allow you to connect to anything, including a CSV one — so if you've got a lot of CSV data and you wanna parse that, you can do it directly within Postgres. So when you use a foreign data wrapper, you're gonna install it like an extension first, and then you're gonna create a foreign server — you're basically gonna give it a connection to this other database. Postgres itself already comes with a Postgres foreign data wrapper, which is really useful. So you can query from one Postgres database into another. Really, really handy in a lot of situations. Here, we're gonna actually connect to Redis. I'm gonna create my foreign server to say, hey, it's over here. Then I'm gonna create my foreign table. So for Redis, it's just key-value, so we're not gonna have multiple tables. But with something like the Postgres FDW, it's a different story.
You may wanna map all of your tables from some other database directly into your local one. And then you're gonna create a user mapping of, hey, who can see this, how do I connect to it? Now when I describe my database, I can see I've got products, purchases, users — and I've got this Redis database. And this Redis database, in this case, is a cache of who's visited my site and how many times. So I've got, hey, this user showed up five times in the past three days. I can query something really basic like, user 40 — how many times have they been here? Just show me how many times I've seen this user. Great, but I can also then go in and join on this. So with Redis, it's a text value, so I have to do some casting, but I can say, show me my top users that have been here more than 40 times. And then I can maybe look at things like, okay, who's been here more than 40 times but not bought something, or who had something in their checkout and was here yesterday but I haven't seen since? So I can do some really interesting things here. I would say use caution when putting some of the foreign data wrappers into production. They don't always push things down well, so you may bring back your entire Redis database as you're querying this — which may be fast, but if you have a 10-gig database on the other side, you probably don't wanna pull back that full table. Now, fortunately, the Postgres FDW is getting better at pushing down predicates, which is really exciting, and I think more and more, on a user-facing production website, we'll be able to use the Postgres one. We'll see in time with the others. All right, so a bit of a wrap-up. Postgres is definitely more than just Postgres. I should have asked at the start: how many people here are using any of these extensions already? Awesome, a few hands. How many here have used all of these extensions? Someone's kind of cheating back there. So there's a whole other world of extensions out there, beyond just these five or six.
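A sketch of the Redis foreign-table setup and the join with casting — the server address, table names, and the 40-visit threshold are all illustrative:

```sql
CREATE EXTENSION redis_fdw;

CREATE SERVER redis_server
    FOREIGN DATA WRAPPER redis_fdw
    OPTIONS (address '127.0.0.1', port '6379');

CREATE USER MAPPING FOR PUBLIC SERVER redis_server;

-- Redis is key-value, so the foreign table has just two text columns
CREATE FOREIGN TABLE visit_counts (key text, value text)
    SERVER redis_server;

-- join the cached visit counts against a regular Postgres table;
-- the Redis value comes back as text, hence the casts
SELECT u.id, u.email, v.value::int AS visits
FROM users u
JOIN visit_counts v ON v.key = u.id::text
WHERE v.value::int > 40;
```

As the talk warns, a join like this may pull the whole Redis keyspace back before filtering, so measure it before relying on it in production.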
There are new ones created every week, and I think they're really pushing the boundary of what you can do with Postgres. Now we don't have to wait on the core community, which has a higher barrier to what gets committed, right? It has to be well maintained for some number of years. Here, we can go and experiment and have fun and create a lot of new things, and add a lot of value, without having to wait a year, a year and a half, to see something land directly in core. A few honorable mentions. MADlib is a really awesome one. I don't know how well it's maintained these days — it was originally, I believe, out of UC Berkeley — but it's machine learning and data science directly in Postgres. So a pretty exciting one. ZomboDB is interesting. You can connect directly from within Postgres to Elasticsearch, automatically send your data there, and then when you query, it'll use Elasticsearch indexes. So you can have Elasticsearch indexes basically backing your Postgres data. cstore I mentioned briefly earlier — that's a columnar store directly in Postgres. pg_cron is actually really, really handy. If you ever need to go and delete things on a regular basis, why set up a cron job on a server somewhere else that could fail, when you could run it directly within your database? So if Postgres is up, your cron job is up and running. pg_cron is also really useful for rollups. So if you have a bunch of raw data you ingest, and you wanna do rollups every five minutes or every day, it's really, really useful to run all that directly in the database. So, a little bit of further reading. Here's a blog post we wrote on kind of what it means to be an extension. pgxn.org is kind of the current Postgres extension directory. So new versions of extensions are posted there, new extensions are there, you can get a description of what they are. They're tagged, so you can easily browse them if you're looking for data types or foreign data wrappers, that sort of thing.
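A sketch of the two pg_cron use cases mentioned — scheduled deletes and periodic rollups (table names are made up, and pg_cron needs to be in shared_preload_libraries):

```sql
CREATE EXTENSION pg_cron;

-- delete raw data older than 30 days, every night at 3am
SELECT cron.schedule(
    '0 3 * * *',
    $$DELETE FROM site_visits WHERE visited_at < now() - interval '30 days'$$
);

-- roll raw visits up into a daily summary every five minutes
SELECT cron.schedule(
    '*/5 * * * *',
    $$INSERT INTO visit_rollups
      SELECT visited_at::date, count(*)
      FROM site_visits
      WHERE visited_at > now() - interval '5 minutes'
      GROUP BY 1$$
);
```

Because the jobs live inside the database, they come up and down with Postgres itself, which is the reliability argument the talk makes.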
And then if you're feeling adventurous, take a shot at writing your own extension. You can do a lot of fun things. If there's an extension that you want that's not there, give it a shot. There are a few I'd love to see, like an email data type directly in Postgres. I don't think it's going to land directly in core super soon, but maybe if we start as an extension, we can apply a little pressure and see if it gets there. And that's it. And I think I've got a few minutes for questions. Questions, anyone? I have a question regarding the Citus extension. In which way are you handling backups? Do you have many small backups for each node, or somehow? Which way are we handling what? Backups. Backups. Yeah, so it really kind of depends. What we generally do is use just the standard Postgres tools. So every node within Citus is just a Postgres database. If you have, say, a coordinator and two data nodes, you would just have three backup processes running. So we have customers that — like ourselves, we use WAL-E — we have customers that use Barman, pgBackRest, a mix of things. But it's whatever you would normally do for Postgres, because it is just Postgres — follow that process. You just get to do more of it, because you have more nodes. Other than contrib, where do you find these extensions? So PGXN is a great place. I would also say GitHub, and then Postgres Weekly. Some obscure ones will show up on GitHub and never hit PGXN. In Postgres Weekly we try to highlight most of the ones that come up that are of a certain quality — not every extension that exists. But PGXN, I would say, is kind of the de facto directory: pgxn.org. Oh, yes. What's your favorite extension? You weren't here! Oh, so this shows that you were late. So yeah, I think in order, I would rank my top three: pg_stat_statements — it is the most useful extension for application developers, hands down.
I'm biased, but Citus is a pretty cool one that turns Postgres into a sharded database without you having to worry about all that. And then I really do just love saying HyperLogLog — and I sound, I think, way more intelligent when I talk about all the things it does. So it probably pulls in at the number three spot. All right, thank you. Thank you.