As I said, my name is David Steele and I work for Crunchy Data. I'm not going to give a pitch, but we do all things Postgres, and you probably know a lot of people who work for Crunchy. And of course, we're going to talk about pgBackRest.

pgBackRest just last week got to 1.0 after two and a half years of development. It has been running in production in major enterprises for two years, and now we have the first stable release. Stable in this case means that the interface is stable: I will not be changing the command-line options, I will not be changing the repository format, all that kind of stuff. pgBackRest has always been extremely stable in the sense that the software works, but over time I have changed the way it's configured, and I have changed the repository format twice; one of those times was in 1.0. So over two and a half years it changed twice, once just recently. If you're running a previous version, you will need to start with a new repo, and you will need to look over your configuration very carefully. The new configuration is simpler, though, or so everyone tells me; my old stuff was just way too complicated.

All right, here's our agenda. First we're going to talk about why we back up. Then living backups, how to back up, design features, and performance. We'll have a short demonstration, and then we'll take questions, should any arise.

All right, I'm going to go through these pretty quickly. Is there anyone in this room who really needs to be convinced of why they need backups? No? Thanks, Magnus. Anyway, there are a bunch of reasons. One, of course, is hardware failure. Hardware failure happens, and no amount of redundancy can prevent it, so having backups is a good idea.

But you might say: hey, I've got replication, I don't need backups. Replication is not backup. Replication is part of a high-availability plan. If you do something terrible to your master, it will be replicated. If you drop that really important table, it will be replicated. I know you can put in replication delays, and other things can happen, but realistically, replication is not backup. That's like saying that RAID is backup. RAID is not backup; RAID is redundancy, and that's what replication is. And sometimes, as JD mentioned, if your replica gets too far behind, the master may have already recycled the WAL it needs if you don't have replication slots, or the master can fill up with WAL if you do have them. It's nice to be able to just go get the archive from someplace. Also, when you're bringing up new replicas, you can sync them from the backup instead of from the master, which is extremely handy. You don't have to put load on your master to bring up a new replica; you just restore the last backup, turn on replication, and you're good to go.

Corruption: this is a serious issue that can affect you. There can be bad disks, bad controllers, bad drivers, all kinds of stuff. Backups will, in theory, protect you from this. The problem is that you have to know it happened, and that can be tough. So if you're running 9.3 or greater and you are not running with checksums, you should really think about that. I know it's painful, because you have to do a pg_dump and restore rather than a pg_upgrade, but you should really think about getting onto checksums at some point. That way, at least you know when you have corruption.
Accidents, of course, are what I was just talking about. You drop a table, you delete your most important account: that instantly gets replicated. Where do you go to recover that data? Your backups.

You can also use backups for development. There's no more realistic data than your production database, so you can pick that up and use it. This may not be practical due to size or privacy issues, but you can sample your data if you need to, and you can also redact it. I know people who do both.

And there's reporting. You can use backups to stand up an independent reporting server, which is not a replica. For reporting, it's sometimes really handy to be able to create temp tables and do other things that you cannot do on a hot standby.

And last is recovering important data that was removed on purpose. Sometimes this is forensics, and sometimes you're just curious. It wasn't removed by accident, but you realize: hey, we'd really like to get that back. So these are all ways you can use your backups.

I can't take credit for this next bit. I found it on the internet, but I've never been able to find any definitive attribution for it, so I've left it unattributed; the only thing I do is tell everyone it wasn't me: "The state of any backup is unknown until a restore is attempted." If you do not think this is true, you are kidding yourself. Seriously, this is true. So we need to find a way to make our backups useful.

I call this living backups. Backups are not a thing to be put in a dusty closet and forgotten about. They're something you should be integrating into your enterprise in a way that lets you know very quickly if something goes wrong with them. You're using them for staging, for reporting, for offline data archiving, development, bringing up new replicas; you name it. That way, if something goes wrong with your backups, you're going to know it, hopefully within a week. If you have more backup retention than that, then you should still have an older backup that works.

So far, with pgBackRest, we have never had a bad backup in the field. We have had the software crash due to some combination of options or whatever, but pgBackRest has never actually created a backup that it was not able to restore. My goal is for that to never happen. Have there been bugs? Yes, there have been bugs. But those bugs basically always result in early aborts of the backup process, so you don't end up with a backup; you end up with an error. Not ideal, of course, but at least you don't think you have a backup when you don't.

And here's the other thing: unused code paths will not work. This is still just a restatement of the previous slide. You need to do this stuff regularly. People always set up disaster-recovery drills with the best of intentions. They write a wiki, the wiki gets old, and no one ever does it after the first time. It's better if you integrate backups into the enterprise, so that when you get to the stage where something has gone horribly wrong, everyone knows the tools, they're comfortable with restores, they know what they're doing, and it's not a big panic situation. You do not want that.

All right, let's talk about how to back up. pg_dump: this is the logical dump. For small databases, it's quite good.
You get a text file or a tar format, and they're extremely easy to work with, extremely easy to re-import, and you can use them for upgrades and all kinds of good stuff. The problem with pg_dump is that it's a point in time. If you take a pg_dump at midnight and your database server crashes at 11:59 p.m., you have lost almost 24 hours of data. It is gone. You cannot replay WAL on top of a pg_dump.

Next is pg_basebackup, which takes a physical backup instead of a logical one. One downside is that you have to back up the entire cluster, but that's okay; pg_basebackup is a great tool. It does suffer from a few problems, though. It's single-threaded, and it will only take a full backup, which can be painful for very large systems. The other problem is that although pg_basebackup will transfer the WAL required to make the backup consistent, it doesn't have any general archive-management scheme. So if you want to play forward from the time of the pg_basebackup, you need to handle archiving on your own: pg_receivexlog, whatever. You need some mechanism in place to make that work.

Then there are other third-party tools: OmniPITR, Barman, WAL-E. Barman is probably the most commonly used one. OmniPITR, honestly, and I don't want to insult poor Keith, is starting to feel kind of long in the tooth. And WAL-E is great if you are backing up to S3; it was specifically designed to archive, and later actually do backups, to S3, which is pretty cool.

And last, of course, is pgBackRest, which is what we're going to talk about today. Prior to this conference, I always had a question mark after pgBackRest. It was not at 1.0 yet, so it had to be considered a little experimental if you were going to run it. That is no longer true. pgBackRest is extremely stable, and it is, in my opinion, the best option on this list. I'm just going to say it. Yes, I'm a proud papa, et cetera, et cetera, but it is also pretty awesome, and we're going to talk about why.

When I set out to write the software, I had a couple of things in mind. One, I was dealing with very, very large Postgres databases, up to 40 terabytes. The largest single-cluster database we had was 12; some of the others were sharded clusters and so on, but we were talking about an enormous volume of data, on systems with 32 cores. So it was really painful running rsync or Barman or something like that and just ticking along on a single core. It was agonizing.

I had been using rsync for backups for years, of course; everyone does, it's the easy way to do things. But rsync really wasn't designed for database backup. It has a couple of problems. One is that it's single-threaded. You can multithread it by writing your own wrapper around it and telling it which files to copy, but then you hit the other problem: it has a one-second timestamp resolution. What this means is that if Postgres modifies a file while rsync builds its manifest and records the timestamp, and then Postgres modifies it again within that same second, the file will not be copied in the next incremental. You would need to use checksums, and this is one of the reasons the official documentation has been changed to say that on your second rsync you must use checksums if you want it to be safe. And if you think this is something that can't happen, I can demonstrate it. It's actually a large window of vulnerability.
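A minimal sketch of that failure mode, with hypothetical paths; rsync's quick check compares only size and mtime, so a same-size write within the same clock second is invisible to it:

```bash
#!/bin/bash
# Demonstrate the rsync one-second window (assumes both writes land in the
# same clock second, which they almost always do when run as a script).
mkdir -p src dst
printf 'AAAA' > src/datafile    # version 1 of the file
rsync -a src/ dst/              # initial copy
printf 'BBBB' > src/datafile    # same size, modified within the same second
rsync -a src/ dst/              # "incremental": size and mtime match, file skipped
cmp src/datafile dst/datafile   # differs: a silently stale backup
# rsync -a --checksum src/ dst/ would catch it, at the cost of reading every file.
```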
It's really easy to reproduce, and really fun, because it works every time I run my script: you get a bad backup. So if you think this can't happen, think again. It can happen.

The other big weakness of rsync is that it will not compress at the destination. It compresses in transit over SSH; actually, SSH does that, rsync does nothing of the sort. Then you get to your destination and you've got an uncompressed file. If you've got ZFS or some other big SAN, that's great, but there are a lot of smaller companies that don't have all that, and it's just nice to have backups compressed. It seems like a basic feature, right?

So the idea is that pgBackRest doesn't use any of that stuff: no rsync, no tar. It was originally written to run over SSH with a whole slew of command-line tools piped together; that was version 0.1. The idea was to replace that with a complete protocol layer that does everything in-protocol, and that's what it has now. And it easily solves the timestamp-resolution issue by just waiting one second. It builds the manifest, waits until the end of the current second (that might be 10 milliseconds, it might be 990, whatever), and then at the beginning of the next second it starts copying. That way, if Postgres modifies a file after the manifest is built, the file will get a new timestamp. You may not see that new timestamp immediately, but every start backup does an fsync, and the timestamp should be fsynced at that point. If your file system is not doing that, then you have a problem which has nothing to do with pgBackRest, and your database probably is not consistent.

So let's run through the features quickly. I'm not going to spend a lot of time on each of these slides; there's a fair amount of text, and the slides will be posted so you can look through them.

It's multi-threaded. This is huge. It's actually funny, because almost immediately the bottleneck becomes your network link. If you've got a gigabit network, with normal compression you can do about a terabyte of raw backup per hour. So if you've got a five-terabyte database, it will take five hours to do the backup with between four and six threads. The network is almost always the bottleneck now, which is kind of depressing. But 10-gigabit networks are becoming more popular, and even 40-gigabit, so if you have one of those you can go beyond this, terabytes per hour, but you've got to have the network bandwidth to support it.

It does local and remote operation without any weird, hokey SSH loopbacks or anything funny like that. You can back up locally, back up to a locally mounted NFS volume, or back up to a remote server, with all kinds of fairly simple configuration options.

It supports full, incremental, and differential backups. And as I said before, since pgBackRest is not susceptible to the time-resolution issues of rsync, making differential and incremental backups is safe. Now, with rsync you can do this by turning on checksums; I want to point that out clearly, to be fair. But that means on your 10-terabyte database you have to checksum the entire thing for every incremental backup. It almost makes the incremental backup kind of pointless, but you can do it; you just need to make sure to turn on checksums.
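For reference, taking the three backup types looks like this; a sketch, where the stanza name "main" is hypothetical:

```bash
pgbackrest --stanza=main --type=full backup   # complete copy of the cluster
pgbackrest --stanza=main --type=diff backup   # changes since the last full backup
pgbackrest --stanza=main --type=incr backup   # changes since the last backup of any type
```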
It has backup rotation and archive expiration built in: you give it a number of full backups and a number of differentials, and it will automatically expire the WAL. You can also keep WAL for only a more recent set of backups. So you can say: I only want WAL for the last three backups; older than that, I'm willing to give up point-in-time recovery, but I just want to be able to go retrieve old data. It will still keep the WAL required to make each of those backups consistent, so you'll have sort of groups of WAL that get retained while the rest is thrown away. This is necessary for some people with really high-volume setups.

Backup integrity: checksums are calculated for every file, stored, and then rechecked when the restore is done. rsync will of course checksum, but it throws those checksums away at the end, and you're left to wonder whether your files at rest are good or not. You never have to worry about that here. And not only that, but the manifest itself is checksummed, so when the manifest loads, it checks its own checksum. That file contains the checksums of all the files, so you've got multiple layers of protection. Also, the backup doesn't finish when it finishes copying files; it waits until every WAL segment required actually reaches the repo. pg_stop_backup will return when all of them have been pushed successfully, but pgBackRest will additionally make sure they're actually in the repository. We'll see why that's important when we get to a feature called async archiving, which needs this.

Backups in the repository are stored in the same format as a standard PostgreSQL cluster, except compressed if you have compression turned on. If you disable compression and turn on hard links, you can actually snapshot that directory (say, on a ZFS volume), mount it, and bring up PostgreSQL directly on it. It will still have to do recovery, of course; don't forget about that. But you can do it, and for companies with terabyte-scale databases this is a really, really handy feature. And of course there's lots and lots of fsyncing to ensure durability, at the file and directory level, for every write.

Backup resume: if a backup is aborted in the middle, it can pick up where it left off. While it's backing up, it stores the manifest at intervals, so it has all the checksums. It will checksum the files it thinks it can keep, delete everything else, and then resume the backup. That still takes some time, but checking checksums is a lot faster than compressing and transferring, and all that work can be done on the backup server, so it's not producing load on your primary. We don't see a lot of aborts, but sometimes you get network failures, or you have to kill it off for some reason; nothing's perfect.

Compression and checksums are done in stream, which makes it very, very fast. Nothing is ever done on a file at rest: the file is picked up, copying starts, and the checksum and size are captured in stream (the size can change while the backup is in progress, of course) and stored in the manifest.
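As a sketch, the retention settings described at the top of this list would look something like this in pgbackrest.conf; the option names are the 1.0-era ones and the values are just examples:

```ini
[global]
repo-path=/var/lib/pgbackrest
retention-full=2        # keep two full backups
retention-diff=3        # keep three differential backups
retention-archive=3     # keep complete WAL only for the last three backups
```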
Delta restore: this is a really fun feature. What it does is work from the manifest. This is not the default, by the way; the default is that all your database directories have to be empty: tablespaces, PGDATA, anything linked. But if you turn on delta, it takes the manifest and deletes anything that doesn't exist in the manifest, and it tries really hard to make sure the targets really are Postgres directories. A tablespace has to have the right directory name; a PGDATA directory has to contain PG_VERSION or the backup manifest (the first thing pgBackRest copies in is its backup manifest). If it doesn't find one of those two files, it will not run a delta. So delta has a lot of checks to make sure you're operating on real Postgres directories and haven't pointed it at /bin or something by accident. But still, you need to be careful. Then it checksums everything that remains, multi-threaded, with as many threads as you've told it to use, and copies whatever it needs from the backup. Those two steps are actually mixed together: it builds a list, checksums each file, and copies it if it doesn't match, all happening in parallel. It's not a two-phase process; it all gets kicked off together, and it's really fast. Really, really fast.

The great thing about a restore is that your cluster isn't running, so you don't have to worry about sucking up a lot of CPU and I/O. Use as much as you want, because the only thing that matters at that point is how fast you can get the database back to the state you need; you're not worried about resources. For backups, I always tell people: just make sure it takes less than a day, some reasonable period, but don't throw 32 cores of compression at it; do four or eight, maybe. And don't chew up all the I/O, because it doesn't matter how fast the backup is as long as it fits within your schedule; it can just run in the background. Restores, though, should be fast, and they are.

We've also got really advanced archiving. It turns out that, apart from the parallelization, backup and restore are really not the hardest problems to solve. Archiving is. Man, what a pain. pgBackRest includes dedicated commands for pushing and fetching WAL, so you're not using rsync or anything like that: there's an archive-push command and an archive-get command. The push does fun things like automatically detecting that a WAL segment has been pushed multiple times and deduplicating it. I'm not going to go into it right now, but there are a variety of reasons the same WAL segment might be pushed more than once: recovery scenarios, network failures where Postgres never gets the response from the archive command, all kinds of things. That's taken care of automatically. If two segments have the same name but are not identical, you will get an error, because that's bad; something horrible has happened and you need to address it. That will show up in your Postgres log, and you'd better be monitoring that, please.

The other thing the push and get commands do is match your database against the database information stored in the repo: the version, the control version, the system identifier, everything. So if you copy a config from one machine to another and forget to change the archive location or the stanza name or something, it will detect that and tell you: no, you can't do that. I've seen this with other systems so many times, where people copy a config, forget to update the directory, and stick two WAL streams together. It's not pretty.
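The wiring for those commands is a couple of lines of configuration; a sketch, again with the hypothetical stanza name "main":

```ini
# postgresql.conf
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'

# recovery.conf (pgBackRest writes this for you at restore time)
restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'
```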
Then there's asynchronous archiving. Basically, pgBackRest takes the WAL file, stores it locally in a spool path you've specified, and starts up a separate process to compress and transfer it. This is extremely efficient, and for people with extremely high write volumes it's actually critical. This particular feature has brought a lot of people to pgBackRest, because nothing else worked; nothing else was fast enough for them. It's not multi-threaded, but it maintains a persistent connection, runs separately, and gets through the queue pretty quickly.

Full tablespace and link support: you can back up tablespaces, and when you restore them, you can remap them anywhere you want, or put them back in the original location. For development purposes, where you have a different storage geometry, you can just say: throw all the tablespaces over here, in one directory. File and directory links are now fully supported too, so you can link postgresql.conf someplace else, or link pg_xlog, or pg_stat; you can do whatever you want. By default, though, when you restore, all of the links are restored as regular files and directories inside PGDATA. It does not restore links by default: you have to say --link-all, and then it will link them, or you can remap things individually. For safety's sake, because there's no way for me to verify those link locations the way I do with tablespaces or PGDATA, it does not restore links unless you tell it to. It will warn you on the command line that a link is being restored as a regular file or directory, so if you do a restore and see those warnings pop up, you might go: oh, maybe I actually want to do something about this.

And last but not least, pgBackRest supports Postgres versions down to 8.3. I know this sounds ridiculous, but there are still a lot of people running 8.4 in the enterprise, and in my experience it is not their fault; they're running Cloudera, or some Red Hat Java application stack, or whatever. Over the last year many of these have been upgraded, but there are still people running 8.4 out in the field, and by and large it is not their fault, so I want to make sure they're supported and can get a backup. The backup interface has actually been quite stable across versions, so it's really not that big a deal to provide the support, and I have regression tests for every version that I support.
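As a sketch of those restore options: the tablespace name and paths here are hypothetical, and I'm assuming the --tablespace-map-all form for the throw-everything-in-one-directory case:

```bash
# Remap a single tablespace by name:
pgbackrest --stanza=main --tablespace-map=ts_data=/mnt/dev/ts_data restore

# Dev box with different storage geometry: put all tablespaces in one place:
pgbackrest --stanza=main --tablespace-map-all=/mnt/dev/tablespaces restore

# Links are restored as regular files/directories unless you explicitly ask:
pgbackrest --stanza=main --link-all restore
```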
Sorry, I just wanted to see where we were on time. I'm doing all right, actually. OK: performance. That's what everyone's interested in. I gave you those numbers earlier, one terabyte per hour on a one-gigabit link; here are some more specific numbers. In this case I wanted to compare directly with rsync, so I had to game it a little, because rsync is not multi-threaded and doesn't provide destination compression. In the first test, I set the network compression for both to level 3 gzip. If you do not have destination compression on, pgBackRest reduces the network compression level to improve performance, because you get most of the compression without a lot of the CPU; if you do have destination compression on, the default is level 6, because it's going to transfer and store.

In this first case, rsync is actually faster: one thread, level 3 compression, no destination compression, which is the only thing you can do with rsync, and rsync wins by a little. rsync is written in C, it's been optimized for a long time, and it was written by someone who's smarter than I am. So there you go. But the equation starts to change pretty quickly. With two threads and the same compression, we're at 84 seconds, and the rsync column is now N/A, because rsync can't do that; we're 1.4 times faster than with a single thread. Then we turn on destination compression, and what I did there for rsync was run the rsync and then compress the files at rest. I know that's not very efficient, but it's the best comparison I could come up with, and a lot of people really do that. So here we are, 3:34 versus 5:10, about one and a half times faster; then with two threads we're three times faster, almost double again. And of course we're back to a big old N/A on the rsync side, because it can't do that. And you can just keep scaling up. Good stuff.

All right: the demo. I have to say I'm a bit of a chicken when it comes to demos, so I actually scripted mine. Ooh, that's awkward. It would really be nice to have that top line; that's going to be a problem for me. Hang on a second; I'll back this off and just try to make the screen as big as I can. We're going to have to go with this, because you will need to see the top line. It's very important. All right, let's start the demo. Maybe? I don't think anything happened. There we go. Oh, we're also missing stuff off the left. Geez. Right, now we're good, I think.

So this is going to create a cluster. I have just recently upgraded this demo to 9.5. Yay, because you're all running 9.5 now, right? Anyone? Anyone, really? Cool, okay. I haven't upgraded anything to 9.5 myself yet. It's coming soon, but I like to wait a little bit; I'm the cautious one. So I create a cluster here, and then the first important thing I do is create my pgbackrest.conf. I set the repository path (I'm using Vagrant for my virtualization, so I'm doing this locally in Vagrant) and I point it at the place where I created the database. I didn't create it in the normal place where you see databases on the different distros; I used initdb to put it exactly where I wanted it, right in my demo directory.
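That pgbackrest.conf is only a few lines; a sketch with hypothetical demo paths, using the 1.0-era option names:

```ini
[global]
repo-path=/home/vagrant/demo/repo   # where backups and archive land

[main]
db-path=/home/vagrant/demo/db       # the cluster created with initdb above
```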
Now we perform our first full backup. Exciting. You might think this is slow for backing up an empty database. Part of the problem is that I have a 12-inch MacBook, which is a lovely machine but not a fast one. The other thing is that pgBackRest has a somewhat high startup cost, and it also has to wait out that second, remember. Once it gets cranking, though, it's very fast. So yes, if you have a very small database, your backups will take two or three seconds instead of half a second. If you have a very large database, that will be lost in the noise and you won't think about it anymore; and if you have a small database, then two or three seconds, who cares? Maybe that's just me.

So here's our original database size, and here's our backup directory. We've got a backup; we've got a history directory, which holds old manifests for doing forensics and such; we've got a backup.info file, which holds our current set of backups; and there's always a link from latest to our most recent backup. That's just for your convenience, so you can go and see what the most recent backup is. And our backup is down to 5.7 MB. Yay. Obviously that's pretty unrealistic: the database is empty and mostly zeros, and you're not going to see that level of compression in real life. Unless you have mostly zeros, and then why are you backing it up in the first place?

Then we can see our archive directory. We've got some archive logs (this is the first one Postgres generated), then our backup, then another archive log. Every backup will include at least one WAL segment. Always at least one; it will probably be a lot more, depending on how busy your database is, but at least one. This is true even if you don't think you're writing to your database. Oh, we've got a read-only database, we don't have any write load on this? Doesn't matter. You still need archiving, because Postgres is changing things even if you're not. pgBackRest, of course, takes care of this for you, and again, the archive is compressed as well.

Now we do our first differential backup. The database size has gone up to 69 MB; that's some extra WAL segments that were created, nothing to do with anything in the database. But the backup size is still 5.9 MB, because we haven't really changed much. Probably all that changed was pg_control, the stats files, a couple of things. And now we can see we've got some more archive, a new backup, new archive.

Okay, so here's what I like to do. If I'm doing a database release, the first thing I like to do is take a backup as close to the release as possible, so that if something goes horribly, horribly wrong, I can recover that backup and replay as little WAL as possible. It depends on how much WAL your system generates, but I've worked on systems that generate a terabyte of WAL a day. You don't want to replay a terabyte of WAL; it sucks. You want to get back as quickly as possible, so do an incremental right before the release window. Here we are, and we can see that we've got that incremental.

Now it's release time. The other thing I always do before a release is set a restore point. That way I don't have to think about whether it was 6:01 that we started the release or 6:03; you set a restore point and you recover to exactly that point. The other thing you should really do, and I keep forgetting to put it in the demo, is run pg_switch_xlog() here, because right now that restore point is still sitting in a WAL segment on this machine. Running pg_switch_xlog() moves that segment off to the archive, so that you're actually able to play forward to that point. It's just the safe thing to do. If you have a replica, of course, the replica will already have seen it and all the good stuff; but presumably at this point you're shutting down the replica, or once you do the upgrade, that's going to hit the replica anyway. The replica is not a backup; if you need to roll back, you need to come here.
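For reference, those two calls look like this; pg_switch_xlog() is the 9.5-era name (it's pg_switch_wal() from version 10 on):

```bash
psql -c "SELECT pg_create_restore_point('release');"   # named point to recover to
psql -c "SELECT pg_switch_xlog();"                     # push the segment out to the archive
```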
All right, so we do the release. Yay. Earlier, and I forgot to mention it, I created a test table, just so we can write messages in and see where we are in the whole process. So we set a test message, 'after release', and now we know where we are. (I had written one that said 'before release' earlier, too.) We did the release, we inserted this message, we're done, it's time to go home.

Or maybe not. QA says the release is no good; they want us to do a rollback. So we're like: okay, we know how to use pgBackRest, this is pretty easy. We run pgBackRest with --stanza=main. A stanza is basically the configuration for a cluster. The reason it's not called a cluster is that it actually encompasses a group of clusters: you use the stanza name on your primary, which is how archive-push knows where things land in the repository, and on your replica, which is a different cluster and may not even have the same name, you use that same stanza name to get archive. So the stanza is how your whole database setup is tied together, and it may include multiple clusters. We didn't use --cluster because we thought it would be confusing; although stanza is kind of confusing too, it is a standard backup term. Then --type=name and --target=release (that restore point we created), then --delta, because we're adventurous, and finally the command, restore.

We run it and immediately get an error. pgBackRest does everything it can to keep you from stomping on your own foot. It really does. There's only so much you can do, but one of the really easy checks is whether Postgres is running. You cannot restore a database while Postgres is running; it's foolish and dangerous, and it basically just won't work. So it checks that Postgres is not running. We stop the cluster, run the restore again, and it works. You'll notice that restores are a lot faster than backups, because there isn't all that setup to do.

We do the restore, and then what I'm showing here is the recovery.conf. One of the cool things pgBackRest does is write your recovery.conf file for you; you don't have to do it. But if you're prudent, you'll go look at it anyway, just to make sure it's what you expected; maybe you fat-fingered something on the command line. It's a lot better to check. Our restore took less than a second, but you could have a big restore that takes ten minutes or an hour, so you want to make sure you get it right. Once you start that thing, you're committed, and if you messed something up, you have to go back and redo it. In this case, we can see the restore command is there (pgbackrest archive-get) and our recovery target name is 'release'. So we start the cluster and let it rip.

Now we check our test table, and we see we are back to 'before release'. It worked; the release has been rolled back. So who needs rollback scripts, right? Has anyone ever seen rollback scripts actually work? I mean, they're crap. You spend a lot of time on them, and when you're in a real crunch, they don't work. How about this instead? I know this works; lots of people know this works; we've spent a lot of time making it work. Use point-in-time recovery for your emergency rollback of a release. Don't try to run rollback scripts there; it's just a nightmare.
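Put together, the rollback sequence from the demo was roughly this; the data directory path is hypothetical:

```bash
pg_ctl -D /home/vagrant/demo/db stop     # pgBackRest refuses to restore over a running cluster
pgbackrest --stanza=main --delta --type=name --target=release restore
less /home/vagrant/demo/db/recovery.conf # prudent: check what was written for you
pg_ctl -D /home/vagrant/demo/db start    # recovery runs and stops at the restore point
```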
And now, this is interesting: who's seen a history file? Like one of these? Oh, Magnus. Okay, a couple of people. What this means is that we've moved to a new timeline, because we basically told Postgres to stop before it got to the end of timeline 1. So Postgres says: okay, that's fine, I'm going to diverge, and now we're on timeline 2. The important thing to remember is that our last backup is still on timeline 1. If we want to restore onto timeline 2, we have to tell pgBackRest, because Postgres will always recover along the backup's own timeline: if the backup is on timeline 1, it plays to the end of timeline 1. And then it will actually switch to timeline 3, because it sees that 2 is already taken. It can get a bit confusing, to be frank.

All right, so we do the rollback, and that works great. They start the app, and then some really important data gets written into the database. Remember this; it'll be important later. Then for some reason QA says: okay, no, we screwed this up, the release is actually good, we just want to go back to where we were before. We made a mistake, and we've got an end-of-month deliverable we must make, or the board is going to be pissed. So rather than reapply all the database scripts, we say: hey, we'll just go back to where we were using point-in-time recovery.

We can do that: pgbackrest --stanza=main --delta restore. Why don't we have to do anything fancy here? Because, as I said, recovery will always replay along the same timeline the backup was made on; it's going to play to the end of timeline 1. Timeline 1 includes our restore point, but this time we want to go past the restore point and play all the way to the end. We do that, and voilà, we're back after the release again. Nice and easy; no one got hurt.

However, now we've got a problem. When the app was started up, that very important update was written, and now we don't have it anymore. People are panicking, because some customer gave us their credit card for this huge account, and it would be embarrassing to go back to them, and who knows what else. So now we need to go back and recover that data. At this point, I would actually go to another machine to do this: I would leave the primary the way it is, go to some recovery server, retrieve the data there, pg_dump it out, and pull it in with psql. I would not do this on the primary.

But here's what you would need to do. The data we need is on timeline 2; our backup is on timeline 1. We can't take a new backup, of course, because we need to go back in time, so that won't work. We need to use the timeline-1 backup but somehow get onto timeline 2. Turns out that's really easy: you just tell pgBackRest you want to be on timeline 2, and it writes the recovery target timeline into recovery.conf for you. You do the restore and... shoot, sorry, hang on a second, I can scroll back... we get back to our very important update, right here. So we've got the data. As I said, this really should be happening on another server: just let your primary go, recover your data over there, and come back for it.
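One way to express that restore, as a hedged sketch: --recovery-option passes settings through to the generated recovery.conf, assuming it's available in the version you're running:

```bash
# Restore the timeline-1 backup, but follow timeline 2 during recovery:
pgbackrest --stanza=main --delta \
  --recovery-option=recovery_target_timeline=2 restore
```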
Then pgBackRest has some basic info functions. This is the text version; basic info is just for human monitoring. You want to go to the command line and see when your last backup was taken, when your oldest backup is, things like that. We will be expanding the amount of data here and formatting it more nicely; I have someone who's going to be working on that soon, which will be cool. But this is really just for a quick warm-and-fuzzy.

If you really want the kitchen sink, you want the JSON output. Same command, but you say --output=json, and then you get everything pgBackRest knows about your repository: all the backups, when they were taken, archive start and stop, locations, version numbers, the original size of each backup, the delta size, the size in the repo; it goes on and on. If you're going to build a monitoring solution around this, you'd probably want to pull this JSON and work from that. We also have a standard Nagios alert in the works, for the people who use Nagios, because there are a lot of them, which includes me; that's why I'm putting a Nagios alert in.

So that is the end of my demo. Let's see here. Yeah, that's all I've got. I think I'll finish a little early today, but are there any questions?

I'm sorry? Okay, the question was: can it back up from the slave? The answer is no, not yet. We're working on that. Of the things we're working on, in the sense that we will do them at some point, backing up from the slave is actually one of our top priorities right now. It's a really commonly requested feature, and absolutely, it's very important; it's great because it reduces load on your master. But it's not there yet.

It's pretty much the same thing. Basically, it's just less configuration, because you've already got pgBackRest installed, and it knows everything about your system, so it can kind of take care of it.

Async archiving also has a very interesting feature which, when people hear about it, scares the crap out of them, so I want a show of hands of who's scared by this feature when I describe it. With async archiving, you have a spool directory where that queue of files is kept, and you can set a maximum size for that queue. What that means is that when you hit the maximum size, pgBackRest starts dropping WAL on the floor. It tells Postgres, 'I archived it,' but it just drops it. Now, it starts sending real nastygrams into the Postgres log at the same time, so you get a lot of unpleasant messages even leading up to that point. Everyone's scared when they hear about this feature, but the funny thing is that a lot of people use it, and the reason is that it's better than having your primary fall over. What happens if pg_xlog fills up? Postgres panics, that's it. And even getting Postgres restarted can be a real nightmare: which WAL segments do you delete? How do you prune them? There are ways to do these things, but now you've wandered into the realm of wizardry a little bit.
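In configuration terms, the async pieces look roughly like this. The spool path is hypothetical, and archive-max-mb is the option name as used here; it has been renamed in later releases:

```ini
[global]
archive-async=y                    # queue WAL locally, push from a separate process
spool-path=/var/spool/pgbackrest   # where the queue lives; can be its own device
archive-max-mb=1024                # optional cap: past this, WAL is dropped (with
                                   # loud complaints in the Postgres log first)
```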
Something someone suggested to me last night, when we were talking about this: most file systems will have a 5% reservation for root, and that can be your savior. If you have that and you know how to work it, it can help you out a lot; you can bump that reservation down, and that'll give you an extra couple of percent, which in a lot of cases means gigabytes. But in general, if things are happening in the middle of the night, you just don't want your primary to fall over. It's better to drop the WAL and forget about it than to have a dead primary, and people are willing to make that trade.

That's also something this does differently than pg_receivexlog, for instance. And we are actually looking at integrating with pg_receivexlog as well: you could set up pg_receivexlog on your backup server, and pgBackRest would pull the WAL from it; still checksum it, verify it, do all that kind of stuff. That's another project we're looking at quite closely. But right now, the major difference is that pg_receivexlog is synchronous and this is asynchronous, so the queue can actually build up. Sometimes people have these enormous volumes of WAL that come all at once: they'll build up a big queue and work it down, build up a big queue and work it down. And the spool can live on a separate storage device; there are all kinds of useful limits you can put on it. Even if you're not going to use archive-max-mb, you can put the spool on a different device; if that device fills up, pgBackRest will stop taking logs and then pg_xlog starts filling up. It buys you more time, but it's still a problem. So it is a feature that is actually used and liked by many people; and I mean, I wrote it, and it still scares me. If you were tempted to raise your hand there, you're not alone, because it's a scary feature, but it's better than having your master fall over. Yeah?

Okay, the question is: what is the distinction between a differential and an incremental backup? Good question. I did not know what this was before I started working on this project; Stephen Frost kept telling me, you've got to have differential and incremental. A differential is an incremental that is always taken from the last full backup. Always. Say you have a full backup, then a couple of incrementals, then a differential: the differential will go back and reference the last full backup. This is actually really cool. To be honest with you, I don't use incremental backups a lot. I like full plus differential: I'll take full backups weekly or maybe bi-weekly, and then keep a rotating set of three differentials, with my differential retention set to three. That way I know I can always go back three days without having to go back a whole week. It's all about WAL replay, right? You don't want to go back too far, or you've got a lot of WAL to replay. So I set up those weekly fulls and keep two or three differentials moving through time. They're not that big, because each one is a differential from the last full. And with incrementals it's a lot harder to expire them, because they depend on each other; differentials are really easy to expire because they have few dependencies.
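That full-plus-differential scheme could be driven by something as simple as cron; a sketch, with the times and stanza name as placeholders, and retention-diff=3 set in pgbackrest.conf:

```bash
# crontab: weekly full on Sunday night, differentials the other nights
0 2 * * 0    pgbackrest --stanza=main --type=full backup
0 2 * * 1-6  pgbackrest --stanza=main --type=diff backup
```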
Some people will go with full backups, then the occasional differential, and then incrementals based on the differentials; when you expire a differential, the incrementals that depended on it go away too. You can do that kind of rotation scheme, but I find full plus differential to be the most efficient. It gives you the best bang for your buck. Magnus?

Here's the thing: it's not that it would be complicated to implement. It would not be complicated for me, because I understand how this stuff works really well and could probably do it with very little effort. But I think it would be really hard to configure, and hard for people to understand what was actually going to happen. Magnus is right: if you know you're going to do, say, a big load on the weekend, with specific time windows when you do these big loads, you could tune that to take the right type of backup at the right time. I'll be honest with you, Magnus, it's not something that's on the radar. A lot of what goes into pgBackRest comes from suggestions from users, even potential users. Obviously we can't implement everything that gets asked for, but if we get a request a couple of times, we go: oh, this is something people need out in the field. If we never hear about it, we assume people don't need it, and then we only implement it if we want it ourselves.

So yes, if you have an idea, feel free to go to the website or to GitHub. I haven't found the conference page where we're supposed to put our slides yet, but when it appears I will post mine, and here's all my info. It's really easy: just type in pgbackrest, and the first two hits will be the website and the GitHub page. Go there, submit an issue, and I will look at it and engage with you, and we'll talk about it. There's also a backlog on the website: click on Backlog and it gives you the whole list of features we're considering, in priority order. If you don't see something there and you think it's important, send me an issue and I'll take a look at it. It doesn't mean I'm going to implement it immediately, but knowing that people need it and want it is extremely important.

If there's nothing else, then let's go ahead and wrap it up. All right, great. Thank you very much. Thank you.