So now that you've all been warned — don't you dare ask me a question while I'm speaking... actually, I would like questions. So if you have a question, pop it up, and if I think it's going to take too much time, he'll just yell at me later.

So, my name is Scott Mead. I'm a senior architect with OpenSCG — we are the Open Source Consulting Group. We do services and consulting around Postgres and other open source data tools. We're one of the platinum sponsors of the conference, so come up and say hello; we've got some cool things to show you.

What I'm here to talk to you about today is backups. This is the basics: the who, what, where, why, and when of backups in Postgres. When you're new to Postgres — or new to backups with anything — you always kind of feel like, "I really would like to go fishing. I think I'd like to do some fishing from the beach, but I'm not quite sure how to do it." So my goal is to teach you how to fish, and to get you from here to, at the very least, here — where you feel like you can go out on a busy day on the water and be a fisherman or fisherwoman. That's the goal.

I'm going to show you a couple of tricks and a couple of basic things you can do. You'll notice I'm going to run a couple of commands on the screen for you. I always forget the options to every Postgres utility, so I'll show you some ways to find them quickly. But what I really want to do is stay at a high level — again, we're talking about backups.

So, on the agenda — and this first slide is not a typo: number one for backups is testing. Everybody always starts with "how do I take a backup," then they go to "how do I restore a backup," and then they stop. Number one is testing. The most important thing you can do for your backups is test them, and test them frequently. I don't know how many times I've gone into a place doing a health check — whether it's Postgres or Oracle or a file system or something — and we realize that for the last two months the backup has been zero bytes in size. Or it's a four terabyte data store and the backup is, you know, 500 megabytes. There's something wrong. So testing and monitoring are the number one things you need to be thinking about.

As we're going through "how do I take a backup, how do I restore a backup, what are the types of backups" — always, in the back of your head, have testing as number one, because it actually is the most important thing. I have a very good friend — I don't think he's here right now, guy with the big white beard — Peter Steinhauser. He's the grumpy old DBA that turned me into the grumpy old DBA I am today, and his favorite saying was: backups are like parachutes — if it doesn't work the first time, chances are you're not going to need it again.

So making sure that you're testing, and testing frequently, is important. And the thing about testing is not just doing it, but making sure that it tells you when things are wrong. If your test depends on a human, or a set of humans, running these ten commands in order every single day or every single week and validating the result, I guarantee what's going to happen: they're going to do it three times, and then they're done. They'll never do it again.
That's just human nature — maybe it's DBAs' sad, lazy nature, but it's the way we are. So it's important that you take those things and think about: how am I going to automate them, and how am I going to have the automation tell me when there's a problem? (I'll sketch one way to wire that up in a second.)

Now that I've said that, let's actually move on to some of the technical things. When you're looking at backups, there's one really big choice you have to make, and a few questions you have to answer. Do I have a copy? You have to make sure you've got a copy of the database somewhere. How much disk space do you need? That's an important one: a copy is a full copy of the database, and you should have multiple. If you have four terabytes, you don't plan for four terabytes — you plan for maybe six terabytes for the master, six terabytes for the slave, twelve terabytes for backups, minimally, and then you've got tape for offline retention and everything else. You've got to plan way out ahead.

So: how much space do I have? And automation, automation, automation. Do not depend on anybody running a backup, or doing the testing of those backups. Automate the living daylights out of them. They're very simple to automate, and you'll see, as we get into the command line tools and how they work, how easy it actually is.

Backups are one of those things — I grew up as a systems administrator, then sysadmin-DBA, and I played developer for a little bit too — but the side of the house that backups really matter for is the business side, usually more so than the technical side. At the end of this talk, every single person in this room will be able to take a backup of any Postgres database anywhere on the planet, like that. It's that easy. The question is, the business wants to know: how many days do I have available? How many days of backups are we keeping online? How long does it take to restore that backup? If I have a backup that took me 12 hours to make but it's going to take me 18 to recover, and that's my only strategy in the event of a failure, it's probably not going to fly with most businesses. That's where we start thinking about what strategy we're going to pick.

And then: how much data can I recover — how much loss do I have? If I'm using a certain type of backup and by the time it's done it's already 42 hours old, it's kind of limited in its usefulness. So as we talk about the different types of backups we can do, those are the kinds of goals you want to be thinking about. And at the end, you'll see you really have to marry multiple strategies together in order to get good backups.

From a technical point of view — from Postgres's perspective — Postgres has really two main types of backups. These aren't specific to Postgres, but this is what Postgres does: you have logical backups and physical backups. When you first start with backups, and usually if you've got a new app that you're deploying, logical is where everybody starts. It's simple, it's easy, a lot of tutorials do it, pgAdmin does it for you — there are a lot of great ways to easily get logical backups. Essentially, a logical backup is like doing a SELECT * from every single table in your database and dumping that to a file. That's really what it is.
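Since I keep hammering on automation: here's a minimal sketch of the kind of thing I mean — a nightly cron job that takes a backup and actually alerts when something looks wrong. Everything in it (paths, the size threshold, the mail command) is hypothetical; the point is that the automation checks its own work:

    # /etc/cron.d/pg-backup  (hypothetical schedule: 2am nightly)
    # 0 2 * * *  postgres  /usr/local/bin/nightly_backup.sh

    #!/bin/bash
    # nightly_backup.sh -- take a backup AND complain loudly on failure
    set -euo pipefail
    BACKUP=/backups/demo-$(date +%F).backup

    # -Fc = custom (compressed) format; more on this later in the talk
    if ! pg_dump -Fc -d postgres -f "$BACKUP"; then
        echo "pg_dump failed" | mail -s "BACKUP FAILED" dba@example.com
        exit 1
    fi

    # sanity check: a zero-byte (or tiny) backup is a failed backup
    if [ "$(stat -c%s "$BACKUP")" -lt 1048576 ]; then
        echo "backup suspiciously small: $BACKUP" \
            | mail -s "BACKUP SUSPECT" dba@example.com
        exit 1
    fi

The exact check matters far less than the fact that there is one, and that it pages a human instead of waiting for one.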
The way to think about it: the tools connect to Postgres as if they're a client — they're C-based clients. They SELECT * from everything and store it in a file. In addition to that, you're also getting all of the other meta objects. You're getting what the CREATE TABLE statement looks like; you're getting any views, procedures, grants, permissions — anything you need to take the database from zero to what you have is in those logical backups.

The nice thing about logical backups is they're very highly portable. What do I mean by that? If I take a logical backup from a Solaris machine, I can move it to a Linux machine. If I go from 32-bit Windows, I can go to 64-bit Linux. It doesn't matter — it's basically just a logical view of my data.

There are a few tools that enable us to do this. All right, so here's the COPY command. COPY is, underneath the covers — very rarely, unless you're doing some stuff with ETL, do most people get their hands directly into the COPY command — but it's actually a server-side, very high speed load and unload mechanism. It'll basically take a table, simulate a SELECT * from that table, but dump it out to a flat file on disk, tab-delimited or CSV or whatever your delimiter is. You can use the command directly, and if you're doing ETL I recommend you do, because it's the fastest way to get data in and out of Postgres. But the reason that I talk about it here is because all of the logical backup utilities I'm about to talk about use the COPY command. So it's important that you know it's there.

It's pretty useful on its own, too. If you're ever writing reports and you just need a quick CSV file that the business can open in Excel and do whatever they want with, the COPY command can do that for you. So take a look at the documentation — it's pretty nice, and it's got a lot of nice custom handling. You can say: okay, for null data, quote it like this. However you want it — there's a lot of flexibility. (There's a quick sketch of COPY right after this.)

So, backups. Before I talk about the backup tools, there's one concept — and I billed this as a beginner talk, so I'm going to keep this at a very high level. There's a very important concept in Postgres called MVCC: multi-version concurrency control. You can get into this with some serious math, some serious academic paper reading — and if you're really interested, pick up some relational calculus books sometime and have a fun read on a weekend. But at a high level, what you need to know is this: multi-version concurrency control applies to every single statement that I run inside of Postgres. When I run a SELECT statement, MVCC gives me a snapshot, so that if that statement takes three days to run, when it's done, the result looks like the data did when I started the statement or transaction.

The nice thing about the way Postgres does it is that if you've got concurrent activity — reads and writes and everything going on — you're not going to block one another. So if I have a 10 terabyte database that's going to take three days to do a logical backup of, I can do that while activity is going on in the database. When the backup's done, it's going to look like the database did three days ago when I started the backup, and it'll be consistent. You won't have missing foreign keys or anything.
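Since I mentioned you can use COPY directly, here's a minimal sketch — the table and file names are made up, but the shape is right:

    -- unload a table to a CSV file on the *server's* disk
    COPY users TO '/tmp/users.csv' WITH (FORMAT csv, HEADER);

    -- load it back in somewhere else
    COPY users FROM '/tmp/users.csv' WITH (FORMAT csv, HEADER);

    -- from psql, \copy does the same thing but reads and writes files
    -- on the *client* side, which is usually what you actually want
    \copy users TO 'users.csv' WITH (FORMAT csv, HEADER)

Same idea for the report use case: point COPY at a query instead of a table — COPY (SELECT ...) TO ... — and hand the business their CSV.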
It's a consistent backup, and it runs while the database is live. MVCC is the super magic that makes all this happen underneath the covers. There are a lot of talks and trainings available if you're interested in the guts of how that works, but it's a little bit much for right now.

So before I get into the actual tools, are there any questions at this point? Cool — I've explained everything perfectly.

All right, so the first tool to look at is actually a tool that I don't really recommend for most uses. By the way, just so you guys know — I see everybody's taking pictures of slides; feel free. I'm also going to post them afterwards, so if you don't want to, you don't have to.

Basically, pg_dumpall connects to a Postgres database instance — a cluster — and dumps all of the data in all of your databases to, basically, standard out. That's what it does. You've got to be careful with pg_dumpall: you don't have as much flexibility as with some of the other tools. But if we look very quickly — this is a Postgres instance I have set up here (oops, let's do a backslash-l... there we go) — I've got a number of databases in here. I really like making random names for things. But I've got a bunch of databases in this instance, and if I run pg_dumpall, it's going to basically copy all of the data from all of these databases into a single backup for me. That can be nice, right? The whole instance is backed up — it's great. I don't have a lot of options with it, though; I basically get all or nothing. You can write a script and do a couple of other things, but it's kind of limited in flexibility.

The real reason that pg_dumpall is important is because it's operating at that global level. pg_dumpall is how you get backups of your users and your groups and your tablespaces. So the magic switch to use is pg_dumpall — it'll scroll to the top of the screen in a second — minus g, redirected to globals.sql. Minus g is globals. It's really important that you do this, because, like I said, it's not getting your data and your databases: this makes sure that the definitions of your users and your tablespaces and your groups — things that are global to that Postgres cluster — are backed up. So if you look at my globals.sql, it's just a plain text backup. It's a SQL file that I can pump right back at a database, but it's got my roles, and it has the encrypted — the hashed — version of the passwords, so I can restore this to another database instance and basically get those users back.

A lot of people forget this step. This is usually the first step in my backup scripts: a pg_dumpall -g. It's a small file — here it's only 28 lines, because I only had a couple of users; it doesn't give a lot of data. But without this, if you try to back up your other databases individually, you're going to lose all this really important stuff. For me, this is small, but some people have hundreds or even thousands of users, and being able to save those definitions is important.

Okay, pg_dumpall -g is nice because it only touches the global objects, so it's very fast. As you saw here, this database is about two and a half gig, but it's not scanning it — so you can run this live on most production Postgres databases, get something back very quickly, and you're backed up. So it's good.
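For reference, the globals step looks like this — a minimal sketch, with hypothetical file names:

    # back up ONLY the cluster-wide objects: roles, tablespaces, groups
    pg_dumpall -g > globals.sql

    # restoring them on another instance is just replaying the SQL
    psql -d postgres -f globals.sql

That one line is the first thing in my backup scripts; everything else — the per-database backups — comes after it.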
You know, a lot of this stuff doesn't change frequently. Some people go transactionally adding users and woo-ha, but most don't, so once a day is usually okay with this kind of stuff — but it is critical that you do it.

Okay, so it's writing plain text SQL, and it actually writes it to standard out. If you just run it without the redirect like I did, it's actually just going to spit it right to the screen, which can be useful — you can pipe it through some things, you can play with it. It's nice, it's easy to read. And the great thing about this is, if you wanted, you could actually dump this, modify something, and then push it back to the database. This is also a quick trick if you can't remember syntax — I say I'll teach you how to fish: I never remember the CREATE ROLE statement, so I run pg_dumpall -g, look at what it's doing, and then work from its version of it. It's kind of a way to get quick help. It's a quick and easy way to do it.

It can dump data, too. If I just ran pg_dumpall with no flags, it'll dump every single row in every single one of my databases into that file, which would be a lot of fun for my two and a half gig — but when you get bigger, it gets a little scary.

Okay, so: I've used pg_dumpall, but only for globals — don't use it for databases. So how do I back up my data? That's what we're here to learn about. This is where you need pg_dump. pg_dump operates not at that global level, but on a single database in your Postgres cluster. So what we're going to do is run pg_dump, say "connect to database abc, xyz, postgres, whatever it is," and dump the stuff out.

Now, pg_dump has a lot of nice things about it. First of all, as of 9.3 — which hopefully everybody's on, 9.3 at least or greater; I know not everybody is — you get the option to do a parallel backup. So you can actually say "use 10 jobs to do the backup," and run it a lot faster. It makes a really big difference, so it's a nice feature. (There's a quick sketch of it after this.)

The other thing about pg_dump is that it's object-selective. What I mean by that is: I could either dump everything in the database, which is the default, or say, you know what, I really only want to dump the users table. Maybe I need to refresh dev and I only need the latest users data — so I can do a pg_dump and say "just give me the users table." Or I could say "give me everything except the users table." It gives you a lot of flexibility in how you get your data out of the database. It's kind of like the ability to write a SELECT statement, store it all in a file, and move it, without actually having to do that. You can just grab the data and go.

And there are a lot of flexible output options. The default is plain SQL, which you just saw me do — we'll do it again in a second — just a plain text file. You could take it, open it in an editor, modify it; a lot of times, if you're using standard stuff, you could take it and run it against Oracle or MySQL — yeah, you would never do that, I mean, geez. But you could, right?
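That parallel option I mentioned — a minimal sketch; note that parallel dumps require the directory output format, and the job count here is made up:

    # -Fd = directory format (required for parallel), -j 10 = ten workers
    pg_dump -Fd -j 10 -d postgres -f /backups/postgres_dir_dump

Ten workers means ten connections each COPYing tables out at once, which is where the big speedup on multi-core boxes comes from.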
So you can actually use pg_dump as a method of schlepping data back and forth between different systems. With that, though, the plain text format comes with some side effects — it's plain text. So you also get the custom format — I call it the compressed option — which creates a nice single backup file and gives you a lot of flexibility; I'll show you how we use that in a little bit. You also get the directory and tar formats, which are very, very similar to the custom format — they just have a couple of different idiosyncrasies. I'll show you how to use them all.

Okay, so how do we make a backup? pg_dump is basically just a network client to Postgres. It's just a C-based client; it connects into the database just like any other application would and runs some queries. So, like just about anything that connects to Postgres, you have to give it really four major things: the host, the port, the user, and the password. Most of these have defaults, or implicit defaults that are compiled in, and you can set environment variables.

In this case, I'm going to run pg_dump with format custom — "compressed," that's the -Fc — so it's going to be a custom format. Dash v is verbose: basically, give me a status — what are you doing right now, what are you working on as you're doing the backup. And just like pg_dumpall, it writes to standard out, so I'm just going to dump it straight into a file called demo.backup.

Now, I just told you that you have to provide those four parameters, but I didn't provide them, right? There are some implicit things with the Postgres command line utilities here. The host is local — it's actually using a Unix socket; without a host being specified, it'll connect to the local machine. The port: it's connecting to the default, 5432, or it'll use environment variables. The user: in my case, it's postgres — the default user is connecting — but you can switch which user. And the password: I have mine set up to not require passwords — it's a super secure laptop — but you can provide a password on the command line. Or, if you're automating these things — right? I just told you I have to automate everything; I can't put a password on the command line — you can use something called a .pgpass file. You basically lock the credentials up in a secured file in the user's home directory, and the tools pick it up automatically. (There's a tiny sketch of one in a second.)

Okay, so let's take a look at doing a backup. I go to my databases here and list all the available databases — I've got a couple. This postgres database I've been using — I've been running the Postgres benchmark tool, pgbench, inside of it. So if I look — this is a little bit messy output — but if I look at that specific database... let's see, postgres database, where'd you go, I can't read that... here it is: it's about 1.5 gig. That's messy output because the screen is a little compressed there. So we're going to do a backup of this 1.5 gig postgres database. Let me move my terminal — I realize the terminal's a little squished here — get all this junk off the screen, and see pictures of my kids.

All right, so what I'm going to do is pg_dump, format custom, of the postgres database. I want the verbose flag — I want to see a status, basically "what are you currently working on" — and send it into a file called postgres.bak. There's no really common extension; you can use whatever you want there.
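The .pgpass file I mentioned is just a one-line-per-server text file; a minimal sketch with made-up values:

    # ~/.pgpass -- format: hostname:port:database:username:password
    # must be chmod 600 or the tools will refuse to use it
    db1.example.com:5432:postgres:backup_user:s3cret

    # then a fully spelled-out remote backup needs no password prompt:
    pg_dump -h db1.example.com -p 5432 -U backup_user -Fc -v \
        -d postgres > postgres.bak

That's all four connection parameters — host, port, user, password — with the password kept out of the command line and out of your shell history.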
What's coming out, though, is actually a binary file. And what the verbose output is doing is telling me: okay, this is the table I'm working on; these are all the schemas; these are all the objects that I dumped out. I can go through this log, and if there are errors, they'll pop up here.

This is really great — for a small database like this. Yeah, question? That's a good question — oh, just give me one second, I'll get to that. The question is about doing it this way versus doing it through something like pgAdmin. Okay, so just give me one second.

So basically, I get to see the status. This is a really, really short little backup, but if I'm running a backup on a big database, it's nice to be able to see what it's currently working on — I can come back and say, okay, I'm backing up this table right now, I know that's the largest table in my database. It kind of gives you a mini progress indicator.

So what do I get when I'm done? With my compressed format backup, postgres.bak is 29 meg. So I dumped 1.5 gig down to 29. That's pretty good. Now, it's cheating, because that 1.5 gig also includes indexes, and my backup doesn't have the indexes — it just says "hey, you have to create an index here." The backup isn't storing those large physical indexes; it's just storing the record of them. So you get some really awesome compression out of it. The problem with that is, when you go to restore it, you have to rebuild all those indexes. They're easy to dump out, they're very fast to dump out, but the restoration is going to take a lot longer. So a lot of people will say, "oh, hey, I took a backup the other day and it took 45 minutes — how long is it going to take to restore?" The answer is: who knows. And it's really all around indexing.

The nice thing is, when we do a restore, we can do it in parallel, and it will actually parallelize the CREATE INDEX statements where possible. So if you've got 64 CPUs and you give it a parallelism of 15, you can create 15 indexes in parallel, which is kind of nice to be able to do. (There's a sketch of that after this.) But that's the deal with logical backups — there's the other type of backup, physical, which we'll get to in a few minutes — but with logical, you're always unloading the data into a separate file, and when you want to restore it, you have to reload it. You're always going to pay that price with logical backups.

[Question] Um, no. And that is a challenge — we talk about testing and validation of our backups. This actually happened at a client a little while ago: it's about a two-to-four terabyte database — I don't remember the exact size — and we did a pg_dump, and it came out to something like a hundred gig. Like... okay, I think that's okay? I know it's heavily indexed. The only real way to validate it is to restore it, and then you've got to do row counts, right? I'll show you some other strategies for testing, but there's not a real good answer for that, especially for large databases. Question?

No, you don't lose them. It stores a record — it stores the CREATE INDEX statement. So inside of that backup, there will actually be a line that says CREATE INDEX.
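That parallel restore looks something like this — a minimal sketch, with a made-up job count:

    # rebuild into a (new, empty) database using 15 parallel workers;
    # independent steps, including CREATE INDEX, run concurrently
    pg_restore -j 15 -d postgres_restored postgres.bak

The -j number is worth tuning against your CPU count and disk — more workers help right up until they start fighting each other for I/O.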
But the problem is that for large tables, you have to actually build the index on restore, which is a pretty heavy operation. Yep — so this backup would give us a 100% restore, including the indexes. I actually did one of these talks once — something similar — where I had somebody get up, run out, and come back scared.

Okay, so let's look very quickly. That was the compressed format, but I did tell you plain text SQL is easy to modify. So what does that look like? I'm just going to — whoops — get rid of that format-compressed flag, just run pg_dump, and make this postgres.sql. The backup procedure itself can actually be a little bit quicker, because you're not compressing. And... there we go. So if I look — let me just do postgres-star and, oops, make this human readable, because as much fun as that is — the plain text SQL backup is 959 meg. It's pretty close to the actual size of the database itself, whereas the compressed version is 29 meg.

And I'll show you — there are a couple of other reasons why you should be using format-custom backups; there are a couple of things you can do on restore that are lickety-split and cool. But they're harder to edit. So this one — I'm not going to fully do it, because vi will not have a fun day — but you can open this with vi or a text editor and go, "yeah, what the heck," right? When in Rome. So we'll open this thing in vi... give it a second, or a minute... there we go. It's just plain SQL. So if I go and look for my data — here it is, see, CREATE TYPE: I had a custom type in that database, so it's got the CREATE TYPE statement. So if I wanted, I could copy and paste this and run it and use it, or I could modify it. It's just a nice way to get a full backup, and all of your data will be in here as COPY statements.

So if we look for the string "copy" — there: COPY event_log. These are empty tables, but my data would be right here in this backup. And there we go — I have a feedback table, and I've got a bunch of stuff in it, and that's actually all the raw data, tab-delimited. So I can see it straight away if I want to.

With pg_dump, I could actually have it dump the data as INSERT statements instead. So if I wanted to use this to move, again, to a different database system or something, I could tell pg_dump: "hey, dump those as inserts," and it'll write INSERT INTO blah blah blah. The problem with the plain text backups, in some cases, is they'll actually end up larger than the actual database, because it puts comments in there, it's putting spaces, it's putting all kinds of great things. So in some cases you can actually get a much larger backup than your actual database, based on what you're asking for.

So this is where things get a little hairy. We've talked about the different formats, right? We've got plain SQL; custom, which you've seen; directory; and tar. They're just different flags you switch, and the last two look a lot like the custom backup. This is really critical — and I'm going to get to your pgAdmin question in just one second with this. When I use a plain text backup, how do I restore it? You have to use psql — or pgAdmin, or any tool that can read SQL and send it to the database. When you use a custom, directory, or tar backup, you can't do that.
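Side by side, the two restore paths look like this — a minimal sketch, database names made up:

    # plain SQL dump: replay it with psql (or any SQL runner)
    pg_dump mydb > mydb.sql
    psql -d mydb_restored -f mydb.sql

    # custom-format dump: must go through pg_restore
    pg_dump -Fc mydb > mydb.bak
    pg_restore -d mydb_restored mydb.bak

Same data either way; only the container — and therefore the restore tool — differs.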
You have to use the pg_restore program. It's a separate program — it drives me nuts that you have to make the distinction between the two. pg_restore, if you run it on a plain SQL file, will actually say "hey, this looks like a plain SQL file; you should run that using psql." So it gives you a hint. But this is one of the things that really trips people up all the time. So: plain SQL, use a SQL interpreter — psql is the quickest way to restore. For any of the other formats, you want to use pg_restore. Okay?

And no matter what you do, with all of these formats you have the selective option available. So if I only want to take a backup of one table, I can just do dash t. So what I can do here is say: pg_dump -t pgbench_branches, from the postgres database, and it's going to dump just the pgbench_branches table and its data. If you look at the SQL it creates, it actually gives me the CREATE TABLE, it fixes the ownership, and then it's got the table data — so it's got the COPY command and all the actual data for that table. So I was able to select just that one — kind of handy. I could also, if I use a capital T, dump everything except the pgbench_branches table.

And here's one of those areas where I want you to be this individual, right, where we're learning to fish: there are a million options to pg_dump. There are a million options to pg_dumpall, pg_restore, all of these. pg_dump --help gives you a great help menu that tells you all the options available. I'm not going to go over them all — I just want everybody to memorize dash-dash-help, and that goes for every command line utility in Postgres. So get familiar with --help. It will make you feel like you can go fishing on any day, in any weather, and come up with something good, right?

Okay, so with the compressed — the custom — backup: why do I prefer using that? First of all, it's smaller. Second of all, we talked about the fact that I can say "only back up a certain table," but more often than not, something nasty will happen — not necessarily to the whole database, but to one table or a small set of tables. So how do I restore just one table? I can't do that with a plain SQL backup. If I've got a multi-terabyte database and I have plain text SQL, I can't open that in a text editor, just grab the one table and its data, and replay it. This is the beauty of using the compressed, custom format: when I run pg_restore, it doesn't necessarily restore data — it can convert that binary format back into SQL.

Okay, so what I can do is say pg_restore, list — dash l — everything in postgres.bak, and I get this beautiful table of contents that tells me all the different items that are in there. If we find the top, we have, you know, what the database name is, how many entries are in there, the compression level, the version — everything I could ever want to know about how the backup was taken — and all of the items that are in that table of contents.
So now, okay — I had a problem. Let's use the pgbench_branches table again. And let's just do — because I don't remember the option — dash-dash-help, right? We'll say okay... there it is: "restore a specific table," dash t, the same option as when I dump it out. So what I'll do is pg_restore the single table called pgbench_branches from postgres.bak. And this isn't going to put it in a database — it's actually just going to convert it to SQL and give it back to me. So I can extract, from that dump, the SQL for this specific table. I could tell it to go to a database if I want to, or I can dump it on the console. It's a lot of flexibility — sometimes too much. Question?

No, you'd have to — I'll show you what you can do — you'd have to kind of do it this way. The trick to that is to just create another database, put it in there, and then, you know, slip it over. Question in the back?

Yes — so what I would actually do is just redirect that to pgbench_branches.sql, and now I have a file that will give me my pgbench_branches table. Just a redirect — yep, yep, because it just goes to standard out, it's not anything else.

Okay. So this is the other flexibility that you get: hey, you know what, I really want to dump — I just don't remember the CREATE TABLE statement for that thing. This is another one of those times when the tools can be used for things they weren't necessarily intended for. I can say: give me the pgbench_branches table, and only the schema — and, whoops, give it to me from postgres.bak — and now, instead of the data, I basically just get the CREATE TABLE statement. So I can very easily grab just the schema, or I could say "hey, just give me the data." I have either option. Question again?

Oh, that's a good one — let's take a look. I don't remember which of the pgbench tables are indexed... no, I know it does not — I'm wrong. When you look at the list, you'll see it's storing the indexes as separate entities, basically.

Okay, so that's how I can extract it. Now, what if I just want to dump that straight into a database? All I have to do is remember: I've got to give four things to connect to a database — host, port, user, password, right? So I can say: okay, pg_restore, I want you to restore pgbench_accounts — only the schema of pgbench_accounts — and I want you to do it to the demo... let's do that other one, the flyway demo database. So now, if I go into flyway demo and look, I now have pgbench_accounts, and I just did the schema. So I was able to extract it from a backup, load it into a different database, and I'm off. Now, if I wanted, I could extract the data and do the same thing. Okay, question?

Yeah — so it's actually pg_dump. Again, I don't remember the option off the top of my head, but there is an option for that... which is — here it is — dash capital C: "include the commands to create the database." So you just need to throw the dash C flag in.

Now, I said I'm going to come back to the question somebody asked: what's the difference between doing all this stuff and using pgAdmin? Zero. pgAdmin — if you go into the options menus of pgAdmin, it says "give me the path to pg_dump and pg_restore." So it's just doing exactly the same thing for you. The only difference that you're going to see actually comes along at the end, with tips and tricks.
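The moves I just did, spelled out — a minimal sketch; the file and database names follow the demo, approximately:

    # list the table of contents of a custom-format backup
    pg_restore -l postgres.bak

    # extract one table's SQL (data included) to a file
    pg_restore -t pgbench_branches postgres.bak > pgbench_branches.sql

    # schema only (-s) for that table, straight to the console
    pg_restore -s -t pgbench_branches postgres.bak

    # or skip the file and load it directly into another database
    pg_restore -s -t pgbench_accounts -d flyway_demo postgres.bak

One backup file, and you can carve out whatever slice of it you need.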
One of the things that pgAdmin does do for you is connect remotely — and that's the great thing about pg_dump, too: it's just a network client. So I can have a backup server, run pg_dump on it, connect to a remote database, and only copy the data over the wire one time. A lot of people will run a pg_dump on their database server and then scp or rsync the file to the backup server. That's an extra step — it's basically double-copying the data. You can just run pg_dump on the backup server, connect to the remote database — just give it the IP address — and it'll just suck the data right over the network for you. So you move the data once instead of multiple times, and it cuts down on storage space, too.

[Question] Well, yeah, it depends — you've got to look and see, because pg_dump can use the compressed format. If you use the uncompressed one, it's going to be faster, right? I believe pg_dump by default makes compressed — or pgAdmin does... okay, I don't use pgAdmin. Yeah — just give me a few minutes; we're going to move beyond questions. I know we've got a couple more, but there is one other really big topic.

So that's logical backups, right? I can basically unload the data from the database, store it in a file, take it to another server — you know, give it to my girlfriend, and whoo-hoo, off we go. We're happy. But there's a problem, and I already mentioned it: there's no way to predict what restoration time is going to be, because of the CREATE INDEX statements. The majority of the time with these things is usually index creation. I can parallelize it, but there is no way I can go to management and say, "hey, this thing took 45 minutes to back up" — it could take 45 minutes to restore, or it could take three days. I don't really know. And you have to test it — and the thing is, you test it once and it might behave one way; you test at a different time and it might behave a different way. They're unpredictable. The other problem is that I'm dumping all the data out, every time.

So what in the world can I do to get around those problems? That's where we get to physical backups. Logical backups are great: they're simple, they're easy to do, the command line utilities are available for Windows, Mac, Linux, Solaris, Uncle Bob's backyard operating system — whatever you want. It's there, it's yours, go ahead. Physical backups are a lot harder, and there are a lot more restrictions on them, but there are a couple of real key benefits, okay?

Essentially, a physical backup is a copy of the PGDATA directory. When you're running a Postgres instance, it's got a data directory where it's storing everything. The simplest form of a physical backup is: you stop Postgres, you copy that directory, and then you start Postgres — you've got a backup. And the beauty of it is, when you do copy it — that's 2.5 gig, so it's going to take a little bit more space than our 29 meg — when I restore it, there is zero restoration time. All I'm doing is starting a copy of the instance. So whether that database is 1.5 gig or 1.5 petabytes, it does not matter: it's going to just start the database. So restoration time is eliminated, right?
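A minimal sketch of that stop-copy-start approach, with hypothetical paths:

    # the 'cold' physical backup: only valid if Postgres is fully stopped
    pg_ctl -D /var/lib/pgsql/data stop
    rsync -a /var/lib/pgsql/data/ /backups/data-$(date +%F)/
    pg_ctl -D /var/lib/pgsql/data start

    # 'restoring' is just pointing a server at the copy and starting it
    pg_ctl -D /backups/data-2024-01-01 start

Simple, and instant to restore — at the cost of an outage while you copy.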
Wonderful — I'm just going to do physical backups! Except I can't stop the database. That's ridiculous. So this is where we use what's called PITR: point-in-time recovery. In training classes, when we do our point-in-time recovery lab, it's literally half a day, so I'm going to go through it quick. But you do need to know: point-in-time recovery involves what feels like some duct tape and string and bubble gum and, you know, wishes and prayers — but it actually works very well.

And if you haven't done point-in-time recovery yet, I recommend that everybody do it, and the reason is not just because it's good for backups. The procedure for taking a live — we call it a "base backup"; that's what your Postgres people say all the time, "I'll just take a base backup," and that means a full physical backup — the reason this is important is that physical backups are based on this technology. Streaming replication is based on this. All the logical decoding and replication, BDR, which is coming down the pipe — all of that is based on this type of operation. So understanding how this works, and being able to do it — it's a little complicated, but there are some great wiki pages that will walk you through, step by step by step, what to do, and there are a couple of utilities and tools to help; I'll show you. But you really need to get very familiar with how to do this, because everything you need to do for running enterprise Postgres is based on taking a live PITR backup.

So what actually are we doing? Postgres has these transaction logs, right? So when you take a physical backup, the first thing you do is say: I want you to save all of your transaction logs somewhere — you're archiving transaction logs. And then, while Postgres is running — no, you're not quiescing traffic; you don't stop anything, it's full-blown go-at-it mode — you just start a copy of the data folder. Use an operating system command: cp, scp, rsync. Use LVM snapshots, use ZFS snapshots — with snapshots, technically, you don't have to go through all this rigmarole. But the idea is: any way that you can take that data folder and copy it, do it. And when the copy is done, it's going to be an inconsistent pile of mess. But the restoration procedure is basically going to point at those archived transaction logs, and it's going to replay them and get consistent. So it's a very, very easy way to get your backups done and made, and it does it physically.

There are a couple of nice things here. The reason that restoration is so fast: you're not reloading the data or the indexes — they're physical files. It's great, it's easy to do, and, like I said, this is the basis for replication — whether you're doing log-shipping replication or streaming replication, all of it comes from this standard procedure. And you're up and running the entire time this is happening.

The other important thing about this is not just the fact that we're backing everything up and restoration time is fast. Remember we talked about MVCC back when I started? I said when you start a query, the query is consistent as of when the query started. So if I take a pg_dump of a two terabyte database, let's say, and it takes 12 hours to complete, for whatever reason — by the time it's done, my backup is 12 hours old. My backup is only good as of when it started — useful, but it kind of stinks.
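Before we go further — that "save your transaction logs somewhere" step is just a couple of settings in postgresql.conf. A minimal sketch, with a hypothetical archive path:

    # postgresql.conf -- turn on WAL archiving
    wal_level = replica        # ('archive' or 'hot_standby' on older versions)
    archive_mode = on
    archive_command = 'cp %p /archive/%f'   # %p = WAL file path, %f = file name

In real life the archive_command is usually an rsync, or a tool like pgBackRest or WAL-E pushing the segment somewhere safe, rather than a local cp — but the mechanics are the same: Postgres hands you every 16 meg WAL segment, and you squirrel it away.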
With physical backups, when your backup is done, your backup is consistent as of when you finished the backup. But even better: when you flip a flag, you're now replicating — you can actually go beyond that point and catch up.

So that physical backup you take, you can restore to any point in time — PITR, point-in-time recovery. I can actually say: hey, you know what, developer xyz typed DROP TABLE accounts at 12:52 p.m., so I want to tell my backup, "roll forward to 12:51 and 59 seconds" — and you can recover the data. That's how granular you can get. Even more granular than that: if you know the transaction ID, you can get to one xid before he — or she — blew away that accounts table. So it's very, very fine-grained. (I'll show a sketch of what those recovery settings look like in a second.)

The issues you're going to run into with physical backups — there are a couple. One of them is that you're tracking every single log file. You have to keep, and keep track of, all of them. Sounds great... sounds like it's too bad. They're 16 meg in size each; they're not that bad, right? How bad could it be? Well, I've got a customer that flies through four terabytes of transaction logs — times nine instances of Postgres — every single day. So storing those bad boys: you've got to plan. You really have to be ready for it.

And the other problem with physical backups: physical backups are wonderful, but when you're backing up the physical structure of something, if it's corrupted, you just backed up your corruption too. When you're doing those physical backups, you're just copying blocks on the disk. If one of those, or ten of those, are really screwed up and messed up, you just copied your corruption, right?

So, okay — now wait a second. We just went down this whole road of "you've got to do physical backups, they're great, you can do replication with them" — but now you're telling me there's this problem where I'm storing corruption. So how do I deal with it? What I want you to do is combine physical and logical backups, and here's why this is important. Let's say you take a physical backup on a Monday — and you can do incremental backups with physical, every day of the week. The next week, before you delete those old backups, you restore one, and you do a pg_dump from it. Why do you do that? What does pg_dump do? It does a SELECT * from every table in that entire database — it basically runs through all of your data. And if there's an error — if there's a block error in one of those blocks — you'll see the error message. And then you can start going back through your old backups with your point-in-time recovery, roll forward to just before that error, and recover your database and get to the same state.

So that's why testing is so important, right? We talked about testing being neglected — you actually have to combine these two strategies on large databases. Make sure you're testing, and make sure you're reporting on it; that way you know your backups are good. Because with logical backups — 29 meg versus 1.5 gig? I don't know whether that's good.
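For reference, that roll-forward-to-12:51:59 move is just a recovery target — a minimal sketch in the pre-Postgres-12 recovery.conf style this era used, with made-up values:

    # recovery.conf -- restore the base backup first, then:
    restore_command = 'cp /archive/%f %p'
    recovery_target_time = '2017-03-28 12:51:59'
    # or, one transaction before the fatal one:
    # recovery_target_xid = '1234566'

Postgres replays the archived WAL up to that instant and stops — with the accounts table still alive.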
That's the thing: when I do physical backups, the sizes should roughly match. And if I then do a pg_dump from the restored copy, I can actually run through every single block in the entire database and validate that the backup looks like it's good — or at least not corrupted. So it's important that you do that.

Okay, I ran through a couple of slides quickly, but the how-to on configuring physical backups — it is complicated; I'm not going to lie to you. There's a tool called pg_basebackup; it's actually shipped with the server. Take a look at it — it does some of this for you. You have to configure archiving ahead of time, and you've got a bunch of other settings to set; there are a couple of things to do. But it's a worthwhile exercise. If you've never done it before, it's probably going to take an hour to two hours to do. Definitely worthwhile — and if you're banging your head on the wall, you're right there, you're in the right spot. (There's a short sketch of pg_basebackup in a second.)

Essentially, the procedure — so, when you get to replication — is: the first step is take a physical backup, and then restore that backup and connect it to the master. It's just a little recovery.conf file that controls that, okay?

So: logical backups — fast, portable, you can select what you're putting into and taking out of the backups; restores are kind of slow. Physical backups — "it's flexible"... I'm sorry, it's NOT a flexible restore; that's a bad bullet point. What that's supposed to say is: with physical backups, you don't get the chance to choose which tables you want. It's all or nothing. You can't say "I want to take a physical backup of this one-meg table" — you've got to take the whole database in order to get that. But restoration is very, very, very fast — it's just starting the instance. And if you have a lot of logs to replay — let's say you've got 24 hours of logs — I've seen 24 hours of logs go in five minutes. Very fast to replay them.

And then with physical, you also get differential and incremental options, if you use a couple of tools like pgBackRest — "pg backup and restore" — which is a really great tool that will actually take those physical backups for you automatically, let you restore them, and do differentials and incrementals to save you some space. The joke is that every consulting company has its own backup-and-restore tool in the Postgres community — which is kind of true. There are some cool ones. WAL-E will actually store your transaction logs up in Amazon S3 for you — so if you're worried about planning for that storage and you've got an S3 bucket you can throw at it, it'll send them to Amazon for you. pg_rman, if you're used to Oracle, has a very RMAN-like interface. My personal favorite is pgBackRest — it's very flexible, it's very fast. Question? ...S3 support is coming to pgBackRest? Whoo — that's a happy day.

So take a look at all these tools — but you really do need to go through the procedure of setting up that backup, setting up that replication, once, because then you'll really have a love and appreciation for all these tools. You know, the other thing is, if you're using something like Amazon RDS, they do all the replication for you, right? So then you really gain a love and appreciation for that. But — bless you — those are the different types of backups you have.
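The pg_basebackup version of all that duct tape, for reference — a minimal sketch, with a hypothetical host and paths:

    # take a live base backup over a replication connection,
    # including the WAL needed to make it consistent (-X stream),
    # with a progress indicator (-P)
    pg_basebackup -h db1.example.com -U replicator \
        -D /backups/base-$(date +%F) -X stream -P

What lands in -D is a complete, consistent data directory you can start a server from — the same artifact the manual procedure produces.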
Hopefully we went from the fisherman on the beach, to being comfortable on the water, to being able to rip whatever we want from the bottom of the ocean. There's just one thing I caution you about: as you spend time with databases — like myself; I've been doing this about 15 years now — you kind of get a little bit bitter. Whatever you do, don't become this guy.

All right — if you have any questions, like I said, I'll post these slides up; you'll see them, there's a spot for posting all of them. Feel free — if you have any questions, let me know. And I think that's time. Thank you, everybody.