Okay. So hello everybody. I am Dimitri Fontaine from France and we're going to talk about backups. And actually we're going to be speaking about why backups are not interesting; let's get back to that in a minute. So I'm currently working at a French place called leboncoin, which is doing classified ads. And I've been a major contributor to PostgreSQL for some time now. The project that I spend my free time hacking on currently is pgloader. The goal of pgloader is to be able to load data into PostgreSQL as fast as possible. And the trick that doesn't show in the name is that it will, for example, load data from a MySQL database: it connects to MySQL, introspects the schema by doing catalog queries, recreates an equivalent schema in PostgreSQL including data type conversion (for example, tinyint(1) might actually be a boolean, so it will take care of that), does the raw data loading, and then recreates all the indexes, the constraints, etc. So you can do a full-blown MySQL to PostgreSQL migration in one command line. And it also works with SQL Server, SQLite, CSV, dBase, plenty of things. So check it out if you have a use case for it. Yes, sponsor me and I'll do that. I'm serious: I'm not going to touch Oracle on my free time, that is never going to happen, but I really want pgloader to have Oracle support. So let's talk about it. I'm serious.

So I've been doing other stuff for PostgreSQL. If you're using Debian, check out apt.postgresql.org. Debian has a policy that is really good but that works against PostgreSQL in this case, which is that they really want to be able to maintain the software, whatever happens. They're on the hook for fixing the bugs, which means that they don't want to include all currently supported PostgreSQL releases. And Debian stable doesn't mean the software distributed by Debian is stable; it means that the version number of every piece of software doesn't change during the lifetime of the distribution. So it's not the software that is stable, it's the pair of software name and version number. If they have PostgreSQL 9.1, they will not have 9.2, 9.3, etc.; they only have one version. If you want a stable PostgreSQL release on a Debian stable distribution, and if you want the opportunity to choose whichever of the five currently supported stable PostgreSQL releases you want to work with, check out apt.postgresql.org.

Okay, so as I said, backups are not interesting at all. So who is running PostgreSQL in production? Yeah. Who's doing backups? Who's doing automated, tested recovery? The other ones: you don't have backups. End of story. That's it. If you don't test the backup, well, the only way you know you have a backup is that you're able to restore it. There is no use case for a backup other than recovering. If you don't try to recover a backup, you don't know if you have a backup. If you don't know if you have a backup, well, you don't have a backup. That's it. So that's my full talk.

So backups are used, in general, because you want to be sure that whatever happens, you still have access to the data. That's the disaster recovery plan. When everything else is failing, one thing after the other, then it's time to recover a backup. And if you don't have automated testing of the backups, of the recovery actually, well, the day you really need them, you don't have a procedure. You don't have a script that does it automatically. You don't have people trained to do it.
So you're discovering how to recover a backup for the first time, and you're in that situation. That's not good. Nobody wants to be there. So don't be there. My advice is: never ask people to do backups, because they will do backups and they will stop there, and that's not what you want. There is no business that just wants backups. So task people with doing automated recovery each and every day, week, whatever makes sense for you.

For example, you do physical backups each and every night; let's take that as an example. How do you test your physical backup? Well, you copy it over to a box where you start PostgreSQL, and PostgreSQL should be happy to start with whatever files were in the physical backup, right? And then you pg_dump out of that standby that you just brought up from the backup. pg_dump will read every piece of data from PostgreSQL, so it will actually test that you can read all the data that is part of the backup. And, bonus, you will have a physical backup and a logical backup, and the logical backup is not hammering any resource on your production server. So just do that: physical backup, fully automated, then you restore it and you pg_dump out of it. And if you really want to be sure, you also vacuum the whole thing, because pg_dump will not use the indexes, and maybe you want to test that you can actually read data from the indexes. So you can also vacuum that standby you just brought up from the backup. So you don't do backups; you do automated recovery testing.

And one of the good tools that you can use to do that is Barman. Why Barman? Barman is the only tool I know of (I need to check about pgBackRest on that, but David will tell me). Up to now, Barman has been the only tool I know of that explains, as the first thing it says: Barman is not a backup tool, Barman is a disaster recovery tool, okay? The aim of Barman is to recover data. The people who wrote the software, what they want to be able to do is recover data. That's what you want. With pgBackRest you have that too; it's all about the recovery. So you can use either Barman or pgBackRest. The other ones, each time, they're talking only about the backups: we're really good at doing backups. Nobody cares, whatever.

So I'm not really sure I succeeded in convincing you to change all your backup organization and procedures, etc. So let's talk about a story that did happen, when I was called in the early morning to help a customer who had a problem with their backups, because they never tested them. So they did not have any backups, and still they lost data in production. So let's see what happens when you lose your data in production: you have no standby and you have no backups. What's next? You can close the door, find a new job, I don't know, something.

So that's another favorite of mine: we do have backups, we have written a shell script. What can I say? Never use shell in production, ever. In production, the only important thing you want to be able to do is recover from any error that you didn't think about. So you need to use a tool that allows you to implement error handling. How do you handle errors in a bash script? Well, you don't. So don't use bash in production. That's it. Use whatever you want, but you need to be able to handle non-anticipated errors; in production they will happen, and it needs to be as graceful as possible. There is no way to do it correctly in a shell script.
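To make that nightly recovery test concrete, here is a minimal sketch of the steps, assuming a Barman setup; the server name, host, database name, and paths are made up, and a real implementation belongs in a tool with proper error handling rather than a raw script:

```
# Sketch of an automated recovery test (illustrative names and paths).
# 1. Restore last night's backup onto a scratch box.
barman recover --remote-ssh-command "ssh postgres@restore-box" \
    prod-db latest /var/lib/postgresql/restore

# 2. Start PostgreSQL on the restored files; it must be happy with them.
ssh postgres@restore-box 'pg_ctl -D /var/lib/postgresql/restore -w start'

# 3. pg_dump reads every row, so it validates the whole backup,
#    and the resulting dump is a bonus logical backup.
ssh postgres@restore-box 'pg_dump -Fc -f /tmp/recovery-check.dump appdb'

# 4. Optionally vacuum to exercise reads through the indexes as well.
ssh postgres@restore-box 'vacuumdb --all'
```

If the restore, the dump, or the vacuum fails, the nightly job should alert loudly; that failure signal is exactly what a plain backup script never gives you.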
So if you have shell scripts in production, rewrite them in something else. Maybe not just now, but tomorrow. I'm quite serious about it. Each time there is a huge fuck-up in production, there is a shell script somewhere. So in that case, the shell script was all about caring for backups; it was a backup script. The first thing it did, obviously, is to make sure there is enough space on the device to actually make a backup. So the first thing you do is remove the old backups that are still lying around: every file that is older than five days gets removed. And of course, this is a cron job that runs as the postgres user on the machine. I think you're seeing it coming, but let's spell it out nonetheless. They were under pressure to deliver this system, so they didn't do all the checks that the setup was entirely correct, and they forgot to include the environment variable that contains the backup directory. So what happens? That directory doesn't exist, so the cleanup actually happens in the current working directory. And if you're in the cron context as the postgres user, that's PGDATA. So: let's remove any file older than five days in PGDATA, and then we'll take a backup. Yeah. Okay. Who's running backups from a shell script in production? Yeah, I changed that. Yeah. Don't do it anymore. Okay. So if you can spell "data loss", that's what happened.

So let's see what to do now. Don't do it; yeah, don't try to do it. So they did have some WAL archiving, but they didn't have a base backup that matched the WAL stream, because the base backup was supposed to be taken by the shell script. But the WAL archiving, the archive_command in postgresql.conf, that was deployed. The environment, no, but that part, yes. So we were like, okay, let's try and use those WAL files, which begin at the very beginning of the server's life. We don't have a base backup, but when you start your server for the first time, you just need an initdb and then you start. So any server on the same physical architecture begins the same every time: it always begins as the result of the initdb command from the same version of PostgreSQL. So we did that, and we tried to apply the WAL, with the restore command, to the resulting cluster, the bare default cluster. But of course the WAL files contain the system identifier that is generated each time you do initdb, so it didn't match. Okay. Let's see in the WAL what that number is, that magic number, and replace it in the cluster we just got from initdb: change the system identifier to that. So the system identifier was this one, which I will not even try to read, and it's written that way: it needs to be written in hexadecimal, of course, because it wouldn't be fun otherwise, and in network byte order, which means that you begin by the end. So with a hex editor we edit the pg_control file to put that into place at the right offset in the file. Don't mess it up. But it's in the new initdb cluster, so if we mess it up, we don't lose any data: we do initdb again and try again. So we do that, and then we try again to apply the WAL files, and this time we have no system identifier error anymore, which means we managed to bring in the correct identifier. But of course, the procedure to install a new server first does the initdb, then changes the setup to include WAL archiving, and then starts the server again. So we're missing the first WAL file, the first two of them actually. And there is no way we can invent them. No luck.
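As an aside, the system identifier mentioned above can at least be inspected with standard tools; a rough sketch, with a made-up path and value:

```
# The system identifier is stored in pg_control and in every WAL file header.
pg_controldata /var/lib/postgresql/9.1/main | grep 'system identifier'
#   Database system identifier:           5645198524998257364   (example value)

# It lives in binary form inside global/pg_control, which is why patching it
# means hex-editing bytes at the right offset (here we only look at them).
xxd -l 16 /var/lib/postgresql/9.1/main/global/pg_control
```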
So we couldn't apply the WAL files we had; we were missing the first ones. If we had a base backup, we would have been able to use them. We didn't have a base backup, and there was no way to reconnect with the initdb sequence. So, too bad. What's next? Just to be sure: who wants to be doing that in production, when any other hope is lost? You don't have any data left, and the data is really important: that was financial transactions. I'm serious. So they really wanted to recover the data. They had nothing left. And we're doing that in production, with every manager putting pressure on us, because they lost the financial data needed to report to the customer and invoice the customer. So who wants to be doing that in production? Yeah: do your backups. Test your recovery. Do the recovery. Yeah.

But we don't have the data yet, so we continued. So we're going to act on whatever is left from the rm -rf that happened before. Whatever is left, we're going to act on that and try to get PostgreSQL to start with something. So first, make a copy. Don't destroy it again; we're going to do bad things to the data, so whatever is left, we work on a copy. That's obvious, but given that the backup script failed and removed the production data, maybe we need to state the obvious. So when we start PostgreSQL against the copy of this data set, well, there is no pg_filenode.map file. In PostgreSQL, when you create a relation (a table, an index, a sequence, lots of things) it is assigned a numeric name that is the name of the file on disk, and which might be the same as the internal identifier, or maybe not. So there is a mapping between the internal PostgreSQL identifiers, the OIDs of the catalog tables, and the names of the files on the file system. That's the map of the file nodes for PostgreSQL, pg_filenode.map, which we didn't have, so we recreated it with that magic command. Yeah, I had some help to figure that out; it was Andres Freund. I asked him and he was like, oh, that's easy, you just do that and you have one, you can play with it. Okay, let's try that. It worked.

Next problem. So now we have a pg_filenode.map file and PostgreSQL refuses to start nonetheless, because it's missing some clog files. So, the clog, okay, its name ends with "log". Some people, when they're short of space on some directory, are like: okay, let's remove some useless files. Oh, there is a whole bunch of log files written in binary that nobody can do anything with; let's remove those, they're taking lots of space on disk. It might happen to you; yeah, it happens. So there is "log" in the name, but don't ever remove that. The clog is the state of each transaction on disk: it will tell you, for each transaction number, each XID, whether it committed or rolled back. Which means that if you don't have that file, you don't know if the data is visible or not. So at this step, we're going to hack the files and create them again. There are two bits per transaction, so each transaction has a small set of possible states, and the 01 state means committed, visible. So we're going to hack the clog files, create them again, filling them with the byte 0x55, a big "U", which is 01 repeated four times, so four visible transactions per byte. So we are creating the missing clog files and we are pretending that every transaction referenced in those files committed and did not roll back. We don't know that. We don't know that.
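A sketch of that clog hack, with an illustrative segment name; 0x55 is 01010101 in binary, i.e. "committed" for each of the four transactions tracked by every byte:

```
# Recreate a missing clog segment (256 kB) filled with 0x55 ('U') bytes,
# marking every transaction in it as committed.
# As said above: this may expose rows from transactions that actually aborted.
dd if=/dev/zero bs=256k count=1 | tr '\0' '\125' > pg_clog/0000
```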
Maybe some of them had rolled back, and so we are going to make data visible that was not visible. So at the end of the story, we give some data back to the analysts and they're like: yeah, but that transaction, I'm not sure it ever happened. Yeah, maybe not, because we did that. We don't know. So we needed to do that for PostgreSQL to start, but it means we may be lying about some of the data that we recover.

Now PostgreSQL is starting. It starts. It's great. So you connect to it, and no, you cannot connect to it, because the first thing it says is that there is no pg_database file. Okay. So let's create one. What is the pg_database file? We set up another cluster on the side, we connect to it and have a look at pg_database, and that's what it looks like. Remember the file node mapping that we had before? We need another mapping for the database entries. And because it's a shared catalog, it needs to be there whatever database you want to connect to: this file exists for the whole cluster, it doesn't depend on the database, and you need it to be able to connect. When you connect, PostgreSQL needs to check that the database you want to connect to exists, and it does that in that file. And that file was older than five days, so it had been removed by our backup script. Okay, just so that we are on the same page. So if we look at the binary content of the file, you recognize the strings here, right? Those are the names of the databases.

So let's create a file with the mapping we need. We can't inject anything into the cluster, because there is no way we can start the cluster. So what we did is hack the PostgreSQL source code so that we would have a WITH OIDS option for CREATE DATABASE. With that command in the server, we are able to create a database and specify the target OID we want. Because we had the files: each database is mapped to a directory that lives in base/ inside the main cluster, and those directory entries still existed. The name of each directory is all numeric, and that is the number we want in here, in the OID column of pg_database. We know that data just by doing ls in the base directory, and you have the numbers. And we found, in the shell history, the script that they used; I'm not even kidding. In the shell history of the box where they lost all the data, we could find in which order they created the databases, when they put the box in production, days ago. And because OIDs come from an increasing sequence, we knew which number would be which database. So we created a new cluster, hacked with the new command, created the databases in the same order again, and picked the OIDs from the ls of the damaged cluster. And when we have that, we can copy over the file with the mapping, and now we can actually connect to PostgreSQL. Yeah. But it's not done yet. So, just as a quick reminder: either you do that, or you have proper backups that you test. Okay. But let's have fun some more.

So now we want to connect to a database, and, well, I was not at ease at this point, because as the consultant there to help them, I read an error message that I didn't know existed. I had to open the PostgreSQL code again, find it, and try to understand what it is. So who has seen that error message before? Peter, maybe. No, not even you. So what does it mean?
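Roughly what that database re-creation step above looked like; the directory listing gives the OIDs to reproduce, and the CREATE DATABASE ... WITH OIDS syntax shown here is the hypothetical, one-off patched-server option, not something that exists in stock PostgreSQL:

```
# The per-database directories survived; their names are the database OIDs.
ls /var/lib/postgresql/damaged-copy/base
#  1  11563  11571  16385  16389        (illustrative output)

# On the patched build, recreate the databases in their original creation
# order (recovered from the shell history), forcing the OIDs found above.
# WITH OIDS on CREATE DATABASE only exists in that hacked server.
psql -c "CREATE DATABASE appdb WITH OIDS 16385;"
```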
Well, the pg_database file, the mapping between the numbers and the names of the databases, PostgreSQL has that now. So it can open the file, it's happy, it can start. But when you try to use it, well, since it's a shared catalog it's indexed, because you need it pretty often, so PostgreSQL will actually read the data from the index, not from the file. So we created the file with the right mapping, but it will not use it anyway. But PostgreSQL hackers are really, really, really good about that kind of thing: of course you can start a PostgreSQL cluster with an option (ignore_system_indexes) that tells it, please refrain from using any system catalog index. And every PostgreSQL feature will continue to work, just without the system catalog indexes. Because of course you're going to have problems with those someday, and the hackers did know about that, so there is a mechanism that allows you to fix the system catalogs: you can reindex them and whatever. So the whole thing is: what we want to obtain from this scenario is whatever data is still available in the damaged cluster; we don't care about doing anything else. So we disable index usage for later, and we disable the system catalog indexes, we reload with the new config, and not only does PostgreSQL start, but we can connect to it now.

So, is there any data in there? The first thing we do when we connect, of course, is: please, PostgreSQL, list the tables that you have. And it's like: oh, I would like to, but I'm missing that file. And in the fresh cluster that we initdb'd on the side, we see that the file exists and it's pg_proc. Each time you create a database, a series of files are created, with the same OIDs and so on. So the OID and the pg_proc file node here are always the same if you do an initdb again; for the same version of PostgreSQL it's always the same numbers. So what we're missing here is pg_proc. And when you do \dt to get the list of the tables, the query that is used to do that actually uses some functions, stored procedures. So you cannot list the tables, because you have lost all the stored procedures. They are available in the binary of the PostgreSQL cluster; the binary has them, but they are not available in the PostgreSQL catalogs, so they're not available at the SQL level. So, well, we copy the file over, the pg_proc file, directly from the fresh initdb system to the damaged cluster, and we were really lucky in that recovery scenario: they didn't use any stored procedures. So when we do that, we just install all the standard stored procedures from PostgreSQL, but none of the user-defined ones. But they did not have any, so it's okay. So we do that: PostgreSQL is running, we copy over the pg_proc file, we try the command again, and it works. That's pretty amazing. But now it complains about schemas. So let's continue to dive into that. Either that, or you have backups, you know.

So here is what pg_namespace is. It looks like that: it's the list of the schemas that you have. Again, they only had a couple, maybe three schemas, and they did remember which ones, in which order, and so on. And actually we didn't really care; we could have renamed them to whatever we wanted. But there is a trick. They don't have the pg_namespace file, but here is what it looks like if you copy it out from a working system. So the goal is to create a new content that looks like that and get it into the damaged cluster.
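A sketch of what that live catalog repair can look like, assuming a pre-12 server where COPY still accepts the OIDS option; the database name, OIDs, schema names, and owner below are illustrative (2200 is the usual OID of "public"):

```
# Inject the missing schema rows directly into the running, broken cluster.
psql appdb <<'SQL'
COPY pg_namespace (nspname, nspowner, nspacl) FROM stdin WITH (FORMAT csv, OIDS);
2200,public,10,
16387,accounting,10,
\.
SQL
```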
So we copy into pg_namespace from standard input, WITH OIDS, and we say: okay, we had an error message that this OID was unknown as a namespace; well, it's my namespace now, I own it, and it has no special privileges. And you can actually do that on a running cluster. And where are the numbers coming from? Well, we were really lucky: we still had pg_class. Basically they did create maybe a temporary table or something in the last five days, so pg_class had been touched recently and the backup script that removes anything older than five days didn't get to remove that one. Without a CREATE TABLE command in the last five days, we wouldn't have had pg_class to work with. So we were really lucky. And in pg_class we have relnamespace, which is the OID of the namespace, used as a foreign key. Of course we need to do a left join, because the pg_namespace entries are empty. But we know the relnamespace and we know the name of the table, and they're like: oh, this table is part of this schema, so that gives the proper name for the schema. Okay, so then we can prepare the command here and inject our namespaces so that PostgreSQL works again. And again, we do that live: we run that COPY command live against a broken cluster, and we go to the next stage. It's pretty amazing.

We can query the catalogs, but what we want, actually, is not the catalog entries: we want to recover whatever data is still available, right? And so we have pg_attribute; again, maybe because they created a table in the past five days, we still have pg_class and we still have pg_attribute. And it's a good thing we have pg_attribute, because it's one of the most complex catalogs to recreate from scratch, so we didn't have to do that. But we don't have the catalog with the attribute default values, pg_attrdef. So here is what the default values look like. If you look at the catalogs, you have one row per attribute, with an attname; so it's the list of columns, with the first column number one, the second column number two, and so on, and in the pg_attrdef table we have the default value expressions. So now we need to step back a little: the goal is to recover data; we're not going to insert new data into the damaged cluster image, right? So, do we care about default values?
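For orientation, this is roughly how those two catalogs relate on a pre-12 server (where pg_attrdef still has the adsrc text column); the database name is illustrative:

```
# List every user column, whether it claims to have a default,
# and the default expression stored in pg_attrdef (if any).
psql appdb <<'SQL'
SELECT a.attrelid::regclass AS table_name, a.attnum, a.attname, a.atthasdef,
       d.adsrc AS default_expression
  FROM pg_attribute a
  LEFT JOIN pg_attrdef d ON d.adrelid = a.attrelid AND d.adnum = a.attnum
 WHERE a.attnum > 0 AND NOT a.attisdropped;
SQL
```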
No, we don't. We just want whatever is in there; we're not going to invent new default values. So let's pretend that there is no default value. In pg_attribute there is atthasdef, a column that lets PostgreSQL know whether it needs to go fetch the default value when you do an insert without a value. So let's update that and say: no, no, there is no default, don't care about that. Okay, so we don't need to fix that missing catalog; we just pretend that nobody has a default value, please. And then, after that, we are able to copy the data over to CSV files. So we could actually recover some data from that. And remember the clog slide: we could recover data, but we don't know if the data we recovered had ever been committed or not. No way around it. So two days later, when we did that, the guy complained: you recovered some data that never existed. Yeah. Yeah.

So, as a conclusion, I was really, really impressed by PostgreSQL's resiliency here, because we could hack our way into having the cluster start again, and then, while the thing was running, we could fix it. If we go back for a minute to the pg_proc entries here: it was complaining about that file that is missing, and it's pg_proc, but actually, I didn't copy-paste the slide, but I could have copy-pasted it with a dozen other catalogs that were missing too: pg_am, the access methods, pg_opfamily, pg_opclass, pg_anything that you never modify after the initial create. We lost any file older than five days, and those are the things you typically don't modify, so we lost all the core infrastructure catalogs of PostgreSQL. So we replaced not only pg_proc but the dozen others. Again, we were lucky they were not using extensions: if you create an extension, it adds entries to those catalogs, and those object identifiers, the OIDs, are going to be dynamic, so you don't know them. In our case, all the OIDs we had were created at initdb time, and those numbers are hard-coded in the PostgreSQL source code, so from one installation to the other it's always the same numbers. So we could pick all the files from a new system and copy them over, not only for pg_proc but for lots and lots of other catalogs. And we did that while the server was running: we type the query, we get the new error message, we copy the file over, we type the command again, and on to the next one. So I was really, really impressed. And at the end they got the data; some data actually, not all of it, and some of the recovered data had never existed in the production system, but they could do something with it, try to figure it out. And I'm not even sure they lost the customer over that; that's another story.

So, as a conclusion: who's running PostgreSQL in production? Like, lots of you; yeah, all of you. Who's doing backups? Who's doing automated recovery testing? The other ones: you don't have backups. Remember that. And don't ever, ever write a shell script to run in production, because you will have cases that are not handled properly and you will, for example, remove any file older than five days in the wrong directory. So just don't do that. Use a proven tool like Barman or pgBackRest, something serious that is all about recovery, not backups. Nobody wants backups; we want recovery. Any questions? Yes. Most of the things, like here, that was pretty fast; yeah, that thing, not so much. So it took two days. Yeah, two days. I sleep at night, so maybe twice ten hours, let's say twenty-something hours, before I
started. Yeah. Yeah, no, no, over there. And then the DBA in charge called me on the phone saying: we really need you to come over, now. Yes, he did. It was not so much the boss; what really drove him crazy was that some of the data didn't exist. Actually, he didn't have a clue what had happened, and the sysadmin guy and the DBA guys didn't really want to explain to him the failure of the backup script, so he didn't get a clue. Well, I don't know. Well, the files did disappear, so maybe it would have made things... no, I don't think it would have changed anything, because we couldn't use whatever WAL files we had, and checksums are going to be in those WAL files we couldn't use, so I think it wouldn't have helped at all, I think. But I don't want to do it again just to be sure. Good question. That must have been 9.1; no, I don't remember which version. But if we had checksums, I don't think it would have changed anything, I guess. Any other question? Thank you.