Great, thank you very much. Hi, I'm Christophe Pettus. I'm the CEO of PostgreSQL Experts; we're a small Postgres consultancy based in Alameda, California. There's our company website, there's my personal website — that's where the slides will be after the conference — my email address, and my Twitter handle.

So: corruption. Yes, it will happen to you. If you run any database of any size, eventually something will happen and you'll have database corruption. The good news is that it's super easy to recover from corruption. Step one: restore from the last good backup. Then you receive the praise of a grateful nation, and you're done. Right? Questions?

Oh — you don't have a known good backup? That's a shame. Unfortunately, even good backups can have hidden long-term corruption, so you might restore from the backup and find it has exactly the same corruption you were trying to get around. Or the backup could be too old; a three-month-old backup is probably not that useful. Or you might be hit by Postgres bugs. So we're going to talk about preventing corruption, finding corruption, and — if you can — fixing corruption.

First, let's talk about preventing corruption. Postgres is very trusting. It basically assumes the file system is perfect: that everything it writes, the file system will hand back just as it wrote it. It can't recover from bad data coming back, unless you're very, very lucky. 9.3 introduced page-level checksums, so at least you get a warning that the data coming back is bad. Use checksums if you possibly can.

Just remember: hardware is cheap, data is expensive. Use good-quality hardware. Make sure your hardware properly honors fsync; there's a tool that ships with Postgres (pg_test_fsync) to test this. It's more common than you think that hardware does not honor fsync. This is especially common among consumer-level PCs, and very high-end SANs often do not honor fsync end-to-end, because it greatly flatters their performance numbers.

Generally you want to avoid network-attached storage, both for PGDATA — the actual database — and for the backups. There's an unfortunate, very common trope where the backups go to /mnt/backups. The problem is that when something goes crazy and starts destroying files on your system, the backups are just another local file as far as it's concerned, and it will destroy those as well. So have that — it's not exactly an air gap, but a data gap — between the system and your backups: make it at least go through an scp or an rsync.

And of course, back up, back up, back up. What only exists on one drive, you do not truly possess. Make sure you follow the right backup protocol for your technique. Backups in Postgres are not super complicated, but there are multiple steps, and you want to do them in the right order, correctly, every time. Make sure you issue pg_start_backup before doing a file-system-level backup. And test your backups, because if you haven't tested your backup strategy, you don't have a backup strategy — what you have is a great deal of wishful thinking.

One thing I like to do is what I call a prophylactic pg_dump: even on a gigantic database, just pg_dump the whole thing to /dev/null. Who's running a database that's over, say, a couple of terabytes? How many bad pages are there? Do you know? No, you don't — because probably every single page in that database has not been read in a long, long time. The nice part about the dump is that it reads every row in the database, and if one is bad, it will tell you. It's great for finding lurking corruption. Of course, if you can save the dump file, do it — but if it's a twelve-terabyte database you probably don't have that much space lying around, and that's okay. One downside of pg_dump is that it doesn't read the indexes. If you want, you can do some scripting around the pgstattuple extension, which does read the indexes, and then you can literally pick up and rattle every single page in the entire database. That's very handy.
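Roughly, that scripting can be as simple as this — a sketch, not production code: it assumes the pgstattuple extension is installed, only covers heap tables and B-tree indexes, and just logs whatever it can't read:

```sql
-- Sketch: physically read every table and every B-tree index,
-- logging anything that throws an error. Assumes CREATE EXTENSION pgstattuple.
DO $$
DECLARE
    rel record;
BEGIN
    FOR rel IN
        SELECT c.oid::regclass AS relname, c.relkind
        FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
          AND c.relpersistence <> 't'          -- skip temporary tables
          AND (c.relkind = 'r'
               OR (c.relkind = 'i'
                   AND c.relam = (SELECT oid FROM pg_am WHERE amname = 'btree')))
    LOOP
        BEGIN
            IF rel.relkind = 'r' THEN
                PERFORM * FROM pgstattuple(rel.relname::text);   -- reads every heap page
            ELSE
                PERFORM * FROM pgstatindex(rel.relname::text);   -- reads every index page
            END IF;
        EXCEPTION WHEN OTHERS THEN
            RAISE WARNING 'could not read %: %', rel.relname, SQLERRM;
        END;
    END LOOP;
END;
$$;
```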
So why is there evil in the world? What causes corruption?

Underlying storage failure — and this is not uncommon. Sometimes, sit down with the spec sheet for your hard drives or SSDs and do the math: how many uncorrected sectors per billion they say there will be, and how many sectors there are on your drive. You'll come up with a number like 23 — meaning there are 23 uncorrected disk errors on a brand-new, just-out-of-the-bubble-wrap hard drive. It could be a bad disk; it could be a bad controller — controllers do go bad. Sometimes disk controllers do not have ECC memory in them, which is a little weird; it's like a seat belt made out of pasta.

You can get garbage writes during power loss, even with battery-backed controllers. For example, on your battery-backed controller: do you know the battery is still good? Or you have bad RAM, especially non-error-correcting RAM. A huge amount of non-ECC RAM will have a bunch of errors in it, just because we live in a world with radioactivity all over the place.

Sometimes the hardware is working as designed, but how it was designed was not so great. For example, deferred or entirely missing fsync behavior, to flatter bandwidth or latency numbers: on a SAN, the fsync primitive will sometimes stop at the point where the data is written to the local memory cache, even though it hasn't actually been committed all the way to permanent storage. Or network-attached storage that doesn't handle a detach very gracefully — that's also pretty common. If you're using soft RAID, there are plenty of edge conditions that can cause weird things to happen; the nightmare scenario is that one part of a stripe set doesn't attach, in which case the database corruption is the least of your worries.

And sadly, Postgres does have bugs. 9.2 and 9.3 had a series of unfortunate replication bugs. We've been free of those for a little while, and they're not common — that's the good news. Who's on 9.6.1, specifically 9.6.1? Who's on RDS at 9.6? Because if you're on RDS at 9.6, you're on 9.6.1, and it has an index corruption bug.

Or operator error. Backups that did not include critical files: for example, a backup done with rsync using --exclude='*log*' — which misses pg_xlog and pg_clog. Or people don't follow the backup protocol.
So, for example, you don't have all the WAL segments needed to make a backup usable. It's amazing how many backups just forgot that you have external tablespaces. This is one of the reasons I tell people not to use tablespaces in Postgres unless you must: they greatly increase the number of places things can go wrong. Or you type rm -rf in the wrong directory. Who's done that? Come on, be honest, we're among friends here. Yeah, I sure have.

Or it was a really easy problem, and then you started to recover — but it was three in the morning and you hadn't had your coffee, and bad things start to happen, like deleting the wrong files to free space. Like the GitLab incident, where they made a bad situation worse because they copied the bad secondary over the good primary. They were very honest and transparent about what happened — kudos to them for that; they did not try to cover their mistakes. But the problem there was an unfortunate situation that turned into a disaster because they were sleepy, they had been fighting the problem for a long time, and they did the wrong thing. It happens to us all.

Okay, so what do you do before you hit the corruption thing? Buy good hardware. Who knows what hardware their cloud services are running on? You're lucky if you do, because most cloud providers don't tell you — it's a capital-intensive business, so they keep their hardware acquisition to themselves. Have multi-tier redundancy: have a secondary, have continuous good backups someplace else, make those backups and test them. One thing I really recommend is priming developer laptops and workstations from production backups, because if the backups stop working, you'll hear about it. And stay up on Postgres releases; don't defer them. It's really alarming when I go into someplace and they're running 9.3.0 — that's something like twelve minor releases back.

Okay, basic corruption recovery techniques. First of all, save all the parts: stop PostgreSQL, do a full file-system-level backup, and keep that backup safe — maybe take a snapshot of it — because you're going to start modifying things and you want a zero point to start from. Make changes methodically and document every step. Use a wiki, use Slack, use a paper notebook, use something — but don't just keep making changes, because halfway through you'll go, "What did I do again?", and that could be very, very bad. The adrenaline is high and you're going to be unhappy about the situation. Take a deep breath and move slowly, because a crisis is a problem plus bad planning. Storage space is really inexpensive, so no matter how big your database is, make sure you have a place to do a full file-system-level copy of it, ideally on a different device. Cloud providers make this a little easier because you can always attach something new. You will want to make these copies.

Okay. The most common kind of corruption you'll run into in Postgres is index corruption. Either you'll get errors out, or queries will start returning impossibly wrong results. The first thing to do is start a transaction, drop the index, and see if that fixes the problem. The nice part is that if you're in a transaction, you can roll it back and the index comes back for your session, because DDL changes in Postgres are transactional — unlike every other database in the world. If dropping the index did fix the problem, just rebuild the index: commit, build the new index, and you're done. If it didn't fix it, it's probably not index corruption — it's something else, or you've got the wrong index.
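In SQL, the whole test fits in one transaction — a sketch, with a made-up index name and query:

```sql
BEGIN;
DROP INDEX orders_customer_id_idx;          -- hypothetical suspect index
-- Re-run the query that was returning impossible results; without the
-- index, the planner has to fall back to scanning the table itself.
SELECT * FROM orders WHERE customer_id = 42;
ROLLBACK;                                   -- the index comes back for this session
-- If the results were correct without the index, rebuild it, e.g.:
--   REINDEX INDEX orders_customer_id_idx;
```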
Coming in Postgres 10 there's a new tool called amcheck that detects malformed indexes, which is really cool. It's also available for versions before 10, from Peter Geoghegan. It doesn't repair the corruption; it just tells you that the index is bad. It's designed to never report a false positive, but it is possible to get a false negative out of it, because corruption is an unbounded problem.

Next is the bad data page. Usually you know because errors are flying out in the Postgres log, or you get an error when you try to run a query and it complains about a bad page. It may be a checksum failure, if you're using checksums — use checksums — or complaints about bad headers, things like that. So: can you do a pg_dump of that table? Can you pick up each row and shove it out to disk? You want that output to be clean. If that doesn't work, there's a setting, zero_damaged_pages, which you can turn on inside a particular session. When Postgres hits a bad page, it will then issue a warning rather than an error and return a zeroed page. Zero in Postgres means "no data" — so you've lost whatever was on that page, but at least you can keep going and scavenge what you can.

Sometimes, though, you'll hit really bad data pages, where the Postgres backend crashes when it touches the page. Can you select around them? This can be a really tedious process of starting, crashing, starting — which is why you want to work on a copy. You can use the COPY command to get just the good data out and isolate down to the bad pages, or you can create a new table from a SELECT that selects around the bad pages. Sometimes you can delete the bad rows just by ctid. Who's familiar with the ctid column in Postgres? Sometime, when you're logged into Postgres, do a SELECT ctid FROM table_name — you'll be surprised: there's a column you didn't know you had, called ctid. It's a unique identifier for every row in a Postgres database, and you can use it to identify a row if you have no other way of doing so. Please don't bake this into your application — if I ever see an app with ctid in it... and, worse yet, somebody will say, "Oh, Christophe said it was an okay idea."

For finding the bad rows, you can write a little PL/pgSQL function that iterates through them. One nice thing about PL/pgSQL is that you can trap errors, so they don't automatically terminate: you can loop through, select each row, catch the error, and emit a log line saying there was an error. Just catch and log them. This is a really nice technique, and it's very handy for finding TOAST corruption. The TOAST table, for those who are not familiar, is a semi-hidden table — you can see it easily enough if you look for it — that exists alongside every table, for holding large values like big text fields. The TOAST structure seems to be a little more vulnerable to corruption than the main structure, so you frequently get errors out of it. This technique is handy because you actually pick up each row and decompress it, which exercises the whole TOAST structure.
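A minimal version of that looks something like this — a sketch with a hypothetical table name; casting each row to text forces every column, including the TOASTed ones, to actually be read and decompressed:

```sql
-- Sketch: touch every row of a table one at a time, logging the ctid of any
-- row that throws an error (bad heap page, broken TOAST entry, and so on).
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT ctid FROM customers LOOP          -- 'customers' is hypothetical
        BEGIN
            PERFORM md5(c::text)                      -- forces full detoasting
            FROM customers c
            WHERE ctid = r.ctid;
        EXCEPTION WHEN OTHERS THEN
            RAISE WARNING 'bad row at ctid %: %', r.ctid, SQLERRM;
        END;
    END LOOP;
END;
$$;

-- Once you know a row is unsalvageable, you can remove just that row:
--   DELETE FROM customers WHERE ctid = '(1234,5)';
```

If a page is bad enough to crash the backend outright, the scan won't get past it — that's when you fall back to selecting around the damage.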
Okay, but none of that worked — Postgres won't start at all. So what do we do? The problem is that every corruption, by its nature, is a one-off, and Postgres was not designed to handle every possible corruption cleanly. So be sure you know the extent of it, and be sure you can step backwards. There are no real recipes; you do have to explore a bit in each case, and remember to work on a copy, because you are going to be mucking directly with the binary structure of the database.

First, collect some data. Are there errors in the messages or syslog indicating a hardware or OS problem? Are you getting a huge stream of errors saying a block can't be read off the disk? Is the OOM killer terminating backends? That's very common. Are you seeing disk I/O errors? Just try cp -r of the whole thing to /dev/null: can you actually read the files, and which files can and can't you read?

Again, if you have those very bad data pages where Postgres crashes, you might isolate the page and use dd to zero out those blocks. I don't give the formula here because I'm a little worried about turning it loose into the wild, but it is one possibility, and it's easy enough to find the formula for figuring out where in the file to do it. If you do anything like this — turn on zero_damaged_pages or use the dd technique — you need to rebuild all the indexes, because the indexes will be pointing to data that doesn't exist anymore.

Very common: WAL files are missing or corrupt. You start Postgres and it says, sorry, I can't find one of these WAL files, and it stops and refuses to start up. Or you went on vacation, you're responsible for the database server, the system ran out of disk space, and they call you now to say it won't start. "We just deleted some log files." "Which ones?" Yeah. This is a common enough problem that in Postgres 10 the directories are being renamed: they will no longer be called pg_xlog and pg_clog, they'll be called pg_wal and pg_xact, specifically for this reason. However, there is a tool that will get Postgres to start up even in this case, which is pg_resetxlog. It tells Postgres: you know those WAL files you need? You don't need them. Trust me, everything's fine — just start up, please.
Read the instructions on this carefully. It is a chainsaw; you want to make sure you're pointing the right end out. If you do this, there's a very high risk of inconsistent data. Postgres doesn't write those WAL files just to amuse itself — it needs them for crash recovery. So you will end up with a database that probably has some inconsistencies or missing data. Check it very carefully afterwards.

Then there's pg_clog, the extremely important directory no one ever looks at. People don't hear about it because it's really small and it's maintained automatically by Postgres — nobody worries about it until they get an error saying Postgres couldn't read from a pg_clog file. What pg_clog is, basically, is an on-disk bitmap of the state of every transaction Postgres cares about. I won't go into detail about which ones it cares about — it's not every transaction in the history of the database, but it is a very large range of them. It's a bunch of 8K files with unpleasant hexadecimal file names, striped with two bits per transaction that Postgres is interested in. The good news about corruption here is that it's rarely subtle: you'll suddenly start getting these awful messages thrown out. Usually there's a missing file or a truncated file — those are the two most common. You can patch it: you can use dd to replace it with an all-zeros file. 00 in this bitmap means "transaction in progress," which means previously committed transactions can disappear if you do this — so again, be prepared to do more cleanup, but you can probably at least get the database started.

And the worst possible thing is when one of the system catalogs gets corrupted — you're getting an error that Postgres can't read pg_class. It's really hard to recover from this. This may be a "no matter how old your backup is, that's what you're stuck with" situation, unless you call somebody like us, and it requires some expert attention to do recovery or scavenging. Some of these are really easy: there was a case where pg_namespace, which holds things like the schema names, had an entry whose parent was itself. All it did was break pg_dump — the database ran fine — so I just patched that one column in that one row and everything was fine. However, the serious ones, where the database won't start because of system catalog corruption, are very hard to recover from. If Postgres loses the system catalog, it can't interpret the tuples on disk, because there isn't enough information in the tuple structure itself — out in the rows — to say what the columns are or anything like that. That's all in the system catalog. At that point you have a big bag of bytes.
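There's no general recipe, but a couple of cross-checks along these lines — just a sketch, nowhere near exhaustive — will at least tell you whether the catalogs still agree with each other. Both queries should return zero rows on a healthy cluster:

```sql
-- Relations whose schema is missing from pg_namespace:
SELECT c.oid, c.relname
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.oid IS NULL;

-- Column definitions whose parent relation is missing from pg_class:
SELECT a.attrelid, a.attname
FROM pg_attribute a
LEFT JOIN pg_class c ON c.oid = a.attrelid
WHERE c.oid IS NULL;
```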
Okay, now I'm going to talk about some actual real-life things that happened to me as a consultant, and how I fixed them.

Here's the error: a bad clog entry. Oh, that doesn't look good. So how did this happen? They had a primary and a secondary, and the switch started going bad — it was dropping packets, the packets were failing checksums; a bit of memory in the switch had gone bad. So they promoted the secondary to be the primary. Unfortunately, the secondary was a much less capable machine than the primary, and the site started falling over because the secondary was getting crushed by the load. This is why you don't do this — but okay, that's like telling someone who's dying of lung cancer to stop smoking; you have to have a good bedside manner sometimes.

So they initiated a rebuild of the old primary and re-initialized it from the secondary — exactly the right thing to do. But then these errors started popping out when it started up. That doesn't look good. The problem is they ran all of this back over the same bad switch, and rsync dropped some of these files. It wrote errors for them, but they were in a hurry and didn't look very carefully — rsync was emitting this giant long list of things, and they didn't scroll back and see that, oh, by the way, it had just dropped a few clog files. Everyone was in a huge rush, they pushed the button, and it was brought back online.

The good news is they hadn't decommissioned the secondary, so we were able to just pick up the missing clog files and drop them in, and the system worked. Yay. We were very lucky that clog file was available. Had things gone in a slightly different order — had they decommissioned the secondary, or started re-initializing it to turn it back into a secondary — it would have overwritten those clog files, and they would have been stuck. So no matter how bad a disaster is, rushing can make it worse. Make sure you're not introducing new problems in the rush to repair the old one. And if you're doing an rsync or something like that: I love seeing output — it's this thing I have; I run everything in verbose mode because it's very satisfying to watch the computer doing something. That's a bad habit. Don't do that, because you'll get a lot of output you have to grovel through to find the one error that will cause the server to fail to start. To the extent possible, minimize the amount of data you have to process in your human brain.

Okay, next one. Oh, that's not good: an error on a TOAST table, one of those magic pg_toast tables. Just so you know, that number is the relfilenode of the table it's associated with, so you can go query for it in pg_class.
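For reference, the number in a pg_toast_NNNN name corresponds to the owning table, so a lookup like this — the number here is made up — tells you which table to go examine:

```sql
-- Which table does pg_toast.pg_toast_16385 belong to? (16385 is an invented example)
SELECT c.oid::regclass AS owning_table
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE t.relname = 'pg_toast_16385';
```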
So: a new primary was provisioned by promoting the secondary — a perfectly capable machine this time, that wasn't the problem — and these errors started popping up almost immediately, within seconds of it starting up. But it was only one row on one table, and only some queries. Spooky. We could easily isolate exactly the one record — do a SELECT * WHERE primary key equals this — and it worked fine. Oh my god, now what? So, okay, fine, we'll reindex the TOAST table. That didn't work. But if we iterated through the table — did a SELECT * and picked up one row at a time — that did hit the problem. Hmm.

The problem was that there were two levels of corruption. There was a bad TOAST entry, and there were two rows with the same primary key, one of which had the bad TOAST entry and one of which didn't. How could two things have the same primary key, you ask? The index scans only found the good row, just because it happened to be first in the index, but a sequential scan found both and reported the error. So we found the ctid of the bad row, deleted it, iterated through all the other corrupt rows, and rebuilt the indexes. Read the release notes, because this was directly related to an existing bug in Postgres — but the hosting provider hadn't upgraded Postgres.

This next one is every consultant's worst nightmare: you do a job, you're done, and then you get the "uh, we have a problem" call. Again, a new primary was provisioned by promoting a secondary and put in service, the old primary was decommissioned, and everything looked fine for a few hours — until there were missing rows, and some rows were duplicated. People are familiar with MVCC in Postgres? An update in Postgres is effectively an insert plus a delete: it inserts the new row and deletes the old row. Imagine that old row hadn't actually been deleted — that's what we were seeing. And there were no error messages; everything looked fine. We could not get the thing to produce an error message.

It was a Postgres bug, since fixed. Under very specific conditions, which this client happened to have, clog values were not being transmitted from the primary to the secondary, so we were ending up with bad transaction states: some were marked as rolled back when they shouldn't be, some were marked as committed when they shouldn't be. It was really kind of a mess. The good news is that there was enough information in the database — timestamps and things like that — to delete the bad rows, and the old database was still available, thank goodness. So we were able to reconstruct the database from internal information inside the user schema. We had to write some hand-crafted scripts, and I never want to do that again; it was really painful.

So, first: don't exclude the possibility that it's a Postgres problem. And do thorough sanity checks on promoted primaries. The issue here was that they just threw the switch and everybody went home, because it looked fine, and no one checked it — but if you actually went in and did a bunch of checks, like count(*) on the different tables, it was really obvious there was a problem. One of the nice parts about this is that they understood it wasn't our fault — we don't sell Postgres — so make sure people understand that open source software will have bugs, and you sometimes have to work around the bugs, because you can't always just call the vendor and get a fix.
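The checks don't have to be clever. Something along these lines — hypothetical table and column names — run on the new primary and compared against the old one would have caught it immediately:

```sql
-- Cheap row-count estimates to compare against the old primary:
SELECT relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY relname;

-- The slow-but-exact version, for the tables that matter:
SELECT count(*) FROM orders;

-- Duplicate "primary keys" that an index scan can hide; force a
-- sequential scan so every visible heap tuple is actually visited:
SET enable_indexscan = off;
SET enable_indexonlyscan = off;
SET enable_bitmapscan = off;
SELECT order_id, count(*)
FROM orders
GROUP BY order_id
HAVING count(*) > 1;
RESET enable_indexscan;
RESET enable_indexonlyscan;
RESET enable_bitmapscan;
```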
This next one was a really horrible situation. I get a call on a Friday afternoon from a dentist's office. The database was running on desktop hardware, the disk did not honor fsync, and the power went out — with a UPS that hadn't been tested in a while. Postgres started up correctly, no errors, everything looked fine — but when you touched certain tables, the backend crashed. And these tables were central to the application; there was no getting around using them. There was garbage sprayed across all of these tables — clearly it had been in the process of writing to them when the machine went down — and, interestingly enough, the sequences were corrupt as well. You couldn't do a pg_dump, no way: as soon as it hit any of these corrupted things, everything went south. You had to go through and do touch-and-crash recovery to find the damaged tables, and then the particular parts of the tables that were damaged.

So I did a schema-only dump to get a blank database, because the system catalog was fine, and dumped all the undamaged tables. Where they were super lucky is that while the damaged tables were essential to the application, they didn't change very much — things like email templates and user-interface widgets, stuff that didn't have a lot of important data in it. The patient data was all in tables that were okay. I did have to do a kind of wonky transform, because they'd upgraded the application since the old backup and it had a different schema, but we got it back, and Monday morning the dentist comes in — oh, everything's fine, blink blink. And I had to manually reset all the sequences from the high point in the data, because the sequences were crushed.
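The sequence repair itself is one statement per sequence — a sketch with invented names — setting each one back above the highest value actually present in the data:

```sql
-- Put the sequence back above the real high-water mark of the column it feeds.
SELECT setval('appointments_id_seq',
              (SELECT coalesce(max(id), 1) FROM appointments));
```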
So: don't use desktop hardware. And even if your installation is too small to have a secondary — even if you only want to run one server — use something like WAL-E, which sends backups and WAL segments into the cloud, just so you at least have a disaster-recovery backup somewhere. Even really old backups can be useful in an emergency like this; had they not had that old backup, they would have been stuck.

So, who's actually seen this message — the wraparound warning? Anybody? Isn't this fun. The client said there were just too many autovacuums going on: "Autovacuum, autovacuum — it's always autovacuuming stuff." So, okay, what we'll do is set these autovacuum parameters really high, and now there are no more autovacuum freeze jobs. They stopped. We're so smart. It's great — and then, on Halloween, actually, these wraparound warnings started appearing in the log, the ones that say, "oh, by the way, you only have X thousand transactions to go before shutdown, to prevent XID wraparound." But they weren't monitoring the logs. The logs were full of these messages, but who reads the Postgres text logs, blah blah blah. They did notice, but it was too late, because this was a really busy site — in fact, it was Halloween and the site was related to the election. At that point XIDs were being consumed faster than autovacuum could freeze the tables. Even going in and manually freezing the ones that were closest to the shutdown threshold, we couldn't freeze fast enough with the system online. So eventually they hit shutdown mode — and there is nothing quite so exciting as going into Postgres in single-user mode.

What I had to do was manually vacuum the oldest tables to get the system back online, and once the system started up, I turned loose a script that ran something like twelve vacuums in parallel on the oldest tables — because one of the other unfortunate things about this database is that it had over a hundred thousand tables, most of which were close to the shutdown point. So needless to say, I'm doing this, I'm on Slack with the client, and they're saying "it's really slow, it's this, it's that," and I just say: yes, there's a lot of I/O being done — suck it up. Because otherwise you just hit shutdown mode again.

Which brings up something I also talk about in my general disaster-recovery talk. There is this unfortunate idea in most organizations that the reason you, the line engineer, have not fixed the problem yet is that someone with a sufficiently grand title has not yelled at you, so you start getting calls from vice presidents you didn't even know existed asking why you haven't fixed the problem. Put your phone on silent. Reduce the number of input channels, because if everyone's yelling at you, you can't fix the problem.

Another unfortunate thing: people will sometimes increase these autovacuum settings because they don't want the very expensive autovacuum freeze runs showing up. That's okay to do — if you do manual vacuum freezes. You can't just crank up the settings and do nothing else. And always monitor the logs. There was plenty of time to fix this had they been looking at the logs. Use Logstash, use something to capture errors — this had been going on for days, so it wasn't something that needed an immediate alert. For example, every day run the logs through pgBadger (everyone should use pgBadger), email the results to yourself, and just go over to the events tab and look at the errors, because there were thousands of these things. It's also sometimes interesting when you go and ask, "why is my application generating five thousand duplicate-key violations an hour?" — this can be surprising to the app developers. And don't go in and kill autovacuum processes, especially autovacuum freeze processes — an anti-wraparound vacuum is the one in pg_stat_activity that says "to prevent wraparound" — thinking you'll deal with it later. The future is now; just deal with it.
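The numbers worth watching are all in the catalogs — these are standard queries; the LIMIT is arbitrary:

```sql
-- How far each database is from the wraparound limit:
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- The oldest tables in the current database: the ones to VACUUM FREEZE first.
SELECT c.oid::regclass AS table_name, age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY xid_age DESC
LIMIT 20;

-- Anti-wraparound autovacuums currently running. Do not kill these.
SELECT pid, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%wraparound%';
```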
Okay, to summarize. Just remember the basics. Use good hardware. Test your backups — don't have a backup strategy that's untested. Make sure you can recover, and make sure you write down the recovery steps, because you're going to be doing this at three in the morning with no coffee. Stay up on the Postgres news: at least subscribe to pgsql-announce, and read the release notes. People have a bad attitude toward release notes sometimes, and some projects' release notes — not Postgres's — really are dire, because they bury the lede: they'll say "all save buttons are now green," and down at the bottom will be "fixed giant data corruption problem." Postgres's release notes aren't like that; the first screen or so is very valuable. Monitor your log output — Postgres is often telling you things; don't ignore it. And get plenty of rest, because these problems can be made much worse if you're working on them with a fuzzy head.

Thank you — and questions?

[On where the slides will be:] It's thebuild.com, and they'll be on the front page; I'll post them after the conference.

[On a lost pg_control file:] The good news is that the control data is actually recreatable if you need to, because it's a fairly small file. That sounds like a problem where base backup or rsync or whatever you were using choked on that one file. I'm not aware of a bug that would do that — although, you know, bugs. It might have died at that precise moment: what can happen sometimes on a crash is that Postgres rewrites pg_control at that moment, and if the crash is sufficiently bad that it doesn't get through the process, it could have unlinked the file but not relinked it. That can happen. Corrupt statistics can happen that way too. One thing I suggest: if Postgres crashed, it's not a bad idea to go look at the statistics and make sure they're not completely insane — look at pg_stat_user_tables and pg_stat_user_indexes. The way Postgres works is that it pulls the statistics into memory, updates them in memory, commits them to the stats file as it needs to, and then when it shuts down it writes them back out to save them. In a crash, that write process can corrupt the stats data. You may lose some stats, and that's normal — it's to be expected on a crash — but you want to make sure they aren't suddenly all negative or something insane.
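A quick way to eyeball that, using the standard statistics views:

```sql
-- After a crash, look for counters that are obviously insane
-- (negative numbers, wildly implausible values, everything zeroed).
SELECT relname, n_live_tup, n_dead_tup, seq_scan, idx_scan
FROM pg_stat_user_tables
ORDER BY relname;

SELECT relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY relname;
```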
Yes, sir? Oh, yeah, that's true. Well, in its defense, it works exactly as documented — let's put it that way. It works correctly with the information available to it. And again: thebuild.com — that's where the slides will be in a day or two. Any other questions? Great, thank you very much. I just gave this talk in Paris last week and ran over; I have no explanation. Thank you so much.

Well, one of the examples was that the backups can be very old — that's one problem. And sometimes — I don't think I gave an example of this — the corruption you're fighting is still in the backups, because, for example, there's some bit of corruption you just don't hit very often, but then you change the app or change the database in some way and you start hitting it all the time. The corruption has been there for months, and you just didn't know about it.

[On whether the system catalogs are backed up:] Well, they are backed up — they're just part of the database — but you could have a corruption event where one of those gets deleted or corrupted in some way. They're basically magic tables in Postgres, all the pg_* tables. pg_class is probably the one people are most familiar with, along with pg_attribute: pg_class holds things like the tables and indexes, and pg_attribute holds the fields. Those are backed up like anything else in the database; they're just perfectly normal tables. pg_dump doesn't back them up explicitly, because they're recreated when the dump is restored — so if you grep through a pg_dump, you won't see them, but that doesn't mean they're not there; they're automatically recreated by the CREATE TABLEs and things like that. The problem with corruption on the catalogs is that it's hard to come up with general rules for how to solve it, because it depends on how bad the corruption is and what type it is. We have had cases where pg_class was damaged to the point that it couldn't be read, and there's not a lot you can do there, unfortunately: Postgres won't start, and it's almost impossible to get it to start at that point. You're looking at byte-level groveling through the data. Occasionally we've dropped in an old pg_class and tried things like that, but that gets pretty hairy. But no, they are definitely backed up: if you do a file-system-level backup, they're just part of the file system, so they come along; if you're doing pg_dump, it will rebuild them as part of the restore process.

[On how to test backups:] Yeah — use them to prime staging or developer machines. You don't have to do it every day, but you should do it once in a while, because otherwise your backup process could be broken and you'd just never know.