Okay, for the next talk, please welcome Federico, who's going to tell us about... which one is this? repcloud? It's a repacker for Postgres. Thank you very much.

Hello, everybody. Can you all hear me fine? Okay, thank you. My name is Federico. I'm Italian, I lived in the UK for a while and now I'm back in Italy, so I'm very happy to be here. I will present this tool, which I've written over probably six weeks. But before that, let's start with me. This is my cat; he's much better, much cuter. I'm ugly, so it's much better to show him. My Twitter handle is 4thdoctor_scarf: I'm a huge fan of Doctor Who. I was born in 1972; I look younger, but I'm a Time Lord, right? I've been passionate about IT since 1982, joined the Oracle secret society in 2004, fell in love with Postgres in 2006, and I have a Postgres tattoo on my left shoulder, along with this guy, who joined just recently, so I'm very committed to the project. I'm also involved in the Postgres community.

Before starting, a little disclaimer: I'm not a developer, not really Pythonic, I'm a DBA. So I'm hated by everybody, which is the role of the DBA, and I hate everybody. And to put things in the right perspective: I use tabs. So please, let the hate flow through you.

So let's start with the strong stuff. How many Postgres users are there in the room? Whoa, I love you. I love you all, even though I am a DBA. PostgreSQL is the most advanced open source database. It's a great product and it's gaining momentum; I'm very impressed by that, and very happy with it. It's available as a database as a service on the cloud providers, so we have RDS, Heroku and other offerings, which limit the actions you can take on PostgreSQL. And unfortunately it has its own drawbacks too: it's not good in some areas, it needs improvement, or it's just architectural stuff, so we need to live with that.

The point I want to stress today is MVCC. How many of you know what MVCC is? Thank you. Multi-version concurrency control is the way Postgres makes things consistent in read and write: you can manage reads and writes with very, very minimal locking. The implementation is very efficient and very different from other databases; Postgres doesn't have a rollback segment, for example, and the snapshots are managed on the data page, which is amazing. You get everything you need for reading consistently directly on the data page.

The data page is the problem at the same time, because the transaction ID is stored inside the row in two system fields, and the transaction ID is just a four-byte integer: it starts from one, reaches four billion, then it wraps again. Avoiding the wraparound failure is a topic for a different talk, but those two system fields are used to track the visibility of the row, or tuple as it's called in Postgres. One field is xmin, also called the insert transaction ID: when the tuple is created, the transaction ID that created it is stored inside this field. And then there's another one, xmax, which is set when the tuple is deleted. Postgres doesn't immediately remove rows from the physical layer; it leaves them in place, and this information is used for tracking row visibility. But there's no field for tracking the updates.
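A minimal sketch of what those two system columns look like, on a throwaway table made up just for this example:

```sql
-- Hypothetical demo table, not from the talk.
CREATE TABLE visibility_demo (id integer PRIMARY KEY, note text);
INSERT INTO visibility_demo VALUES (1, 'hello');

-- xmin: the id of the transaction that inserted this row version.
-- xmax: 0 while the row version is live; it gets set when the version
--       is deleted (or updated, since an update deletes the old version).
SELECT xmin, xmax, id, note FROM visibility_demo;
```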
This is quite a mystery, nobody knows about this thing, but in Postgres there's no such thing as an update. An update is made using an insert and a delete inside the same transaction: the old row version is deleted, the new row version is inserted. So you get two rows; one becomes dead and therefore becomes wasted space, but it's still needed for any running transaction that needs to see it. These old row versions are removed by vacuum, when vacuum kicks in manually or automatically.

And when rows are updated they might change data page. It's not automatic that the row stays inside the same data page, and the data page is the data block, the physical block. When this happens, when the row changes data page, the indices need to be updated as well, and this may result in bloat: the indices bloat massively just because the tables are updated too much.

So how do you reduce the risk of bloat? The database design, the data model, should avoid tables with large rows; they should be split, and the third normal form is very helpful for that. Grouping the most frequently updated fields in separate tables is a massive strategy to avoid bloat, because if you have one row of 200 bytes and you update just a boolean flag, you generate another 200 bytes, whether you change the boolean or a different field. Those small tables should then be used for looking up the rest of the data. Remove the unused indices; there's a way to query Postgres to find them. Or do routine maintenance: vacuum is very low impact but less effective on the indices, while REINDEX is high impact and highly effective. You can reindex stuff, but up to Postgres 11 it's a blocking procedure, so it locks your table in read-only mode and sometimes it can prevent even the reads, depending on how the index is used.
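A minimal demo of the 200-byte-row point from above, with made-up table and column names; note that the statistics counters can lag a moment behind the update:

```sql
-- Hypothetical example: a wide row where only a tiny flag ever changes.
CREATE TABLE wide_row (
    id      integer PRIMARY KEY,
    payload text,                      -- the ~200 bytes that never change
    flag    boolean NOT NULL DEFAULT false
);
INSERT INTO wide_row (id, payload)
SELECT i, repeat('x', 200) FROM generate_series(1, 10000) AS i;

-- Flipping the flag rewrites every row: each update leaves a dead
-- row version of the full width behind until vacuum removes it.
UPDATE wide_row SET flag = true;

SELECT n_tup_upd, n_dead_tup,
       pg_size_pretty(pg_relation_size('wide_row')) AS heap_size
  FROM pg_stat_user_tables
 WHERE relname = 'wide_row';
```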
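And for the unused indices mentioned a moment ago, a query along these lines is one way to find them; just a sketch, since in practice you would also exclude indexes that back primary keys or unique constraints:

```sql
-- Indexes that have never been scanned since the statistics were reset.
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0
 ORDER BY pg_relation_size(indexrelid) DESC;
```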
Dealing with existing bloat: I have found databases in very, very bad condition, indices of hundreds of gigabytes for storing just two or three gigabytes of data. So how do you deal with it? The first thing built into Postgres is VACUUM FULL, but this is a massively blocking procedure: you get the table locked in exclusive mode until the vacuum is complete, and on a cloud-hosted database with tables of hundreds of gigabytes it can take days, because the IOPS are limited. It's not really feasible. pg_repack is historically the way to repack tables without locking, and it's a very good tool, but the problem is that you need to install the extension, and on a cloud database you cannot do that.

So in June I had this customer with exactly this problem: a cloud database where it was impossible to install pg_repack. So I came up with this crazy idea, and believe me, it was incredibly crazy considering all I had to develop, and they said, oh, let's go for it. And that is repcloud: repacking in cloud, which is the reason for the name. It uses a similar strategy to pg_repack. pg_repack creates a new table, regenerates the data, replays the changes against this table, and then swaps the data physically at system catalog level. repcloud doesn't use that strategy: it's a logical procedure, and it requires minimal access to the database. For example, on Heroku or RDS; actually, I think RDS recently added pg_repack to the list of supported extensions, so it's probably a better idea to use pg_repack where it's available, but on all the others that do not allow this sort of extension, repcloud can help reduce the bloat of the tables.

So how does this thing work? It creates a new table like the original one, empty. It creates a log table with the same data type as the original table; Postgres has this fantastic thing called composite data types, which also describe the tables. Then a trigger stores the row changes happening on the original table into this log table. Then the data is copied into the new table with just an INSERT ... SELECT, the logged changes are replayed against the new table, and then it attempts to swap the tables: rename the old one, rename the new one to the old name, and live long and prosper.

Not exactly, because the swap is absolutely not trivial, because of the Postgres dependency system. It's something you will see if you try to drop a table when there is a view attached to it: you get "oh, I cannot drop it, you have to use CASCADE". Don't use CASCADE, it's very dangerous. Even when you rename your table, all the dependent objects follow that table, so you have to rebuild exactly the same objects onto the new table, and only then can you safely drop the old table; otherwise you will lose information. So what has to be recreated? Sequences, of course, otherwise any insert with a default value using that sequence will fail; views, materialized views and foreign keys. But the problem is that any other object related to these objects needs to be rebuilt as well, so the entire chain has to be resolved backwards, and then everything has to be dropped and recreated in the correct order; otherwise it will fail.

And we also have another problem: the deadlock. A deadlock is when one transaction is waiting for a lock held by another transaction, which in turn is waiting for a lock held by the first one. There's no way to exit this chain, so Postgres kills one transaction and frees the deadlock when it happens. And this happens in particular when you have a chain of foreign keys: this table refers to this one, this one points back there, and so on. So if you try to rebuild or drop these foreign keys while they are in use, while they are being checked and validated against the data changes, you will get the deadlock; there's no way around it. This is a very simple schema, the pgbench schema with foreign keys, which is very useful for testing this sort of thing, and the first thing I learned when I tried to run the swap was: oh, deadlock.

So how did I approach this? I decided to add a choice to the tool, so the user can decide which strategy to use. By default repcloud does nothing when it detects the deadlock risk: it just waits for the database. But it's possible to tell repcloud to kill or cancel the offending query, the other query: the system automatically checks whether a potential deadlock is happening, and one of the blocked queries, not the one that is running the swap, is cancelled or killed. But sometimes that's not a good strategy.
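This is not repcloud's actual code, but these are the building blocks Postgres provides for spotting blocked sessions and cancelling or terminating a backend; the pid below is made up:

```sql
-- Sessions that are currently blocked, and who is blocking them.
SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
  FROM pg_stat_activity
 WHERE cardinality(pg_blocking_pids(pid)) > 0;

-- Cancel the current statement of a session (the pid is hypothetical)...
SELECT pg_cancel_backend(12345);
-- ...or terminate the whole session if cancelling is not enough.
SELECT pg_terminate_backend(12345);
```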
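To make the copy-and-replay mechanics described above a bit more concrete, here is a rough sketch of the pattern with made-up names; repcloud's real implementation is more involved, with batching, replay ordering and error handling:

```sql
-- Made-up original table standing in for the bloated one.
CREATE TABLE mytable (id bigint PRIMARY KEY, payload text);

-- 1. An empty copy of the original table.
CREATE TABLE mytable_new (LIKE mytable INCLUDING ALL);

-- 2. A log table that stores each change as a row of the original
--    table's composite type, plus the kind of operation.
CREATE TABLE mytable_log (
    change_id bigserial PRIMARY KEY,
    operation char(1)   NOT NULL,     -- 'I', 'U' or 'D'
    row_data  mytable                 -- the table's composite type
);

-- 3. A trigger that records every change made to the original table.
CREATE OR REPLACE FUNCTION log_mytable_change() RETURNS trigger AS
$$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO mytable_log (operation, row_data) VALUES ('D', OLD);
    ELSE
        INSERT INTO mytable_log (operation, row_data)
        VALUES (left(TG_OP, 1), NEW);
    END IF;
    RETURN NULL;   -- AFTER trigger, the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_log_mytable
AFTER INSERT OR UPDATE OR DELETE ON mytable
FOR EACH ROW EXECUTE FUNCTION log_mytable_change();
-- (use EXECUTE PROCEDURE instead on versions before Postgres 11)

-- 4. Bulk copy, replay mytable_log against mytable_new until it has
--    caught up, then swap the names:
INSERT INTO mytable_new SELECT * FROM mytable;
-- ALTER TABLE mytable     RENAME TO mytable_old;
-- ALTER TABLE mytable_new RENAME TO mytable;
```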
It's also possible to run just a prepare swap, so everything is done up to the point just before the swap. The table is created, the log table is populated by the trigger, and the replay function keeps running to replay and catch up with the existing table. So it's possible to stop the application and say, okay, now we can swap without the risk of deadlock. A few seconds of downtime for swapping the relations is something that can be afforded, rather than a long-running VACUUM FULL.

It can also change the fillfactor for the table. Fillfactor is a very useful thing, so you can set the parameters. This is something I'm still working on: at the moment it can set the fillfactor, but my idea is to also add changing the tablespace. Everything is configurable at this level, so it's possible to create the new table with a different fillfactor, and the new table will be less affected by bloat because there will be more free space inside the data pages for the updates. This is one of the strategies for designing tables that might be affected by bloat.

It can clean up the jsonb and json fields from the null keys, which makes the json fields more compact. It's also possible to remove specific keys, but only from the jsonb fields, because the function that runs this operation exists only for that data type. More will come.
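A small illustration of those two JSON operations, on made-up values; jsonb_strip_nulls() (and its json_strip_nulls() twin) drops keys whose value is null, while deleting a specific key with the - operator exists only for jsonb:

```sql
-- Drop keys with null values; this works for both json and jsonb.
SELECT jsonb_strip_nulls('{"a": 1, "b": null, "c": {"d": null, "e": 2}}');
-- result: {"a": 1, "c": {"e": 2}}

-- Remove a specific key; the - operator exists only for jsonb.
SELECT '{"a": 1, "b": 2}'::jsonb - 'b';
-- result: {"a": 1}
```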
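And going back to the fillfactor, a minimal example of what setting it looks like, on a made-up table:

```sql
-- Leave 30% of every page free so updated row versions have room to
-- stay inside the same page.
CREATE TABLE counters (
    id   integer PRIMARY KEY,
    hits bigint  NOT NULL DEFAULT 0
) WITH (fillfactor = 70);

-- On an existing table the setting only applies to pages written from
-- now on, which is why repacking into a fresh table makes it effective.
ALTER TABLE counters SET (fillfactor = 90);
```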
And this is the tool, in short. I would like to give some recognition: this is the company that believed in me and gave me permission to release this thing as an open source project. It wasn't supposed to be open source, but I think it might help a lot of people. The dependency resolution is derived from the pgAdmin team: I used their queries, adapted to my needs. They were amazing, they saved me a lot of time in building up the crazy part, because the dependency system in Postgres is insane and you have to go through a lot of outer joins to get everything. Also, the replay strategy is inspired by the pg_repack code: waiting for the running transactions before effectively starting the copy, together with the SERIALIZABLE transaction isolation level, is something that helped me a lot to understand how to approach this problem.

This is the GitHub project, on my GitHub page. There are also other projects there; the other important one is pg_chameleon, a replication tool from MySQL to Postgres. pg_chameleon is very usable, and it's something I want to pick up again: I've been distracted in the last year, but it is something I want to improve. And this tool is also available on PyPI, so you can just install it. It's an alpha release at the moment, but we tested it in production and it worked incredibly well for repacking a 300 gigabyte table. It took about four days, which were supposed to be downtime and weren't, so the customer was very, very happy about that. The license terms are the PostgreSQL license, and I like this license. I think pg_chameleon started with a BSD license; it will move to the PostgreSQL license too, which makes more sense for a Postgres-related project. And if you want to break it and report bugs, or you want to help me build up this thing, I will write the collaboration guidelines for this project very, very soon, and I will be very happy to get any help with that.

Those are my contacts: my blog pgdba.org, where I write stuff about Postgres, the Twitter handle I already mentioned, GitHub and my LinkedIn page. The work is released under Creative Commons. And if you want to hear horror stories, tomorrow I will speak in the Postgres devroom about RTFM. Anybody knows what RTFM means? Sure you do. I will talk about interesting and funny things; it's the last talk of the day, so I have to be entertaining. And that's all, folks. Are there any questions? Thank you for listening.

It's not directly related to repcloud, but do you consider partitioning as a solution against bloat?

Well, partitioning is not designed as a solution for the bloat; it's for having more efficient storage when you have a monolithic table. The solution for the bloat is looking at the storage parameters, so fillfactor, and trying to avoid unnecessary updates. And one thing you should look at is pg_stat_all_tables: there is a field called something like "tup hot upd", a counter, I don't remember exactly, my shared buffers are limited, but there's something with "hot" inside. If this counter increases, it means that when the table is updated the rows stay inside the same page, which is a very good thing. So if you have a lot of updates and a very similar number in that field, you are doing it very, very right.

Any other questions? Cool. That's all right. Thank you.
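For reference, the counters that last answer points at live in pg_stat_all_tables; a minimal query to compare them, assuming nothing beyond the standard statistics views:

```sql
-- If n_tup_hot_upd tracks n_tup_upd closely, most updates are HOT:
-- the new row version stays inside the same data page.
SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
  FROM pg_stat_all_tables
 WHERE n_tup_upd > 0
 ORDER BY n_tup_upd DESC;
```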