Okay, so let me introduce Federico, our last speaker for the room. After a whole weekend of getting all this knowledge from open source and free software projects, you are probably a little bit tired now. I was lucky to be on the program committee this year, and we specifically selected Federico to be here, because we know he has good energy and a very interesting topic to close our room. So Federico is probably going to deliver our very last talk in a very good way. He used to live in the UK, but before Brexit he decided it was time to come back to his home country. So, directly from Italy, please welcome Federico Campoli.

Thank you very much. So yes, my name is Federico, I live in Italy, and today, for the last talk of the day, we will have some stories rather than technical stuff. I will try to keep you awake, because it has been a very tiring weekend, as FOSDEM normally is.

A little word about myself. This is my lovely cat, Hodzi. This is me. It's better with the cat in front, because the cat is cuter, so it's much better this way. I was born in 1972. I look younger, but as my girlfriend says, I am a Time Lord. I'm a huge fan of Doctor Who, so that's the reason why I don't look old, but I'm really, really old in my opinion. I started with IT in 1982, with the Tron movie, an amazing movie. And I joined the Oracle secret society in 2004; that's the way I like to describe that period of my life, because it really was like being part of a secret society. In 2006 I discovered this crazy database. At that time it was version 7.4. It was a massive bet for me, and it's a bet I believe I won, because Postgres has been constantly gaining momentum ever since. It's becoming something amazing. It is amazing. And like Devrim, I have a Postgres tattoo on my left shoulder, so I'm really committed to Postgres.

In the meantime, when I'm not at conferences, I work as a freelance DevOps and data engineer. Those are my contacts: my technical blog, where I write stuff about Postgres; my Twitter, so if you want to follow me, I'm very happy about that; and my GitHub page, where I have a couple of interesting projects. I talked about one of them yesterday in the Python devroom: a compactor for Postgres in the cloud that works without extensions. It's an interesting project. Another one is a replication tool from MySQL to Postgres, so it can help you escape MySQL if you are using it. And the last one is LinkedIn.

So today we are talking about RTFM. Everybody knows what RTFM means, right? Read the F-ing manual. What does the F stand for? Fantastic, fabulous, funny, fancy? Well, not exactly, but I'm not saying that word now, because we're live streaming and I don't like to swear on a live stream.

So what is the talk about? Telling stories of people that didn't do the RTFM and caused issues. Issues, disasters. And like any other disaster scale, I decided to use DEF CON levels, like in WarGames. DEF CON 5: just a startling noise, the DBA is vaguely impressed. Well, something happened. DEF CON 4: tripping over your feet, the DBA is alarmed. Oh, something wrong is happening. DEF CON 3: an earthquake, the DBA jumps on the seat. And believe me, having an earthquake under your seat is something very, very startling; I live in a quite seismic area of Italy, so sometimes we get these sorts of emotions. DEF CON 2: an asteroid dropping from the sky, the DBA freaking out.
And then DEF CON 1: Daleks invading the Earth, the DBA going berserk. It happened to me, going berserk, swearing all around the office.

Then we have the dramatis personae, still from WarGames. Professor Falken is myself: old, ugly, quite cynical, the perfect incarnation of the DBA, right? And the others are David Lightman: young, reckless, very, very smart, with ideas that can cause World War Three if they are not careful.

So, table of contents. We have three stories and a wrap-up to discuss what we've seen: "The butler did it", "Emergency shutdown" and "Cast a spell". Three stories, or three RTFMs that never happened, what they caused, and how they were fixed by myself.

Let's start with "The butler did it". DEF CON 4, not a big deal, caused by the others. Let's see the context. That was back in 2013, on Postgres 9.1, still the old pg_xlog. There wasn't much in the way of monitoring, no fancy things. And we had this system with one very expensive Fusion-io card. At that time they were pioneering PCI Express disks, and they were massively expensive. The system worked quite nicely, but it had this strange table used as a working queue: records were picked from it and used for processing stuff in other places. This table had just 100 million rows, not so big in my opinion; quite big for that time, medium-sized by today's standards. And the table had two timestamp fields, which were updated twice for each row: once at the start and once at the end of the processing. Each row was 160 bytes in average length.

Now, anybody who knows how MVCC works: in Postgres there are no in-place updates, so every update inserts a new row version. This thing was effectively rewriting each row twice every time it was processed. The rows changed pages, and the indices started bloating. There was also a primary key on two integer fields.

The problem wasn't exactly the bloat; the table coped quite well performance-wise. The problem was the SSD. On SSDs we have a limited number of writes, and writing this much, even with the round robin over the blocks, caused a massive exhaustion of the available writes. And that was the I/O just on the pg_xlog; thank goodness it wasn't a rotating disk, otherwise it would have been much, much worse. The data files were hit even harder. The table was rewritten every day: every day it got new blocks, dead rows accumulated inside it, and autovacuum kicked in every six hours, flushing even more blocks to disk and consuming even more of the writes on the Fusion-io. In just eight months, the available writes dropped from 80% to 44%. Just eight months left before Doomsday. And Doomsday, in the SSD world, means a disk in read-only mode: at some point it just stops accepting writes. Daleks coming down to invade the Earth.

So how was this fixed? I slightly redesigned the thing around the primary key. I didn't touch the existing table, because it was so deeply coupled with the rest of the system that I could not change it. But I created a new table aggregating the fields of the primary key: the first field held the common value, and the second field was used as a grouping key, with its values stored inside an integer array.
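To make that concrete, here is a minimal sketch of the two designs; the table and column names are made up for illustration, not the real schema.

```sql
-- The problematic pattern: every UPDATE writes a complete new row
-- version (MVCC), so two updates per item triple the data written
-- for each 160-byte row.
CREATE TABLE job_queue (
    batch_id    integer,
    item_id     integer,
    payload     text,
    started_at  timestamp,
    finished_at timestamp,
    PRIMARY KEY (batch_id, item_id)
);

-- Two full row rewrites per processed item:
UPDATE job_queue SET started_at  = clock_timestamp()
WHERE batch_id = 1 AND item_id = 42;
UPDATE job_queue SET finished_at = clock_timestamp()
WHERE batch_id = 1 AND item_id = 42;

-- The fix: a small side table, one row per batch. The integer array
-- is positional, so element N points at item N in job_queue, which
-- is never updated again.
CREATE TABLE batch_progress (
    batch_id  integer PRIMARY KEY,
    item_ids  integer[],  -- the grouped second half of the primary key
    next_item integer     -- pointer into item_ids, the only updated value
);
```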
By looking up the first field and then addressing the single elements of the array by their position, I could point at the exact row inside the other table without ever rewriting the rows of the other table. And when the fix came in, the WAL generation rate dropped from 1.5 gigabytes per second to 40 megabytes per second. This is the Munin graph of the available writes: here is normal operation, the mad writing, and then the fix came in and everything went flat. No more wear. I estimated that the remaining petabytes of writes would last at least 10 years at that writing rate. That was amazing.

So how do you avoid this sort of situation? By doing the RTFM. And for the RTFM I can recommend these two links: the MVCC slides and video from Bruce Momjian. I love Bruce's presentations. Go to his website, momjian.us, and watch it; you will learn a lot of stuff. Every time I come to a conference and there's a talk from Bruce, it's like, bam, my brain explodes. So watch that video: it explains very, very clearly how MVCC works and how you can avoid the risk of exhausting your writes, but also how to reduce the I/O in general. Reducing the I/O means more efficient caching, a more efficient database, and a more responsive system working for you. That was the first story.

So let's move on to the next one. That one was me. That was the DEF CON 1 situation: emergency shutdown. That was an amazing discovery. By the way, tomorrow a blog post about this story will appear on my blog, with the full explanation, so if you look at the blog tomorrow you will find an even better description of what I found out in this experience.

The context: a virtual machine, in 2016 or 2017 probably, Postgres 9.6, used for business intelligence. A fairly big database, 1.4 terabytes, with a real-time replica from MySQL to Postgres using my tool, pg_chameleon. The tool first reads the row images from MySQL and stores them into Postgres, then uses this information to generate the statements and replay them against Postgres. This is done through a PL/pgSQL function, which replays the data and also manages any errors that can appear during the replay: you can exclude tables, you can do some interesting stuff. Everything worked very, very nicely. Monitoring: not yet implemented. My bad, I was too busy working on the rest. And everything went fine; the people and the analysts were very, very happy. They gave me this t-shirt as a gift for bringing in Postgres in place of MySQL.

And then one day started normally, and then I screamed. People could not run any more queries, getting horrible errors, but the database was up as usual. The nightly maintenance had failed, and I got this message in the log: database is not accepting commands to avoid wraparound data loss in database "analytics". This was the only emergency shutdown I have ever had in my career on Postgres, starting from 2006, and I was the cause of it. So I'm very proud of that.

I don't know if you have ever read this part of the manual on the postgresql.org website, but it's quite scary. What does it mean when this message appears? It means you are just less than one million transactions away from tuples disappearing from the database. Inside the database architecture, the rows are compared against transaction numbers, and these numbers are just 32-bit integers, so they wrap around every four billion transactions: they reach the end of the sequence and restart from the beginning.
Everything with a bigger number is in the future, everything smaller is in the past; so as soon as the counter wraps around, everything suddenly becomes in the future. Postgres has a very clever way to avoid this scenario: the number space is split in half, two billion in the future and two billion in the past. And that is the reason why you have to run vacuum, or let autovacuum, freeze the rows before this problem happens.

So what caused it? First, insufficient autovacuum workers. Databases should have autovacuum on, because autovacuum does a lot of stuff and in particular can save you from exactly this sort of issue. But tuning autovacuum is very important, because too few autovacuum workers on a database with a lot of updates may not pick the correct tables when it's needed, and the autovacuum that prevents XID wraparound can start too late to save your database.

The other cause was myself. There is an apparently undocumented behaviour of PL/pgSQL functions; I haven't found any reference to it. You know, functions in Postgres run in a single transaction: when the function finishes, everything commits; otherwise everything rolls back. So they consume just one XID, right? Well, not really. If you have an exception block inside your function's body, and inside that block there is a DML statement, an insert, update or delete, then each time that DML is executed it consumes an XID. My replay function in pg_chameleon was consuming 100,000 XIDs every time it completed a batch, because it replays the DML in a for loop inside an exception block, consuming one XID per statement.

Let me show this in slightly more detail. We create a table t_test with an integer id: no foreign keys, no primary key, just data. Then we have a function, the fn_loop with no exception: we declare a counter, loop from 1 to 1000, and do one insert per iteration. Then we have the same function, but inside the loop there is a BEGIN ... EXCEPTION WHEN OTHERS THEN NULL block: if there's any error, the iteration is skipped and the loop continues. It makes little sense in this context, but this kind of exception handling can be useful.

So let's check. The age of my database starts from 3; I had run three queries before doing this: SELECT datname, age(datfrozenxid) FROM pg_database WHERE datname = 'test', which returns the age of the oldest non-frozen XID inside the database. Then I execute the fn_loop with no exception, and after that the age has increased by 1: it jumped from 3 to 4, as expected. Now let's do the same with the other function. The age was 5 after the check query; then I run the exception version and the age jumps to 1006. That's 1001 XIDs: one per DML inside the loop, plus the XID for the function itself.
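Here is a reconstruction of that demo, close to what was on the slides; run it in a scratch database if you want to see the effect yourself.

```sql
CREATE TABLE t_test (id integer);

-- One transaction, one XID for the whole call.
CREATE OR REPLACE FUNCTION fn_loop_no_exception()
RETURNS void AS
$$
DECLARE
    v_i integer;
BEGIN
    FOR v_i IN 1..1000 LOOP
        INSERT INTO t_test VALUES (v_i);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Same loop, but each INSERT runs inside an exception block:
-- every protected statement consumes its own XID.
CREATE OR REPLACE FUNCTION fn_loop_exception()
RETURNS void AS
$$
DECLARE
    v_i integer;
BEGIN
    FOR v_i IN 1..1000 LOOP
        BEGIN
            INSERT INTO t_test VALUES (v_i);
        EXCEPTION
            WHEN OTHERS THEN NULL;  -- swallow any error, keep looping
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Check the age before and after each call:
SELECT datname, age(datfrozenxid)
FROM pg_database
WHERE datname = current_database();

SELECT fn_loop_no_exception();  -- age grows by 1
SELECT fn_loop_exception();     -- age grows by ~1001
```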
So how was this fixed? This is emergency handling in the database universe: you drop everything and you silence everything. First thing, silence Slack, because people were screaming at me in private messages and I was getting a lot of noise from it. Then I took over the meeting room, put an "I know, I'm working on it" message on the screen and on the door, and started the cluster in single user mode.

That is an amazing experience I don't want to repeat, ever. If you have ever worked with psql: psql is fantastic, line completion, backslash commands. The single user mode? No. It is something horrible. You get a lot of noise on the screen, and you have to type every single letter of your query. Copy and paste may perhaps work, but there's no way to load external files. I think this interface should be improved; or maybe it should be kept exactly this way, so that people are so scared of it that they will try to avoid it at any cost.

So: start the cluster in single user mode, find the aging tables, vacuum the aging tables, and then do a post-mortem analysis to understand what went wrong and who caused it. Myself.

There is a very useful query for finding the aging tables; I think it comes from the PostgreSQL wiki, which is an amazing source of information, or from check_postgres, the Bucardo Perl script for Nagios. It is very useful because it takes into account the frozen XID of the table, but also of the associated TOAST table: you can have a table that looks fine while its associated TOAST table is aging. So the query checks every relation, tables and materialized views, because do not forget about materialized views: they are basically tables and follow the same rules. A reconstruction is sketched below. And once you have this, something fancier can be done, like using format() to build the VACUUM statements straight from the query and dumping them to a file. A very Oracle-style approach, and I still use it.

So, the RTFM: what did I miss here? Well, I wasn't able to find this case in the documentation. If you look at the part about trapping exceptions in the Postgres online manual, it says to use exception blocks with caution, because they are more expensive compared to functions without exception blocks, but it says nothing about the consumption of transaction XIDs. So it may be useful to add a warning, and I will do it myself: I will submit a patch for the Postgres manual. I need help with building the documentation though; I tried and I failed, so I need to learn. And that's the amazing part of this job: you learn constantly.
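For reference, here is a simplified query in the spirit of the ones from the wiki and check_postgres; it is a sketch, not the exact one from the slides.

```sql
-- Relations (and their TOAST tables) ordered by XID age; the TOAST
-- age matters because the main table can look fine while its TOAST
-- table is the one approaching wraparound.
SELECT c.oid::regclass       AS relation,
       age(c.relfrozenxid)   AS xid_age,
       age(t.relfrozenxid)   AS toast_xid_age
FROM pg_class c
LEFT JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relkind IN ('r', 'm')  -- tables and materialized views
ORDER BY greatest(age(c.relfrozenxid), age(t.relfrozenxid)) DESC
LIMIT 20;

-- The Oracle-style trick: build the VACUUM statements directly,
-- ready to be dumped to a file and executed.
SELECT format('VACUUM FREEZE %s;', c.oid::regclass)
FROM pg_class c
WHERE c.relkind IN ('r', 'm')
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;
```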
"Cast a spell", that is another amazing story. Let me recap: in 2012 I started my second job in the UK, on PostgreSQL, with a very large database. At that time it was a mixed environment between Postgres 9.0 and 9.1; later I upgraded everything to Postgres 9.2, an interesting experience. We had this large database, large for that time: 2 terabytes, again on Fusion-io. On that database I remember joking about playing Tetris with the tables and tablespaces: periodically the active partitions were moved onto the Fusion-io, but there was no space to keep everything on it, so the expiring partitions were put on rotating disks, and every month there was this Tetris movement, playing a lot with the space.

And then there was the horrible design, and by horrible I mean a table with a few fields, one hstore field, and Java serializing its structures straight into that hstore field. That caused a lot of problems until the day I removed it, and we ended up with a 91-field table, quite big. One of my first assignments was performance tuning on a query that retrieved just 150 rows in 6 minutes. OK, the rows were big, but that was absolutely insane: the storage was SSD, lightning fast, so it had to be something else. We had this super expensive Fusion-io storage and super expensive CPU and memory, state of the art in 2013; we are talking about a £35,000 server, bare metal, no virtualization. Everything was installed on that machine, and still the performance was horrible.

So, checking the execution plan. First check that the statistics are OK, then check the execution plan. The execution plan was absolutely fine, no problem at all. There was a subquery with wrong join criteria; I tried to fix it, but with no success in improving the speed of the query. But at some point I did a SELECT * from the same query, and the query completed in seconds, not minutes. Some bells started ringing in my head.

So what was wrong? hstore is not so complex; it's some sort of grandfather of JSON. Whatever you store inside a key, you always get back the text data type, so if you have a strongly typed language on the other side, you have to cast to the right data type. And the developers, instead of doing a classic cast, column::datatype, decided to write a PL/pgSQL function for each type they wanted to cast to. You're getting the point; you're seeing what the problem is. So we had these functions; what I'm showing is pseudo-code, not exactly how it was, but following the same idea: cast to integer, cast to text, cast to float, cast to timestamp. They passed the key name and the hstore metadata field as parameters, used them to retrieve the value, and did the cast straight in the returned row. An amazing idea. And we had this sort of select list, with another 80 meta columns written in this way. Actually it was even worse, because this was the optimized version: they did this at the lower level of the subselect, where the subselect processed hundreds and hundreds of rows before the final filter down to the 160 rows.

Every time you have something like that inside your select list, hstore.so, the shared object, gets accessed. The hstore stored on disk, which is basically text with separators, gets transformed in memory into a datum; the library does its magic, with a lot of context changes between the main memory and the memory used by the shared object; then the value gets passed back to Postgres, which does the final cast into the primitive type; and finally the shared object context is discarded. Then it starts again, and again, and again, for every single column of every single row. Honestly, 6 minutes was incredibly fast, in my opinion.

So how was this fixed? "Oh no, our method is OK. You are wrong. You don't understand all the implications, all the structures we are using." And I just wanted to do a simple cast. Well, there was another function, which I haven't mentioned so far, that did something slightly different: it converted to text and then performed an aggregation, but differently from how array aggregation normally works, not skipping the NULLs: they kept the NULLs inside the aggregated array. They had written some sort of custom aggregation function just to keep the NULLs in the array. I don't know why people want to keep NULLs, which are basically nothing, but they made it that way, and that was the only function left in place, because there was no way to replace it. At that time I didn't know much about custom aggregates. Custom aggregates are something amazing in Postgres: have a look at the blog post from Josh Berkus, and at the one on my blog about how to build a custom aggregate. You can do fantastic things with them.
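To give an idea of the anti-pattern, here is a pseudo-version of one of those cast functions next to the plain cast; the table and names are invented for the example.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE big_table (id integer, meta_store hstore);

-- One PL/pgSQL function per target type, in the spirit of the original:
CREATE OR REPLACE FUNCTION cast_to_integer(p_key text, p_meta hstore)
RETURNS integer AS
$$
BEGIN
    RETURN (p_meta -> p_key)::integer;
END;
$$ LANGUAGE plpgsql;

-- Called once per meta column per row, over every row the subselect touches:
SELECT cast_to_integer('customer_id', meta_store) FROM big_table;

-- The plain cast does the same thing with no function call machinery:
SELECT (meta_store -> 'customer_id')::integer FROM big_table;
```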
All of this started an argument, and when the argument finished, after probably a couple of weeks, I was finally authorized to rewrite the join and get rid of all those calls in the select list. The query dropped to 10 seconds. It would probably be even faster if I could also remove that final piece of code. So: live long and prosper.

And what to do, RTFM-wise? This is a complex answer, because working with Postgres is not simple. It seems simple, but Postgres is an incredibly complex environment, an incredibly complex product. The first thing to do is read the docs: get into the concept of what a PL/pgSQL function actually is. Maybe if they had used SQL functions the damage wouldn't have been so big, because simple SQL functions are built into Postgres and get inlined into the calling query, which would have been much, much better; there's a small sketch of the difference below. So use a procedural language only if you need a procedural language. I have also seen procedural languages used for transforming data in ETL procedures; that is the wrong way to do it, because the performance is completely screwed. The best way to do this sort of thing is INSERT ... SELECT: if you stick to pure SQL as much as possible, you will get the best performance and the least churn in memory caused by context changes.

If you have an idea, before starting to implement it, and maybe wasting your time, ask the community. We have a lot of channels for doing that: of course the mailing lists, which are very, very nice; IRC, on the postgresql channel on Freenode; there is a Slack workspace; and there is the Telegram channel, if you use Telegram. In this room there are two admins of the Telegram chat, me and Janine at the back.

And the last thing: hire a DBA, if you can get one, because it's still quite difficult to find a pure DBA, a Postgres expert at DBA level. And more importantly, listen to the DBA. I remember a t-shirt that said: "I am a DBA. Can we just start assuming I am always right, and then continue the discussion?" Well, it's not quite true: DBAs make a lot of errors and a lot of mistakes, and I'm here as a testimony to that. But it is important: if a DBA comes in and asks for some optimization, it's not because he's arguing about something abstract; it's because there is a real problem, even if the problem is not there yet. DBAs usually look ahead for the problems.
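Here is the sketch mentioned above: the same trivial logic as a PL/pgSQL function, which the planner treats as a black box called row by row, and as a simple SQL function, which is eligible for inlining into the calling query. The names and the rate are invented for the example.

```sql
CREATE TABLE staging_table (id integer, amount_gross numeric);
CREATE TABLE target_table  (id integer, amount_net numeric);

-- Opaque to the planner: executed once per row through the PL/pgSQL engine.
CREATE OR REPLACE FUNCTION net_price_plpgsql(p_gross numeric)
RETURNS numeric AS
$$
BEGIN
    RETURN p_gross / 1.22;
END;
$$ LANGUAGE plpgsql;

-- Simple, IMMUTABLE SQL function: can be inlined, so the planner
-- sees the bare expression and optimizes around it.
CREATE OR REPLACE FUNCTION net_price_sql(p_gross numeric)
RETURNS numeric AS
$$
    SELECT p_gross / 1.22;
$$ LANGUAGE sql IMMUTABLE;

-- And for ETL, a set-based statement instead of a procedural loop:
INSERT INTO target_table (id, amount_net)
SELECT id, net_price_sql(amount_gross)
FROM staging_table;
```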
So I've been incredibly fast; let's go for the wrap-up. Well, I can explain how to cook a carbonara in the extra time, if you want. Seriously, do you remember the last time I presented? How much cream goes in? No cream.

So, the wrap-up. "The only winning move is not to play": not really true here. RTFM: reading the manual is important, but it's also very important to read between the lines of the manual, and with Postgres in particular we have an amazing opportunity. Other databases, the closed source ones, the databases whose name starts with O, do not allow users and DBAs to see what's happening inside the database; they even wrap and obfuscate their procedures. In Postgres we have access to the source code. I like to think of the Postgres source code as poetry written in C; that's what I wrote in my book. It is poetry you can read: you don't have to be a C developer to understand what's going on in Postgres, first of all because the READMEs are amazing; in every single section of the source tree, the README explains a lot of stuff. So if you are in doubt, check the source code; and if you are a C developer, you will have a massive advantage in understanding what's going on inside your database, and maybe you will build something even more efficient. So, basically: doing RTFM on steroids.

I think we are finished. This is the license, and that's all, folks. I finished 16 minutes early, sorry, I've been very, very fast. Are there any questions?

[Question about the third story: when you rewrote the query around the subquery, it became faster. Why?] No, sorry, I didn't explain it clearly. Basically the cast functions, the PL/pgSQL functions, were inside the subquery, so the query was even slower, because the subquery processes more rows than the upper query. If inside your subquery you have a filter, the subquery first filters the rows and then returns them; and while they are being filtered, the rows also get passed through the select list before being discarded. So moving all the expensive computation outside, to the topmost level, compatibly with any eventual aggregations, can improve your query massively. I remember a query with a format or to_char function at the lower level: just moving that to_char to the upper level changed the query completely, dropping it from seconds to milliseconds.

[Can you repeat the question? What was the fix in the first example?] The first example, "the butler", the write exhaustion. I changed the logic around that table. The original table had two timestamp fields: the processor updated the first timestamp to say it was starting, and when it finished it updated the second timestamp to say, OK, I'm done. This was used in a SELECT ... ORDER BY on the timestamp to get the rows not yet processed, and it was causing a lot of writes. What I did was use the primary key, a composite key of two values, to aggregate this information into one common value plus an integer array. Are you familiar with arrays in Postgres? The integer array is positional, so a single element points to the exact value of the primary key, and using it I don't update anything in the big table: I just look at the row and say, OK, this is the row I'm working on, and in the other table I move my pointer inside the array by changing a counter, without touching anything else. And then there is a periodic re-aggregator, which runs every day to collect the new rows into this big queue.

So, one way to improve performance in Postgres when you deal with queues: keep all the updated fields in one separate table, because in Postgres you generate a new row version every time you perform an update. If you have a 160-byte row and you update one boolean, you generate 160 bytes, not just the boolean. This is very important to understand, to avoid this sort of situation, but more importantly to avoid bloat on tables and bloat on indices.
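In code, that answer boils down to something like this, reusing the hypothetical batch_progress side table sketched in the first story:

```sql
-- Find the next item to process without touching the big queue table:
SELECT item_ids[next_item] AS current_item
FROM batch_progress
WHERE batch_id = 1;

-- Advance the pointer: the only row version churn is on this tiny row,
-- not on the 160-byte rows of the queue itself.
UPDATE batch_progress
SET next_item = next_item + 1
WHERE batch_id = 1;
```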
Any other questions? Well, that was interesting because... oh, sorry, OK. The question is why autovacuum didn't work in my second example. That was a problem caused by the number of autovacuum workers: there were too many tables that needed to be vacuumed and too few autovacuum workers, so autovacuum spent a lot of time on the same tables without picking the tables with the aging data.

[Why does the exception block add to the XID consumption of the function?] Well, I can only suppose. I spoke with a friend of mine who is a C developer, and I think he has also contributed to some parts of Postgres, and he said it may be because the block needs to be able to roll back: inside the transaction it needs an extra transaction ID for managing the rollback at the data-area level, inside pg_xact, where the commit status is kept. So you basically get one XID, then all the inner XIDs get committed in a virtual way while the function progresses, and when the function commits, everything gets committed and everything becomes visible. So it can be a structural reason: a choice made to make it possible for the exception handler to cancel and roll back the pieces already executed.

[A question about Postgres versions.] The stories were all related to Postgres 9, so it's a flash from the past, and in Postgres 12 you would probably get the same results, because it didn't change too much: we still have MVCC, we still have autovacuum working in the same way as in version 9, and PL/pgSQL functions also work the same way as they did in 9. [From the audience: I think since 9.6, when autovacuum runs because of XID wraparound it is prioritized, and it only has to touch the rows that need it, so it's much more effective; probably the second story wouldn't happen on a newer version.] Cool, thanks for the information.

[Will a savepoint inside a transaction consume an XID?] I want to try it, but I think it may, because in any case it is managed at the XID level. I will try it, I will try. Thank you.

[I have a question. You mentioned ways of getting help from the community, like the mailing lists and the Telegram chat, but when you figure out that something is missing from the documentation, how do you contribute to make the documentation better?] Well, this is a very good question: if you find a missing part in the documentation, how do you contribute? I would start by listening to Lætitia's talk about demystifying contributing to PostgreSQL, which is a very interesting talk; it should still be online. It's an amazing talk that can drive you to contribute to the Postgres community, including how to write the documentation. And this is probably what I will do: submit a patch adding this information to a warning box in the Postgres documentation, filling in the missing part.

[OK, the question is whether using arrays to handle queues is the best way to manage queues in Postgres.] The answer is yes, definitely, because with the arrays, in particular if you do not update the array ordering, you will have a completely static table, and you can just use a pointer, an integer, to point at the element inside the array. So you limit the updates to that tiny little integer.

Can I do product placement? Well... OK, the question is where to find out how Postgres works internally.
I would start from the manual and from the source code; that's where I learned how Postgres works internally. I have written a book about this, it will come out next week, and there is a lot of information about it in there. But I would start from the README inside, I don't remember, the storage part of the source tree or something like that, or near page.h: there is a README with the whole discussion of how the entire thing works. Start from there: read the source and you will never go wrong.

[Do you know EXPLAIN? When you run EXPLAIN ANALYZE, your query is going to tell you whether it is using...] Oh, you are asking... you will see, OK, it's using this index, or it's doing a sequential scan... oh, you mean... sorry, I misunderstood your question: you want to know the internals. OK, with the source code you can't go wrong. But if you want to know what your query is doing, use EXPLAIN or EXPLAIN ANALYZE to display the execution plan, and you will see what the query is doing; from that point, you continue with the source code.

OK, so thank you very much. Thank you very much.