Hello? Okay, I'll turn this off. Does it work? Yeah, I think it does. Thank you. Is that better? Okay. So this session is for folks with an Oracle background who are looking to migrate from Oracle to Postgres, or just anyone who's interested in learning the differences and similarities between the two database engines. So what we have time to talk about today: we're going to start off with a quick intro to Postgres, and then we'll compare the differences and similarities between Oracle and Postgres. We'll look at the architecture, and we'll look at MVCC, which stands for multi-version concurrency control. We'll also look at the differences between Oracle and Postgres indexes, as well as Postgres extensions, which are used to add additional features on top of your core Postgres functionality. Then we'll review some of the common mistakes we see when folks migrate from Oracle to Postgres, and we'll end with some key takeaways.

All right, so let's start with a quick intro to Postgres. Postgres started as the Ingres project at UC Berkeley, led by Michael Stonebraker. It was written in C and was first released in 1997. Since then it has grown to be one of the most popular open-source database engines. It's supported on all the major platforms, and it offers a wide variety of support, be it community or commercial. There's a major release every year, and the current version is version 16. Postgres has a lot of features that are on par with a commercial database engine like Oracle. It supports HA/DR and ACID compliance, so it can do ACID transactions, partitioning, and a sort of tiered storage. It uses MVCC, where readers don't block writers and writers don't block readers. You can also do online maintenance in Postgres, and you can do log-based or trigger-based replication. So Postgres has a lot of features in store that are on par with commercial database engines.

Some folks migrating, or thinking about migrating, from Oracle to Postgres wonder if Postgres can support large database sizes. Here's a list of maximums that Postgres can support. You can see that Postgres can support a database size in the zettabyte range and a table size up to 32 terabytes, so Postgres can definitely support large databases. But the takeaway here is that you shouldn't have to concern yourself with whether you're going to hit these maximums, because if you are, you're probably hitting other limitations before hitting the database maximum size. For example, if you had a table size near this capacity, you should probably be partitioning. Or if you have a row size that's near the limit, you might be hitting something like the max column size, or maybe you're not using the right data type.

So now let's compare the similarities and differences between Oracle and Postgres. There are some folks who are looking to migrate off of Oracle just due to the high cost, or concern about vendor lock-in, or just the lack of ability to customize Oracle. Postgres has become a popular alternative, and folks with a background in Oracle find it an easy transition migrating to Postgres due to a lot of the similarities. Let's start with the terminology first, because there are some slight differences in the terminology between Oracle and Postgres. To identify a row's physical location, in Oracle we use the term ROWID, where in Postgres it's the CTID, a tuple identifier. Rows are tuples in Postgres, tables are relations, and where Oracle stores data in data blocks, data is stored in pages in Postgres.
And for redo, Postgres uses the WAL, which is write-ahead logging. For undo, it's multi-version concurrency control. And for the SCN, the equivalent terminology in Postgres would be the LSN, the log sequence number. So now let's get into the comparison of the architecture between Oracle and Postgres. Here we have the process and memory architecture side by side, and you can see there are a lot of similarities in the architectural components. Shared buffers are similar to the database buffer cache in Oracle; this is where data pages are read from disk into memory. The background writer is similar to the database writer, in that it flushes dirty pages to disk. Then we have the WAL buffers, which are similar to the redo log buffer, and the WAL writer is similar to the log writer, in that it flushes the WAL to disk. And then Postgres has a checkpointer process as well as an archiver process, similar to Oracle's checkpoint and archiver processes.

All right, so let's walk through the Postgres processes. Postgres uses a multi-process architecture, which is similar to Oracle's dedicated server mode. The three main types of processes in Postgres are: first, the postmaster process, which is the primary process and the first process launched at startup; the postmaster spawns the other processes, supervises them, and restarts processes if they fail. The second one is the backend process, and there's a backend process per connection. These backend processes are the worker processes; their job is to fetch data and communicate with the client. And then we have the utility (auxiliary) backend processes, which are similar to Oracle's background processes: a checkpointer that performs checkpoints, a WAL writer that flushes the WAL to disk, and autovacuum workers that clean up stale data from tables and indexes.

So let's walk through how a connection is established in Postgres. The first step is that the client initiates a connection, and that connection request is sent to the postmaster. Then authentication is performed. The postmaster is similar to Oracle's listener process in that it listens for connections on a specified port and then spawns a backend process. From there, the backend process talks back to the client, and the connection is made. With Oracle, you have the shared pool, and within it you have the dictionary cache and the SQL cache. One thing to point out with Postgres is that all of that caching happens at the connection level: the catalog cache is per connection, and so is anything like a prepared statement. What that means is that with Postgres, connections are a fairly expensive resource, because, similar to Oracle's dedicated server mode, a user process is mapped to a server process on the Postgres side. Even if the user process is idle or busy with business-logic processing and not actually submitting queries, it's still going to hold a server process on the database side, plus the memory that goes with it. So one thing you want to optimize for with Postgres performance is managing your connections, and we typically recommend using a connection pooling solution on top of Postgres to scale your user connections.
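As a quick illustration of why that matters (each client connection maps to its own backend process, whether it's doing work or sitting idle), you can look at the standard pg_stat_activity view; a minimal sketch, the filtering is just an example:

    -- one row per backend; idle client backends still hold a process and its memory
    SELECT pid, usename, state, backend_type
    FROM pg_stat_activity
    WHERE backend_type = 'client backend';

    -- count connections that are idle but still occupying a backend
    SELECT count(*)
    FROM pg_stat_activity
    WHERE backend_type = 'client backend' AND state = 'idle';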
The typical connection poolers, and there are many available out there, include community-developed solutions like PgBouncer, Pgpool-II as another option, and, if you run on AWS, a fully managed connection proxy called Amazon RDS Proxy. Any one of those will help with connection management when you are running Postgres.

So if you were to ask me what the single topic is that I want to talk about when comparing Postgres to other database platforms, I would say it's MVCC. The way Postgres implements MVCC really impacts how it needs to be maintained and how it needs to be optimized. So what is MVCC? MVCC is multi-version concurrency control, and the problem it's trying to solve is that when you have readers and writers concurrently trying to access a row, that can result in lock conflicts, which create blocking, and in the worst case that locking slows down overall system performance. What MVCC does is let the reader and the writer each work on their own version of the data, so there are no longer issues with locking, or at least the chances of locking are minimized, and concurrency improves. The concept of MVCC is not new, and it's common to all relational databases; however, the implementation is very different depending on the database engine. The key implementation differences lie in two areas. One: where are the multiple versions of the data stored, especially the older versions? And two: when we no longer need an old version of the data, how do we clean it up? Those are the two implementation details that different engines treat differently.

With Oracle, the older versions of the data are stored in the rollback segment, the undo. When an update happens, Oracle moves the old version of the data over to the rollback segment, and if we need to read that older version again, it can be reconstructed from the undo. Then the row is updated in place. The cleanup of old versions goes along with undo management, so there's nothing special you need to worry about. Similarly with SQL Server: SQL Server stores row versions in tempdb, and maintenance goes along with tempdb. Postgres is different. In Postgres, when an update comes in, it always writes a new version of the row, and it just marks the old version as a dead row, or a dead tuple in Postgres terms. That tuple does not get deleted or removed from a space perspective, and the storage of that older version happens inside the data table itself. Later on, a garbage collection process has to come in to clean up that data and make the space available again. That's what's special about Postgres, and that's what causes the side effects on all the other maintenance activities that are specific to Postgres that you want to know about.

So when it comes to tuple visibility, that's driven by transaction IDs. In Postgres, each row has two system columns, xmin and xmax, and those determine whether a row is going to be visible to a transaction.
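You can look at these hidden system columns yourself; a minimal sketch, assuming a hypothetical table called employees:

    -- xmin: the transaction that created this row version
    -- xmax: 0 if the row version is live, otherwise the transaction that deleted/updated it
    -- ctid: the physical location of the row version (page number, item number)
    SELECT ctid, xmin, xmax, *
    FROM employees
    LIMIT 5;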
If we look at the DML activities: on an insert, xmin gets set to the transaction ID of the transaction that created the row. On a delete, xmax gets set to the transaction ID, to mark that this transaction deleted the row. And with an update, essentially we set xmax on the old row to the transaction ID, and we also create a new row version and set its xmin to that same transaction ID. So, roughly, a row version is visible to a transaction if the transaction that created it committed before our transaction started, and the row has not been deleted, or is only in the process of being deleted by a transaction that hasn't committed yet.

Now, with that understanding of how Postgres implements MVCC, one of the side effects of MVCC is what's called bloat. Has anybody heard of bloat in Postgres? Well, if you ever go into an interview for a Postgres DBA job, you want to know what bloat is; it's a very popular interview question. Essentially it means the database has grown as a side effect of MVCC. If you have a lot of updates and deletes on the database, eventually your tables may be occupied largely by these dead tuples, and that inflates the size of your database. This definitely needs to be controlled and managed, because if you don't control it, over time the size of the database grows and your performance also tanks. I once had a customer come to me and say: my query was performing well until yesterday, and all of a sudden today it's running two, three times slower, what happened? Well, it turned out that overnight there was a batch process that did a whole bunch of updates, and vacuum, which is the garbage collector, had been turned off. So the size of the dataset exploded, a lot more I/O and processing had to be done to get the query to return, and that's why the response time was higher. If you do see that type of performance problem, that's most likely what's happening: you have table or index bloat.

The process to control or manage that situation is vacuum. Vacuum can be run manually, or you can set it to run automatically. Vacuum will reclaim space so that the space can be reused by subsequent inserts. However, one thing to know is that while vacuum reclaims the space for subsequent DML, it does not actually release the space back to the OS; we will talk later about how you can release storage back to the OS, so hold that thought for a little while. So how does vacuum work? Without going into the details, at a very high level vacuum has a few main phases. First, it has to scan the heap and remember all the dead tuples; essentially it collects the CTIDs, the tuple IDs, in memory so that it can work on those tuples later on. One of the optimizations Postgres uses to reduce the number of pages it has to scan is the visibility map: the visibility map tells it which heap pages need to be cleaned, so it can skip the rest. The next thing vacuum has to do is vacuum the indexes and the heap, and then finally it needs to perform the cleanup and free up the space that those tuples used.
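If you want to watch this for yourself, a minimal sketch (orders is just a hypothetical table name): check the dead-tuple counters, run a manual vacuum, then check again.

    -- dead tuples accumulated by updates/deletes, from the statistics views
    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'orders';

    -- manual vacuum; VERBOSE prints what was scanned and removed
    VACUUM (VERBOSE, ANALYZE) orders;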
Also, steps one to three may need to be repeated if you don't have enough memory allocated for vacuum to go through the whole thing in a single pass, and if your table is too large it also has to repeat these steps. Now, one of the key roles of vacuum is to clean up those dead tuples, to perform garbage collection, but vacuum also has other very important roles in Postgres. One of the very important things vacuum does is update statistics, and we all know that query performance depends on statistics, so that's definitely a critical role. We also talked about the visibility map and how it helps speed up the vacuum process; vacuum is what updates the visibility map, and it updates the free space map as well. And then another very important role, which is another very Postgres-specific concept, is protecting Postgres from transaction ID wraparound. Postgres's MVCC implementation is based on transaction IDs: each transaction is assigned an ID, and at any point there can be at most about 2 billion IDs in the past and 2 billion IDs in the future. If you cross that 2 billion boundary, the future and the past start to get mixed up, and what that means is data corruption. You don't want to get into that situation, and the way to avoid it is vacuum.

Vacuum being so important, you definitely want some help managing it, so Postgres lets you enable autovacuum, which wakes up and runs periodically, cleans up the dead tuples, and watches for transaction ID wraparound to make sure it does not happen. There are many tuning options to make vacuum run efficiently. We talked earlier about memory, how much memory you assign to vacuum; ideally you want it to finish in a single pass so it doesn't have to repeat the process. You can also tune the number of vacuum workers; in general a single worker works on a single table at any one time. Starting with Postgres 13 there's parallel index vacuuming, so if you're on the right version and you have a large table with multiple indexes, you can take advantage of that. And then, how is vacuum triggered? The way it's triggered is also configurable: it can be triggered when, say, X number of rows have changed on the table, or X percent of rows have changed. There are global settings and there are table-level settings. Obviously, if you have a table that is 10 terabytes versus a table that is 50 megabytes, the settings will be, or you'll want them to be, very different; 10 percent of a 10-terabyte table is still huge. So for large tables you want to look at the options at the bottom of the slide and tune at the table level. You can also control how aggressively vacuum runs by setting the cost limit, because if you look at the vacuum phases, it needs to scan the table pages and it needs to perform operations on the indexes, and that is going to be I/O intensive and overall resource intensive. So if you want to reduce the impact of vacuum, you want to look at these tuning settings and adjust based on your specific scenario. The one worst thing you can do is turn autovacuum off, because I do have customers who say: we saw vacuum was the main process taking system resources and causing our queries to go slow, so we turned it off.
Well, unfortunately, at that point it may seem that turning off vacuum makes your queries run faster, because there are more resources available for them. However, over time you will see your query performance start to degrade, and the worst case you can hit is transaction ID wraparound; if you ever get into that situation, the entire database becomes unavailable, so you don't want to go there. So, some best practices for controlling table and index bloat. Essentially it's a multi-step process. First of all, you want to have processes running to monitor table and index bloat on an ongoing basis, so you know your bloat situation. If it ever goes beyond the percentage you expect, or would like it to be, then you start looking into tuning vacuum, making sure vacuum is running frequently and effectively enough to keep the bloat percentage on your tables at a preferred level. And if you do get into a situation where you have a heavily bloated table that is very fragmented, you want to start looking at rebuilding it. For indexes there is REINDEX, which has an online (concurrent) option, or you can rebuild the whole table, and there is the pg_repack extension that allows you to do the rebuild online as well. If you can afford to take the table offline, you can run VACUUM FULL to rebuild and compact the table. With that rebuild operation you also release the space back to storage, so that's how you can actually shrink the size of the database.

All right, so let's talk about indexes. When you migrate from Oracle to Postgres, there are going to be some indexes that may not work exactly the same as they did before the migration. Here's a list of some common Oracle index types and the equivalents you would use in Postgres. B-tree works exactly the same way in Postgres. If you have composite indexes in Oracle, multi-column indexes work very similarly. If you have function-based indexes, Postgres has expression indexes, and if you have invisible indexes, there's a Postgres extension called HypoPG that you can use to get the same functionality. IOTs are not supported in Postgres, so you can use CLUSTER, which is similar. And for bitmap indexes, there's a Postgres index type you can consider using called a BRIN index. And here are a couple of popular Postgres index types that aren't available in Oracle; Postgres has a lot more index types to offer. Say you have a GIN index: these are optimal for full-text searches. Or BRIN indexes, which are good for time-series data. And here's an example with a GIN index: we can simply create a GIN index and then run a LIKE query, a wildcard search, just by creating that index; we didn't have to implement any complex full-text search or anything. So the takeaway is, when you migrate over, your indexes might not work the same; there are compatible indexes in Postgres, and where an index type doesn't exist in Postgres there are indexes that are very similar. Also explore the wide variety of index types that Postgres does offer.

Now we'll move into Postgres extensions. Postgres was designed to be extensible, and Postgres extensions are used to add additional functionality on top of your core Postgres features.
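The wildcard-search example just mentioned is also a nice illustration of an extension at work: trigram-based LIKE searches typically rely on the pg_trgm extension together with a GIN index. A minimal sketch, assuming a hypothetical products table:

    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- GIN index over trigrams of the name column
    CREATE INDEX idx_products_name_trgm
        ON products USING gin (name gin_trgm_ops);

    -- this wildcard search can now use the index instead of a sequential scan
    SELECT * FROM products WHERE name LIKE '%widget%';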
Currently there are over a thousand extensions you can use, and more are continually added through the large community out there. Once you load an extension into your database, it works just as if it were built in. Here are some popular extensions used with Postgres: for example, if you want to add vector search to your database, there's the pgvector extension, or there's that invisible-index capability I just spoke about, HypoPG.

So now let's cover some common mistakes we see when Oracle is migrated to Postgres. The first common mistake is the misconception that Oracle applications can't migrate to Postgres because Postgres doesn't support synonyms. Postgres uses a schema search path instead; it's like a variable you add your schemas to, and Postgres will search for objects within those schemas in that order. Now, between Oracle and Postgres there are a lot of similarities, right? However, something that looks similar may not necessarily perform or behave the same. One thing to be aware of is how Oracle and Postgres handle nulls. In Oracle, a null is essentially equal to an empty string; in Postgres, a null is evaluated as a non-value. What that means is that if you do string concatenation, or a null comparison, or you have unique constraints defined on columns that contain nulls, after you migrate to Postgres you may get some unexpected results. Let's take a look at the string concatenation scenario with this specific query: select first name concatenated with middle name concatenated with last name. What happens in Oracle? With Oracle, ten rows come back and everything is just as you would expect. With Postgres, we only get two rows back with a proper result. What happened? Some of the columns contain null values, and in Postgres, concatenating a null with a string produces null.

So how do you deal with this kind of situation on the Postgres side? There are two options to work around it. One is to use coalesce: before you do the concatenation, you first coalesce the column. Or you can use the Postgres native function, in this case concat_ws, which will do the string concatenation for you. So there are different ways of doing it in Postgres, but at the end of the day you just need to be aware of these kinds of differences. Another, more challenging part is unique constraints. Let's say you have a unique constraint defined on middle name. If you do multiple inserts with a middle name that is null, Oracle will return a unique constraint violation, because for Oracle those nulls are the same as empty strings. In Postgres, because you cannot compare nulls, null is not equal to null, you will not get a unique constraint violation, which in this case is going to be unexpected. Starting with Postgres 15, there's support for a NULLS NOT DISTINCT clause: when you create your unique index you can specify that clause, and it gives you the same kind of result that Oracle would.

So another mistake when migrating from Oracle to Postgres is the conversion of NUMBER to numeric. Most migration tools will translate it automatically, and if you look at NUMBER and numeric they look very similar: you have precision and scale.
But if you look at the number of digits in the details below, Oracle only supports up to 38 digits, where Postgres numeric supports a way bigger number of digits, 131,072 before the decimal point. So it's important, when a migration tool converts this automatically, to go in and look at the actual precision of your data. If you have fewer than nine digits before the decimal point, try to use int instead; if you have nine to 18 digits, bigint. The numeric data type costs you in storage and performance compared to the integer types that an Oracle NUMBER could have been mapped to, so it's very important to choose the right data type at the start. And another piece of advice: never use numeric for your primary keys and foreign keys; use bigint instead.

Both Oracle and Postgres have data types for character data. Postgres has, for example, text and varchar, and people look at text and think: oh, that's probably a large character data type, like the one Oracle has for holding large character data, a LOB type, and SQL Server has text, which again is a LOB-style character type. That's why there's so much confusion with Postgres text; people just assume it's another LOB data type, but it is not. LOB data types require special operations to do things like get the length or insert into them; the text data type in Postgres is essentially identical to varchar, and it behaves and operates the same as varchar. In fact, you can look at text as just varchar without a specified length, and in Postgres varchar is implemented on top of text. From a performance perspective, text actually has equal or better performance than varchar in most cases, so that's something to be aware of.

Now we'll talk about exceptions, which is another common mistake. The migration tools we mentioned can convert PL/SQL code into PL/pgSQL, and here you can see the code looks almost identical: you have an exception block on the left and on the right. But the main point here is that Oracle and Postgres handle exceptions differently. To handle exceptions, Postgres uses subtransactions, which are expensive and heavyweight in Postgres: it takes a savepoint so it can roll back to that savepoint, and having a lot of subtransactions on a heavily loaded, highly concurrent database becomes a big performance problem. So when you're using a migration tool and there are a lot of exceptions used in the Oracle code, once it's migrated over, take a look and see if the exceptions are actually needed, because sometimes they're not. For example, if we take the same code from earlier and remove the exception that raised an error when the value was null, and run the same function on Postgres, you'll see that you still get a result: the first call returns Michael, and the second one returns a null value. We didn't need that exception, but we would have paid the extra expense of the subtransaction. Also be aware that Oracle behaves differently in terms of which exceptions get raised; for example, Oracle raises NO_DATA_FOUND or TOO_MANY_ROWS. Here we have a function where, instead of returning a null when nothing is found, we want to return 'not found'. But if we run this in Postgres, that exception wouldn't be raised; it just returns a null.
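That silent behavior comes from how SELECT ... INTO works in PL/pgSQL: without STRICT, zero matching rows simply leaves the variable null. A minimal sketch of that (the employees table and the function name are hypothetical):

    CREATE OR REPLACE FUNCTION get_first_name(p_id int) RETURNS text AS $$
    DECLARE
        v_name text;
    BEGIN
        -- no STRICT: zero rows just leaves v_name as NULL, no NO_DATA_FOUND error
        SELECT first_name INTO v_name FROM employees WHERE id = p_id;
        RETURN v_name;
    END;
    $$ LANGUAGE plpgsql;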
So if we want the same response as in Oracle, we have to use the keyword STRICT, and here we see that we do get the 'not found' result back. So make sure, if you're using migration tools, that they add that keyword where it's needed, and keep in mind that exceptions are handled differently; with Postgres they're very expensive because of the subtransactions. So now we'll end with some key takeaways. As mentioned, with Postgres you can improve your scalability by using connection pooling. Vacuuming is very important to performance; it's a topic that should be looked at. There are also some Postgres native features that just aren't available in Oracle, so take a look at the functions and the index types and make use of them, as well as the extensions out there that can expand the functionality of your core Postgres features. And the common migration mistakes we mentioned: synonyms, where Postgres uses the schema search path instead, with similar behavior; nulls, which behave differently between Postgres and Oracle; data types, where a lot of them get converted over by a migration tool, so be sure you end up with the right one in terms of storage and performance; and exceptions, which are handled differently in Postgres, using subtransactions, which are very expensive, so remove the exceptions if you are able to.

All right, so now we'll have some Q&A if anyone has any questions. Yes? Thanks for the presentation. You mentioned in one of the slides that VACUUM FULL is not recommended, and to instead use pg_repack or rebuild the index online; does that apply only to large databases? Yes, so the question is: when we talk about rebuilding an index, or using pg_repack or VACUUM FULL to rebuild the table, does it apply only to large databases, or also to small databases? The answer is: regardless of whether the database is large or small, it all depends on how bloated the database is. Because even with smaller databases, if your data is compactly packed there is less I/O you need to perform to bring the useful data into memory, the efficiency of each memory page increases, you can cache more in memory, and performance will be better. So of course it's not that you have to rebuild all the time; the key thing is to control the percentage of bloat so that you have, essentially, a balanced system. I'll try to meet you halfway and make my questions go quickly. Number one, real quick: how widespread is the migration from Oracle to Postgres? And two, where's a good tutorial on managing the roles and permissions and schemas? Okay, so the first question is how widespread the migration from Oracle to Postgres is. Being a Postgres specialist at AWS, we see a lot of Oracle on-prem to Postgres migrations. The driving factors are pretty obvious: especially in this economy, organizations are trying to cut costs, and Oracle licensing is a pretty hefty cost, so that's why open source is becoming popular; and because of the flexibility and extensibility of Postgres, and the large community, a lot of people prefer Postgres as the de facto open-source database. The other quick question is whether AWS or the open-source community has good tutorials for managing roles and permissions. For roles and permissions management?
Yes, yes, there are a number of AWS blog posts; if you search the AWS blogs specifically for the security aspects, they cover, for example, IAM authentication, which is specific to AWS and essentially lets you use AWS authentication to authenticate against your Postgres database, and AD-based authentication, which integrates AWS Managed AD with your on-prem AD. There's a lot of information on that if you go and search. Thank you for your time. I've used tools in AWS such as DMS to migrate from one Postgres database to another Postgres database, but from an on-premises Oracle database, can we use tools like GoldenGate to replicate data from Oracle to Postgres inside AWS, or is that not possible? It's possible. DMS, the database migration service, can work with Oracle as the source and Postgres as the target as well; however, if you already have a license for GoldenGate and you prefer GoldenGate over DMS, that's an option too. DMS will let you perform a minimum-downtime migration from on-prem Oracle, and it's also a log-based replication solution. With GoldenGate I'm able to do bi-directional replication, so I have the option to fail back if something goes wrong with the migration; can DMS do bi-directional replication yet? DMS also has options for bi-directional replication: it's possible to replicate, or migrate, from A, which is Oracle, to B, which is Postgres on AWS, and also from B back to A, and there are options to make sure it does not create circular replication. Although we might recommend that instead of doing A to B and B to A, you'd be better off doing A to B and then B to C, with C as your fail-back option; but those are details that, if you are interested, we can take offline and discuss further. Are there any extensions you would recommend related to performance tuning? Extensions, are there any we would recommend for performance tuning: there are many extensions that are performance related. One very popular one is pg_stat_statements, which lets you look at the details of what has been executed. There are extensions such as HypoPG, which essentially lets you do analysis for creating indexes without actually creating the index and impacting the existing queries. There are many of them, and it just wouldn't be possible for me to name them all here. Thank you. Thank you. Thanks for coming.

Testing? I cannot hear myself. Does it work? Hello folks, and welcome to the session on vacuum; give it up for Devrim. Thank you. So in last year's PGConf New York and last year's PGConf EU I gave a lightning talk; it was for 5 minutes and I hit 60 slides, six-zero. So based on that calculation you can assume that I have about 700 slides or so, but I'll make it, because I learned that we have an hour. So bear with me; I come from a family where everyone is a teacher, my dad, my aunt, my sister, everyone except me, so I'm just trying to teach everything in here. My name is Devrim Gündüz, I'm a Postgres expert; this is my title, at least.
So I'm a Postgres major contributor; if you have ever used the RPMs of Postgres on Red Hat, SUSE or Fedora, this is what we support nowadays, and I'm responsible for the official RPM repos. I used to be a website person in the past, but I broke it every time I committed something; I still have some commit access, but I try not to break it anymore. I also contribute to Fedora and Rocky, I'm a community member, I live in London, and nowadays I'm also DJing, which I think means if I get fired someday I have a backup job. Anyway, so I'll start with MVCC today, and I'm going to continue with data snapshots, how vacuum processes and freezes tuples, vacuum tuning and VACUUM FULL, and hopefully we will have some time for questions.

So let's start with the star. What is this? I mean, I'm sure you all know SQL, or SEQUEL, whatever. What is star? Tell me. Wrong answer: not everything. Sorry, it actually stands for the user-defined columns. So when you say select star, it doesn't select everything; it gives you the user-defined columns in a table. Which means there are some columns in every Postgres table that are hidden, and we will talk about these columns before we go into vacuum processing. So, how many people were here in the previous talk? Sorry about this, because there will be some duplicate content, but I have to cover this before going further. All databases have different solutions for concurrency control; in other engines it's called something else, I don't remember the name, and other databases have different implementations. This, MVCC, is the implementation of concurrency control in Postgres. This is the definition in the glossary, let's say; the basic thing is that readers do not block writers, and writers do not block readers. And you should also notice that we don't say anything about writers and writers; for good reason, writers block writers to the same row or rows. I will give more examples in the next few slides. By the way, these slides are going to be online, so if you want to take a note, of course feel free to take a note, but they're on my website, which I last updated 70 years ago, and they will probably be on the conference website as well, or you can just email me if you want.

So we can have multiple versions of the same row in a database, because when we run an update in Postgres, basically it's an insert and a delete: we insert a new version of the row, that's where the version comes from, and then the previous version is deleted. Actually it's not physically deleted; we will get there. And there can be some uncommitted transactions as well, which also cause multiple versions of the same row. And one of the most important things, which most people have probably heard about if they're dealing with Postgres: dead tuples, which we will talk about in the next slides on MVCC. On the internet there are discussions about which implementation is the best; every implementation has its own problems, Oracle's implementation has its own problems, Postgres's implementation has its own problems, or let's say side effects, and every year, every release, especially in Postgres versions 14 and 15, which I will get to, Postgres made some great improvements there. So let's start with some keywords that will make you sexy when you are speaking with other Postgres people: hey, have you heard about Ellison yesterday, let's get some dinner tonight and talk about xids, txids, stuff like this. So in
the Postgres world, when you say txid, that's the transaction ID. It's a unique identifier, currently 32 bits. There are some discussions, I think they started before Postgres 15, so it's been a few years, about adding 64-bit transaction IDs; I'll give an example about that in one of the later slides. It's a unique identifier for each transaction, but first of all, it's a circle: at any given time, when you get a transaction ID, there are 2 billion in the past and 2 billion in the future. So basically it's a circle. There are some special reserved transaction IDs: 0 is reserved for an invalid transaction, 1 is used during bootstrap, during initdb, and the number 2 is reserved for frozen, which I marked in red because we are also going to mention it in the next few slides. So basically, let's sum up this slide: we call txid the transaction ID, a transaction ID is acquired for each transaction in Postgres, and it's a unique identifier for a given period of time. But the current problem is that it's limited to 4 billion, which has some negative side effects, especially on busy databases; we will get there.

So let's talk more about the transaction ID. Let's start with selects, which are a basic thing: a select actually uses a special transaction ID called a virtual transaction ID, a virtual txid. You can get the current transaction ID with a function within Postgres; it's assigned per transaction, and it's stored in the header of each row. So remember the first question I asked you about star: again, let's define star as giving us the user-defined columns of the table. However, in Postgres, like in many other databases, there are some hidden columns which help store the data and do the maintenance, basically do the transactions. So let's start with xmin and xmax. xmin is updated when we are running an insert, and xmax is updated when we are doing an update or a delete; it is zero when there has been no update or delete. In the next few slides I'll give you some visual examples so that you can see what's going on; this is just the theoretical part for now.

So let's continue with insertion first. An insertion is done to the first available space. Okay, "the first available space" is a critical term, because when we learn more about vacuum it's going to be more interesting. Let's say we have an empty table; in order to even delete or update something, you first have to insert the data, so let's start with basic insertion. The insertion is done to the first available space in a page; the page, by default, and probably in 99% of the installations in the Postgres world, is 8 kilobytes. Within each 8-kilobyte page, the row goes into the first available space, the xmin is set to the transaction ID, because, as I told you, xmin is an insert thing, and xmax is 0 for inserts. So let's take a look at an example. We first create a table with a single column, one of the best tables I have: one column, one scotch, one beer, all good, right? So in this example, first of all, if you are not familiar with Postgres: in Postgres, by default, every statement is a transaction. So if you don't start a transaction with BEGIN, like in this specific case, this statement is a transaction and this statement is a transaction, so we have basically four transactions in here. Here we insert two rows into the table within one transaction, and then another row here, and another one here.
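Reconstructed roughly from what's on the slide (the exact statements are my guess, only the shape of the example is from the talk), it looks something like this:

    CREATE TABLE t1 (c1 int);       -- transaction 1

    BEGIN;                          -- transaction 2: two rows, same xmin
    INSERT INTO t1 VALUES (1);
    INSERT INTO t1 VALUES (2);
    COMMIT;

    INSERT INTO t1 VALUES (3);      -- transaction 3
    INSERT INTO t1 VALUES (4);      -- transaction 4

    SELECT cmin, cmax, xmin, xmax, ctid, c1 FROM t1;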
So first of all, if we just issue a select star we would get 1, 2, 3 and 4, but in this specific example let's also select cmin, cmax, xmin and xmax; we will get to cmin and cmax later. Looking at xmin, as you can see, when c1 is 1 and 2 the xmin values are the same. Why? Because they were inserted within the same transaction; this is the transaction ID that actually inserted these two lines. And then, as you can see, the next transaction ID and the next transaction ID; I mean, this is not a real-world case, in a busy database of course the numbers would not be in a sequence, but again this is for testing, so don't worry about that one. So, 2 and 3. And then we have xmax 0, because we haven't done any updates yet; remember from the previous slides, that changes when we do some updates. And ctid is the physical location of the row in a specific page: it means this is page 0, the first page of the table, remember it's 8 kilobytes by default in most installations, so this is the first page and this is the first row in it. So it goes like this. Any questions so far? Okay, please. "When you say that the ctid gives the page and which row it is, how does it know how long each row is, to know where that row is in the page?" I'm sorry, can you repeat the question? "The ctid has the page number and which row it is in the page; how does it know how long the row is, to know where it begins in the page? For ctid (0,2), is that a byte offset of 2?" No, no, it's the second row. "My question is: a page is 8K, but how big is a row? It's variable, so how does it know?" Actually, hold on, I get the question now; in the next few slides there's an example which is going to make things clearer for you. Okay, thank you.

So let's go back to delete. As I told you a few minutes ago, it's a logical deletion: when we say delete, the row is not physically deleted. My partners were always angry with me when I said I'm going to do it in the next few months; they asked me to clean my room, and I said, well, I'm going to do it later on, because I'm a Postgres person, we want to delay things. So I said, logically I cleaned my room, logically, but I do the vacuuming later on. Okay, so that was the problem: it's a logical deletion. There are reasons for this: even if you delete a row, another transaction may still need to see the previous version of the data, I mean the undeleted version of the row. Let's say you are running a pg_dump (don't, but anyway) against a table, and let's say it's going to take 4 hours. Even if you delete a row which hasn't been backed up yet, it cannot be physically deleted, because another transaction may actually still need that row; that's the key here. So instead we set xmax to the transaction ID and, hop, we have a dead tuple. Let's see: this time I'm starting a transaction intentionally, and I delete from t1 where c1 equals 1. We don't see that row any more when we run a regular select star, but we know it's still there. This is the first session; again, I intentionally started a transaction here and left it open, and while that transaction is running, within another session I run the same select, and now we can see something else: the other transaction still sees the row without any problems, but Postgres has internally set xmax on it, because any transaction ID greater than this one is going to see this row as deleted once the first transaction is committed.
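A reconstruction of that two-session experiment (my sketch of what the slide shows; the exact statements are assumptions):

    -- session 1: delete inside an open transaction
    BEGIN;
    DELETE FROM t1 WHERE c1 = 1;
    SELECT ctid, xmin, xmax, c1 FROM t1;   -- session 1 no longer sees c1 = 1

    -- session 2, while session 1 is still open:
    SELECT ctid, xmin, xmax, c1 FROM t1;   -- still sees c1 = 1, but its xmax is
                                           -- now set to session 1's transaction ID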
When we commit this transaction, any transaction ID greater than this one is, for a while, going to see the data as deleted, but it's physically still there. Now to updates: an update is an expensive operation, it's an insert and a delete again, and a dead tuple is part of the deletion. Let's see, and this is going to give us another hint about the ctid. We updated the row where c1 equals 2; just to remind you, its ctid was (0,2). Now we did an update, right, and again an update is an insert plus a delete, so we inserted a new row version, with a new transaction ID, into the next available space. It doesn't have to be 5; I mean, in the real world you are going to see some gaps because of the sizes of each row, it doesn't have to be 1, 2, 3, 4, 6, 7, it won't be in a neat sequence eventually in the real world. And now row one is gone, because we deleted the first row, and the second one has been updated. Again we start a transaction, then let's go to another session: in the other session we see the row where xmax is deleted to, I'm sorry, set to this transaction ID, 35. Let's come back to the previous slide: on the insert, the xmin of the new version is set to 35, which is the inserting transaction ID, and that same transaction ID is the deleting transaction ID, the xmax, on the old version of the same row. Okay, for the same row. So there are two transactions seeing two different versions of the same row: one sees the updated version, the other sees the previous version. Anyway.

Okay, so now we need to consider the huge side effects of excessive deletes and updates, because eventually, as I said, an update includes a delete. In a pretty historical way: how many people in this room started using Postgres before 8.3? So this is pretty much historical, just the three or four of us; I think you also started before 8.3, really? Really, I thought you were using it before then. Anyway, per this C file there is a comment in here: cmin and cmax are now overlaid in the same field in the header. I got this information from doxygen.postgresql.org, which I find quite useful; it's a hidden website that I seriously love. Anyway, let's talk about data snapshots. It's not a physical snapshot, first of all; when we say snapshot we are really talking about isolation. The snapshot is created at the beginning of the transaction, it contains the committed data, and uncommitted data is ignored within a snapshot. And, now that we're starting on the vacuum topic of this talk, 20 minutes in (it's my first time talking about vacuum, and I told you I have lots of slides), the snapshot also determines which rows are vacuumable and which are not. So the data snapshot is important when there are long-running transactions, like a pg_dump, which cause more problems: when we have pg_dump running against our tables or database or schemas, whatever, if you have some dead tuples caused by deletes and updates, they won't be removed, because pg_dump, in this specific case, or other long-running transactions, are going to need them. So long-running transactions often cause problems for the database. We have two parameters for this; they are disabled by default, and idle_in_transaction_session_timeout might be the important one, because, you know, when I was working as a DBA a billion years ago, I had a co-DBA who started transactions before he went to lunch: BEGIN, update a bunch of rows, and he committed the data after coming back from lunch. So to overcome that problem, what did we do? We disabled his access. That's my way of solving problems. I'm just joking.
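(The actual, non-joke guard-rail is the parameter itself; a minimal sketch, where the value is just an example of mine:)

    -- cancel sessions that sit idle inside an open transaction for more than 15 minutes
    ALTER SYSTEM SET idle_in_transaction_session_timeout = '15min';
    SELECT pg_reload_conf();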
So anyway, seriously speaking, that was a joke. So let's talk about visibility. Visible tuples: what we're talking about here is that this row version is visible to this transaction and that version is visible to another transaction; that's where visibility comes from, and it is determined by comparing xmin and xmax against the transaction IDs that the current transaction can see. Only one version is visible in a snapshot, so you cannot have both the 2 and the 20 for the same row in one single snapshot. The definition of visibility is that the row version was already committed before the transaction start time: if the row version was committed before we started our transaction, then it is visible to us, whether it came from an insert or an update. I should say, we don't store commit times in Postgres, at least for now, and there are no rollback segments in Postgres either, because that's the Oracle implementation, as far as I know.

So let's talk about vacuum, which is something I do every week in my home, but it's not the same thing, right? It's the most important maintenance process for homes; sorry, it's the most important maintenance process for Postgres. When we talk about vacuum to newer DBAs, they sometimes say, no, we never run vacuum, which is not correct actually: even if you don't, Postgres does something for you, which we will get to. It cleans up dead tuples that are no longer needed: no long-running transaction requires this version of the row, nothing is blocking this version of the row, so they can be deleted, and we need to delete them; now we are talking about physical deletion. Remember, when we run the SQL version of delete, as I told you, that's a logical deletion; the physical deletion is deferred because we are lazy and we want to do it in batches. Instead of cleaning one single glass when I'm cleaning my home, I clean ten of them just to save some time; this is the same thing. Vacuum can run against a single table, a few tables, a database, a few databases, or all databases, so based on the parameters you give it, you can vacuum any of these. It has two main tasks (and, as I'll mention, three steps): it removes the dead tuples, and it also freezes transaction IDs. So what does freezing a transaction ID mean? Remember, we have a problem, and whatever we do it's going to be the same problem at a larger scale, or a smaller scale maybe I should say: even if we had 64-bit transaction IDs, again, as I told you at the very beginning of the presentation, it's a circle; eventually it's going to wrap around, eventually we will get to the point where we would issue the same transaction ID to the database again. This won't happen now, but in the past, the very past, probably before some people in the room were even born, when we didn't have autovacuum, we used to reach a point where Postgres would shut itself down so that we could start it in single-user mode and run vacuum manually, to be able to continue running the database. That was again maybe 20 or 25 years ago, maybe more. Vacuum does not block most queries, though not all of them, and I will get there. First of all, concurrent vacuums on the same table are not allowed, so you cannot run two vacuums on one table. You cannot create an index while you run a vacuum, whether it's a concurrent or a regular create index; you cannot create a trigger while you are running a vacuum; and you cannot refresh
a materialized view; you cannot remove columns from a table; and of course you cannot drop the table. Okay, so it doesn't block most queries: it's not going to block your day-to-day queries, no selects are blocked, you can select, insert, update, the list can go on; only the unusual or irregular stuff is affected. And it creates I/O, which is part of the next few slides, and that's the problematic thing. Again, it cleans up the dead tuples and it also cleans up the index pages which point to the dead tuples: we don't just clean up the table, there are indexes on the table and we also clean them up. Okay, and it updates the visibility map and the free space map as well. And if we pass ANALYZE to vacuum (this is not the ANALYZE of EXPLAIN ANALYZE, if you are familiar with EXPLAIN ANALYZE in Postgres; it is something else), or if we run a regular ANALYZE anyway, it updates the statistics of the table.

So let's start talking about the vacuum process. It's done per table, per page. First of all, it scans pages for dead tuples, removes the index entries pointing to the dead tuples, again updates those two maps, and finally (we will get there) it truncates the last pages of the table: if the last page or pages are empty after vacuum, Postgres is going to truncate those pages to save that space, okay, and there is an extra parameter about that which I will talk about later on. It's going to update statistics, again we will get there, and it's going to update the catalog tables about the last vacuum of the relation, or the database, whatever. So again, as I told you, there are three phases. The first phase is the scan phase: it scans the table first of all and creates a list of the tuples, so it knows which tuples will be removed. It just creates a list at this point, and it doesn't mean it's going to clean every dead tuple, because of the visibility rules: if another transaction needs the previous version of the row, then it's not going to be marked, it's not going to be vacuumed. The second thing is to read these into memory, I mean the list of tuples into memory; it's going to freeze the tuples (again, we will get there) and it's going to clean up the index tuples at this point. So as you can see, the heap tuple cleanup is not done in the first phase; okay, the tuple cleanup is not done in the first phase. So it freezes the tuples, cleans up the index tuples, I mean the entries which point to the data, then removes the tuples, and then it's done.

So how do we control vacuum behavior? One of the most important parameters in this case is maintenance_work_mem. It can be set globally in postgresql.conf, or you can configure it per session; I mean, if you want a larger maintenance_work_mem for one operation, you can use that. Vacuum can utilize up to a gigabyte of maintenance_work_mem, which matches the data file size. Let me explain what that means: in Postgres, a data file can be a gigabyte maximum. A table can span multiple files, but each file can be one gigabyte. As far as I remember, this limitation comes from the days when we wanted to be able to transfer files between Linux and Windows, and when we transferred the files the limitation was two gigabytes or so, so we stayed below the two-gigabyte limit; that's where the limit comes from, nothing else, I mean that was the motivation when this was set. Anyway, in the second phase the good thing happens: first, the dead tuples are removed, and now we are talking about physical deletion of the dead tuples, so it means now there are going to be gaps in the
data files. Let's say we have rows one, two, three, four and five, and let's say we removed three and four; eventually we will have one, two and five, but the space occupied by those two rows is now empty. Postgres is going to know about this with the help of the free space map, and the next insertion will be done into that space. So basically, if you don't run vacuum often enough on a busy, heavily updated or deleted table, your relation may grow extensively larger, and it's not going to shrink; it's often not going to shrink even if you run regular vacuum, but it won't keep growing either, because the next insertions will be done into the available space. So it's going to reuse that space, but the size on disk will stay larger. It also updates the free space map, the map of where the free space is, and the visibility map, per page, and it repairs fragmentation within each page, and that's done. Finally it does the index cleanup, updates statistics and the system catalogs per table: if you look at pg_stat_user_tables, you will see data about when the table was last vacuumed, analyzed, autovacuumed and autoanalyzed. It updates the statistics and system catalog tables, and it truncates, in that final phase, the last pages of the table if there is nothing in them. So eventually, at the end of vacuum, the on-disk size may be a bit smaller than before.

So what has changed in Postgres 16? In previous versions it used ring buffers to perform vacuum; as of version 16 it uses a different strategy for the shared buffers it uses, still using them circularly, and this actually saves some memory. If you want more details, I included a link so you can just read more there. And there is a problem, as I told you: let's say we started at transaction ID 5 and eventually you perform 2 billion transactions. It's almost impossible, but even so: if the rows were never vacuumed and there were two transaction IDs of 5 in the same database, it would be corrupted in a second, because Postgres wouldn't know which one is greater than the other. This is called the transaction ID wraparound problem. In the past it was a problem, a big problem, because there was no autovacuum, nothing to control this, and again, if the database wasn't vacuumed at all you would have lots of problems; but that past was again 20 years ago, not yesterday. So it's a must-avoid problem. If a transaction ID is no longer needed, I mean there is no other transaction waiting for that specific row version (remember xmin and xmax: let's say we inserted something, we deleted something, and nothing else is actually looking at that data at that specific time), at that point Postgres freezes them, freezes the row. So now that specific transaction ID will be usable for us again; remember the 35, 34, 33 that we used in the previous slides, they can be reused again, which addresses the transaction ID wraparound problem. It uses the specially reserved transaction ID 2, which I mentioned in the third or fourth slide or so; this is always older than any other transaction ID, which is good, because it means the row is always visible to us once vacuum has finished, which is good. So at that point, first of all, the transaction ID is available for all transactions, our data won't be corrupted, and we are fine. So again, as I told you, there is a patch for 64-bit transaction IDs, so I went to their GitHub repo and extracted these sentences from there: having to vacuum, I'm sorry, having to freeze frequently causes performance degradation, because of reading and rewriting older, not-yet-frozen pages while they are being used.
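If you want to keep an eye on how far a database is from wraparound, the usual check is the age of the frozen xid; a minimal sketch:

    -- age() counts how many transaction IDs old the oldest unfrozen xid is;
    -- autovacuum forces a wraparound-prevention run at autovacuum_freeze_max_age
    -- (200 million by default)
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;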
So what has changed in Postgres 16? In the previous versions it used a fixed ring buffer to perform vacuum; as of 16 it can use shared buffers with a configurable limit, using them circularly, which can save some memory. If you want more details, I put a link so you can read more there.

And there is a problem. As I told you, let's say we started at transaction ID 5, and eventually you perform about two billion transactions; it's not impossible. If the old tuples were never frozen and two different tuples carried transaction ID 5 in the same database, it would be corrupted in a second, because Postgres wouldn't know which one is older than the other. This is called the transaction ID wraparound problem. In the past it was a big problem, because there was no autovacuum, nothing to control this, and if the database was never vacuumed at all you would have lots of problems; but that past was twenty years ago, not yesterday. Still, it's a problem you must avoid. If a transaction ID is no longer needed, I mean there is no other transaction waiting for that specific row version (remember xmin and xmax: we inserted something, we deleted something, and nothing else is looking at that data at that point), Postgres freezes the row. Now that specific transaction ID becomes usable again (remember the 35, 34, 33 that we used in the previous slides), so they can be reused, which is what keeps the wraparound problem at bay. And it uses the specially reserved transaction ID 2, which I mentioned in the third or fourth slide; that one is always considered older than every other transaction ID, which is good, because it means the frozen rows are always visible to us once the vacuum finishes. So at that point the transaction IDs are available for new transactions, our data won't be corrupted, and we are fine. Again, as I told you, there is a patch for 64-bit transaction IDs, so I went to their GitHub repo and extracted these sentences: having to freeze frequently causes performance degradation, because of the read and rewrite of the older, not-yet-frozen pages while they are being used. Sorry, I didn't paste the remaining sentence, but basically it's a problem nowadays, so I hope that patch is going to be committed; but I'm not a committer, I'm not a hacker, I have no idea about the code.

So let's talk about the vacuum options. These are in the order they appear in the psql documentation, not my personal order. If you are familiar with Postgres and you see FULL at the top, don't kill me please, because we will get there; I'll skip that one for now. You can enable or disable FREEZE if you want. You can have VERBOSE; we always love VERBOSE, right, because we want to see the details, we love spending our time watching Postgres do everything, we love debugging everything, and even if we miss the output we love seeing that things are processing, so just enable VERBOSE all the time. If you want to analyze the tables while running a vacuum, ANALYZE means updating the statistics, updating the histogram of the data in the pg_statistic tables, so you can run ANALYZE together with VACUUM. You can also use SKIP_LOCKED, which speeds up the vacuum process, but then you may end up with unvacuumed portions of the tables; again, it speeds things up because it just skips whatever it cannot lock, and it's disabled by default, by the way. You can disable the index cleanup: don't, but if you want to process just the main table first, for good reasons like performance problems and so on, you can turn it off; by default it's on. You can also process only the main table, which is on by default, but there is the TOAST table as well (TOAST is The Oversized-Attribute Storage Technique), so there are TOAST tables in Postgres, and you can disable vacuuming those if you want; not encouraged, but again, if you have a problem you need to solve immediately, you can disable that one too. And you can disable the truncation process if you want; it's on by default. In the last few releases you can use parallel workers to perform the vacuum so that it finishes more quickly; in the older versions there was a single process doing the work, and in the recent versions, 16 and 17 included, you can have several. SKIP_DATABASE_STATS means don't update the database-wide statistics, just to save some time, and ONLY_DATABASE_STATS means update the database-wide statistics and do nothing else. So basically, if you are running vacuum against multiple tables in a single script, run them with SKIP_DATABASE_STATS first, and the last line can be a VACUUM with ONLY_DATABASE_STATS; that saves some time for you as well. And then you can also set the buffer usage limit, which can also be set with vacuum_buffer_usage_limit in the postgresql.conf file.
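Put together, a maintenance run using several of the options above might look like this. This is only a sketch: "accounts" is a made-up table name, PARALLEL needs a reasonably recent release, and BUFFER_USAGE_LIMIT and the *_DATABASE_STATS options need version 16 or later.

    -- One table out of many in a maintenance script:
    VACUUM (VERBOSE, ANALYZE, SKIP_LOCKED, PARALLEL 4,
            BUFFER_USAGE_LIMIT '256MB', SKIP_DATABASE_STATS) accounts;
    -- ...more tables, all with SKIP_DATABASE_STATS...
    -- Then, as the last line, refresh the database-wide statistics once:
    VACUUM (ONLY_DATABASE_STATS);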
So, vacuum and WAL. If you are familiar with PostgreSQL already, WAL is the logging of transactions; it's called the write-ahead log, and all the modifications, all the physical changes, are logged to the WAL files. Since vacuum does page modifications (it's doing physical changes, especially in the second and third phases of the vacuum process), it's going to create some WAL IO. The WAL is also there because we need the same thing for crash recovery, and we need pretty much the same thing for the replica servers: replication in PostgreSQL is based on WAL, most of the replication is done with the WAL records, so we need everything to be logged to the WAL files. Eventually, vacuum causes extra IO pressure on the WAL files, which is going to affect your backups as well, because tools like pgBackRest or Barman or the other ones are going to back up your WAL files for continuous backup, and you will need more space for those.

So what about vacuum and replication? You can use replication for, say, failover or switchover scenarios, or for running read-only queries against the database. When you are running queries on the standby that last longer than usual, we have a problem. The problem is that rows are updated on the primary and vacuum kicks in: we ran an update on the primary node, vacuum comes along and deletes the old version. So it's deleted on the primary, and as I told you in the previous slide, that is written to the WAL. If there is a long-running transaction on the standby which still needs that row, eventually we lose the row, and at that point, by default, Postgres just cancels the query on the standby; it says "canceling statement due to conflict with recovery". Of course there is a solution for this, which is called hot standby feedback: hot_standby_feedback tells the primary server about long-running transactions on the standby and says please don't remove these on the primary until I'm done. The negative side effect is bloat, because on the primary vacuum won't be able to do its job, so there will be bloat on the primary node. Again, there is a trade-off in between, so pick whatever you need; yes, bloat will increase.

There are a few things you can use to throttle the vacuum process; by default there is no throttling. If the page which will be vacuumed is already in memory, the cost is 1 (these are the default values); if it has to be read from disk, it is 2; and if the page gets dirtied, it is 20. When the accumulated cost from these three parameters reaches 200, by default, vacuum is going to sleep for the configured number of milliseconds and then continue. That is how you throttle the vacuum process; again, the delay is zero by default, so it's disabled. It's going to slow the vacuum down, but it may be useful for you if you are short of IO: it results in less IO at any given moment, but the vacuum takes longer, and over time it's eventually going to do the same total IO.
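A quick way to see those throttling knobs, plus the standby trade-off switch, from psql (a sketch; changing settings needs the appropriate privileges):

    -- Cost-based throttling: a manual VACUUM is unthrottled while vacuum_cost_delay is 0.
    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('vacuum_cost_page_hit', 'vacuum_cost_page_miss',
                   'vacuum_cost_page_dirty', 'vacuum_cost_limit',
                   'vacuum_cost_delay', 'autovacuum_vacuum_cost_delay');

    -- On a standby, trade cancelled queries for bloat on the primary:
    ALTER SYSTEM SET hot_standby_feedback = on;
    SELECT pg_reload_conf();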
I feel too old; I'm not that young, but I remember the days when we didn't have autovacuum in core. It's been there since 8.1, so it's been there forever. Anyway, there is a process called autovacuum, which also does auto-analyze. It's kicked off based on some parameters in the Postgres config file, or per table (per table with the ALTER TABLE command), and it also kicks off to prevent transaction ID wraparound. So if there is a DBA who never runs a vacuum, then based on the configuration parameters Postgres is going to start autovacuum on its own to get rid of the wraparound problem. And this is a problem; when I say problem, let me explain what I mean. First of all, you cannot kill that process; I mean, it's going to restart even if you kill it. And often, by Murphy's law, which is also available in Postgres, it's going to kick off at the busiest time, right? So make sure you don't disable it; that's in the slides as well. It may, or will, prioritize some busy tables, so some tables can stay untouched. And it may cancel itself when a higher lock level is required by another transaction, so you may see "canceling autovacuum task" in the logs because someone wants to create an index.

So eventually, you may end up not trusting autovacuum on its own. I mean, this is my opinion; there are other good Postgres people out there who think the opposite, but I wouldn't trust autovacuum just by itself. There are some parameters which control the autovacuum process, and as I told you, you can tune it per table with the ALTER TABLE command, all the autovacuum parameters. So you can just disable autovacuum for some static tables, like the list of states in the US: you can disable autovacuum on that one because it's not going to change. But if you are keeping the number of women and children killed in Gaza, then you should keep it enabled, because it's updated every day. So vacuum and autovacuum can live together; I always run a cron-based vacuum as well, just to make sure it's done in a controlled way.

So, VACUUM FULL: "cut my life into pieces, this is my last resort". It really is the last resort, because it rewrites the table, which means it locks the table; and when I say lock, it's an exclusive lock, which means no inserts, no selects, no updates, nothing can be done against this table until the VACUUM FULL is finished. That's why it's a last resort. If things have gone too bad, you may need it at that point, but it has to be the only transaction running against the table, and it requires free space similar to the table size. Basically, it's a downtime. There are some alternatives, pg_repack and pg_squeeze; pg_squeeze is probably the nicer one, because it has a sort of cron-ish thing, so you can schedule the maintenance process within Postgres itself.

Okay, so again, this is one of the VERBOSE outputs; I think in Postgres 14 and 15 people added more output here. And this is one of the things I really like, one of the views in Postgres, pg_stat_progress_vacuum: it shows you what's being done, how the process is going, and so on.
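For the per-table tuning and the progress view mentioned above, a small sketch (the table names are made up):

    -- Static lookup table: autovacuum will never have anything useful to do here.
    ALTER TABLE us_states SET (autovacuum_enabled = false);
    -- Busy table: make autovacuum trigger sooner than the default 20% threshold.
    ALTER TABLE daily_counts SET (autovacuum_vacuum_scale_factor = 0.01);

    -- Watch a vacuum that is currently running:
    SELECT pid, relid::regclass, phase, heap_blks_scanned, heap_blks_total
    FROM pg_stat_progress_vacuum;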
Yeah, I'm finished, almost done. So first of all, thanks for coming. Do you have any questions that I cannot answer? Thank you. Do you have any questions that I don't know the answer to?

My question is, you mentioned that vacuum does its cleanup per page. What if I have, like, three pages with one row inside each of them because of deletes? I can't hear you. Can you hear me now? Okay. So you said that vacuum does the cleanup per page in a table. What if I end up with three pages with one row in them, one record in each page, because of deletes? I used to have full pages, then the rows were deleted, and now I have three pages with one row each. Is vacuum going to clean up and compact the table, or do I have to rebuild it?

I'm sorry, I don't know why I couldn't understand both of the questions today; just blame jet lag or something. So you start with three pages, all full, and because of deletes there is only one row left in each page, and you never vacuumed the table. If you run vacuum, is it going to make one page out of it? No. How do you get one page out of it? The only thing that will do that is VACUUM FULL, but again, don't do it, because it locks the table. All the new inserts are going to be done into that free space, though. I'm talking about static tables, the kind mostly used in warehousing. Another question that I want to understand and won't be able to answer; it's just a bad day, I think. It's also being recorded, so we need the microphone.

Yeah, I have a follow-up to that last question. Let's assume there is one row in each of those three pages, and I update those rows. Updating them will most likely move them to a different page? So if you update the row in the third page, it's probably going to insert the new version into the first page, the first available space. It's not going to create a fourth page; it's going to use the first available space on the previous pages. That's actually why we run vacuum, to be honest. Let me explain it with different wording: when you run vacuum, it makes that space reusable for future transactions. So if you have some space on the first page, Postgres is going to try inserting that row into that page. If there is not enough space, of course, it gets more complicated; it depends on how much data is being inserted and how much space is available there. It may create a new page if there is no available space in the previous three pages, but usually it will try to use the available space in the vacuumed pages.

I think the slides will be available on the SCaLE website, or you can just email me; if you google my name you'll find an email easily. But I'm sure they will be on the conference website; that was the case last year. An older version is also available on other conference websites, because I've given this talk at other conferences, more or less the pre-16 version, and the basic structure is the same. And I do something after my talks: there is a Twitter account which is cheering for Postgres, so please raise your hands and we take a selfie with the attendees included in the picture. You can leave otherwise; I'm going to take the selfie, and it will be posted on social media, just letting you know, so if you don't want to be in the picture, let me know. Thank you.

No, you can't; again, I don't want to do it per table. So I know you can have a cron job that changes the thresholds up and down; you can increase the thresholds. For that same customer, they were only running from 9am to 6pm, so what they did was increase them significantly so that they don't get hit by autovacuum during the day, and after 7pm we used manual vacuum to process the data. So you don't disable autovacuum entirely, but you raise the cost thresholds
so high that it effectively won't fire during the day. And if there is no activity, it will still run, because there is nothing else to compete with: even though the cost is high, it will run. So by raising the cost we are saying don't run while there is load; if there is nobody else to compete with, then it's fine even with the cost that high. And it picks a bad time, I know; we have the same problem. But we still have a question: which version are you on? 15. 15 is great on that point. We hit this problem at a big bank on 9.2, and in that case it was for a specific purpose, because the problem was the number of tables over there; they were using auto-partitioning on the tables, so it was a big problem for that workload. On the other hand, the idea is to be able to have maintenance windows where you say don't do this now, do it later, but the cost is also... sorry, can I find you at 12:30? Yeah, 12:30. Oh, the lunch break is at 1:30. Yeah, I'll find you. Great, sorry.

Hello. What? Sure, call me. Okay. Richard, I don't believe you two have met before. Yeah, I've seen your name around a lot. Okay, I'll take that as a good thing. You were at EDB? I used to be at EDB at one time, yes. I had two stints: from the beginning in 2003 until 2011, then I left, and then I got acquired back into EDB. Oh yeah, same; I left pretty soon, a few months after the acquisition. Now you're at? I run professional services, professional services focused on Postgres. So you're North America? No, I'm actually based out of Pakistan, that's my hometown, and Dubai. And Maldives? I thought we decided on Bali. Bali and Maldives, please. Let's do it. Somebody registered this, but I'm glad you did, I'm glad you did. So one of the things that I am supposed to do is do that. Oh, you don't have an HDMI? The old one that I had before. You don't know everything. Well, this Mac?
One of the things I like about Enterprise and being a contractor is that I have an actual laptop, not a Mac; I have an ASUS in here. Oh come on, this is a little better than anything else, I'm sure, but I cannot call a Mac not an actual laptop. Come on, Mel.

Hello folks. So, did anyone come in here in the hopes of riding an actual elephant? Well, sorry to disappoint you. Give it up for Richard.

All right, thank you very much. Can you all hear me okay? Cool, thank you. Welcome to this talk, entitled Riding Elephants Safely. It's basically about working with Postgres when your DBA is not around. As you may know, the mascot for Postgres is the elephant, and you want to work with your database without breaking it and taking your application down, so that's what we'll be talking about today.

Just real quick about myself: I am a software developer and I was a support engineer at EnterpriseDB starting in 2015. We worked with a lot of large-scale clients, insurance companies, banks, military, governments, and I just want to share from my experience what a lot of our customers had to deal with, and hopefully get you oriented with Postgres if you're not already familiar with it. These days I'm actually using Postgres behind some apps that I'm writing in Golang and Python, and before that I was a web developer using Perl, before I switched over to being a DBA. I've been working with Postgres since 2002 and I really like using this database; I was a MySQL user first, but after working with Postgres I couldn't go back.

So, is this talk for you? I think if you're a software developer or a QA engineer, someone who writes applications that use a database but who doesn't actually interact with the database itself on a regular basis, this talk is for you. Maybe you're interested in being independent from someone else who manages the database, a sysadmin or a DBA who manages these environments while you're not able to use them hands-on on a regular basis. Maybe your DBA is on vacation, or your DBA quit or got laid off, or you've never had a DBA and you're just inheriting a database that you have to work with. This talk is for you.

So what are we going to talk about? I can't do justice to Postgres in one hour, because there are so many things to go over, so we're going to limit ourselves to just a few of these topics, so that at least you can get your feet wet and find your way around the database, maybe even the server itself. We'll be talking about just a handful of things, and I'm going to detail them here. First, we want to be able to log into the database, and maybe even start and stop it if we need to. Second, we want to be able to take a backup before we run into some kind of damage. Then we want to be able to diagnose some pretty common performance or stability issues by reading through the logs and through some views within the database. Then we want to be able to identify schema changes that might improve the performance of the database, and understand the file and directory structure on disk. Most of the things I'm going to share with you today (I didn't keep track of all of them, but most) can be done while the database is up and running, so you will not need to restart it and you will not need a downtime. We want to keep the database up and running, because you are not the DBA, someone else is, and you don't want to be the one responsible for taking the database down.
Okay. So if the database is not running, we might need to start it up, so we'll take that first step of getting the database started. Now, the majority of you will probably be able to SSH, or use some other remote connection, into the server; if you are using a managed database like RDS, you will need to start and stop it from the console. Assuming you can actually SSH into the machine (most of them are running Linux, and we are at a Linux conference, so SSH is probably the way to go), you will want to perform some sanity checks before starting it up. The database is down, you log in, and you want to check whether there is free disk space to be able to start the database, because sometimes a database crashes because it ran out of disk space. So run df -h to see whether any partitions are full, and then address those partitions accordingly. Then you'll also want to look at your database logs; usually syslog will help, otherwise you might need to find the logs, and I'll try to help you with that as we go on. The database logs might have messages that explain why the database isn't running: it crashed, something else happened, somebody shut it down by accident. You want to know why it shut down before you try to start it back up.

Once you've figured all that out and determined that it's safe to start the database, use systemctl and start Postgres; depending on the distribution of Postgres, or the distribution of the OS, you may need to use postgresql-<version> as the service name. As a brute-force option, if you don't want to use systemctl or you can't, you can use pg_ctl; pg_ctl is the command that systemctl ends up using to start the database, so you can just start it up manually yourself. However, you will need to know where the PGDATA directory lives in order to start it up. If you're using the community version of Postgres, and not a redistribution by another company, it should usually be in /var/lib/pgsql/<version>/data; if you're using Ubuntu it might be a little trickier to figure out. So you'll need to find PGDATA, the data directory where all the data lives. If you don't know it, try asking your sysadmin, ask your DBA; get that information beforehand, before those people disappear.
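One thing you can do ahead of time, while the server is still up, is ask it where that data directory is, for example:

    -- Note the answer somewhere safe; pg_ctl will need -D <this path>
    -- later if systemctl is not an option.
    SHOW data_directory;
    -- equivalently:
    SELECT setting FROM pg_settings WHERE name = 'data_directory';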
So pg_ctl start will start the database. Now, if for some reason you need to stop it, there are three modes. When you pass no arguments it uses the smart shutdown mode, which cleanly shuts the server down by waiting for all the sessions to disconnect and then stopping. If you use the fast mode, it doesn't wait: it cancels the running queries, rolls back open transactions, disconnects the sessions, and then shuts down. Immediate mode basically simulates a crash of the database; it just shuts everything down, and if you ever start it up again it will start in recovery mode, which takes a little extra time.

Now that the database is up and running, we want to be able to connect. To connect you're going to need the hostname and a port, which is usually 5432, but in some instances it could be a different port number, so you might need to use something like netstat to figure out which port it's listening on. You'll also need a username and password, and you might need to check the postgresql.conf file to see whether the port number is non-standard and whether listen_addresses is set to '*'.

As for the tools we use to connect, the one I'm going to use today is called psql. It's the default command line interface, and if you use it you can basically just type in a query, like any other shell-style interface, end it with a semicolon, press enter, and the query runs. If you need a GUI, we would recommend pgAdmin or DBeaver; those are pretty popular graphical user interfaces that let you connect to the database and run queries.

Okay, so let's say I connect with psql: I pass in -h with my hostname, then my username and my database name, and it just assumes it's using 5432, so you don't need to pass a port if it's the standard one. Once I log in, I'm presented with a prompt showing the database name, and I can use a bunch of these backslash commands to find my way around. The first thing I can do is backslash-l, which lists all the databases that have been created on this machine. The one I'm going to be using is edbadmin. The "postgres" database is created by default, and some people use it by default; I highly recommend against that, just for the sake of security. template0 and template1 are databases provided by Postgres as kind of a bootstrap for creating other databases, so you don't want to use those, and you don't want to delete them either. If you actually modify template1, any other database that gets created afterwards will contain whatever is in template1; so if you create a table in template1 and then create a new database, that table is going to show up in the new database. Don't touch these at all.

What else can I do in there? I can do backslash-d, which describes all the tables that are visible to me. As you can see, if I do backslash-d I only see these two views; I don't see any tables, and the reason is that my tables don't live in the public schema. By default my search path is public plus my username, which is edbadmin. So I can type backslash-dn, which describes the namespaces, or schemas, and here we see I have a namespace called myschema, plus public, in the results. If I want to actually see the tables that are in those schemas, I'll need to change the search path; it's search_path, and you set it at the prompt.
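A compressed sketch of that first session, after connecting with something like psql -h dbhost -U edbadmin edbadmin (host, user and schema names are made up):

    \l                                   -- list databases on this instance
    \dn                                  -- list schemas (namespaces)
    SET search_path TO myschema, public; -- make tables in myschema visible
    \d                                   -- tables and views now include myschema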
So now we've connected, and we can see what tables there are and roughly what's going on in the database. Usually when somebody logs in and wants to use the database, it's because they need to figure something out, and the way you figure that out is with a view called pg_stat_activity: it shows what's going on in the database at that very instant. If you're using RDS or Google Cloud, that view might have limited visibility, basically because you're not a superuser; if you can get superuser access to be able to view everything, that'll be very helpful. And if you actually have access to psql, you can run SHOW log_directory, which will show you where the logs are; that directory contains all the text logs of the various things that are going on, or have happened, in the database. RDS and other cloud-based distributions will have the logs available if your sysadmin made them available to you.

Once you've figured out what's going on and you need to actually stop a process or stop a query, you can use these two functions: pg_cancel_backend and pg_terminate_backend. pg_cancel_backend will cancel a query: the query that's running gets cancelled and the session stays open. If that's insufficient, if for some reason it doesn't resolve the problem you're experiencing, you may want to use pg_terminate_backend on that process ID.

So how do we get the process ID? Here's what I'll show you. This is a view of pg_stat_activity; as you can see, it's a little hard to read because it's really wide, three lines full of column names. One tip: you can use backslash-x to transpose it, so it shows the columns as rows and you see one record at a time. Now I look at pg_stat_activity again, and I can see this PID is running a vacuum, and this PID over here is actually me, my own query. So let's say you look at this view and think, oh, this query has been running for a very long time, or it's idle but it's actually holding a lock; you run pg_cancel_backend or pg_terminate_backend on that PID, and it does what it needs to do.

Here's a quick example. Let's say I look for everything that's active or idle in transaction, and I see that this VACUUM FULL is active but it's waiting on something, waiting on a lock; so it's running, but it's basically stuck and can't go any further. Then you look at the second row and you see there's this other SELECT query that is idle in transaction. Something's a little fishy, right? So you take the information from pg_stat_activity and cross-reference it with pg_locks, and look at which locks are not granted. Here we go, this one is not granted; you then want to find the relation, the table, with the same ID and see where that lock is granted. It turns out, in this particular example, I actually have another session somewhere that did a BEGIN, locked the table in exclusive mode, and then did a SELECT; the moment you do the SELECT the table is locked, and that's why it shows up in this view: the query ran, but now it's waiting on the client, so it's not an active query anymore. It's just idle in transaction: I started the transaction, ran a query, and then walked away for lunch, and now it's blocking the vacuum, or blocking whoever else is using the database. So you definitely want to look for idle-in-transaction sessions, cross-reference them with pg_locks, and that will help you figure out what needs to be done. Because this isn't a running query, it's just sitting there, I have to use pg_terminate_backend; I run pg_terminate_backend on that PID, the session ends, and my vacuum can continue. That's a very important part of working with the database, in terms of making sure the application keeps running.
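A rough version of that cross-reference in one query (just a sketch; the column choice is illustrative, and 12345 stands for whatever PID you actually find):

    -- Who is active or idle in transaction, and which of their locks are granted?
    SELECT a.pid, a.state, a.wait_event_type, a.wait_event,
           l.relation::regclass, l.mode, l.granted, a.query
    FROM pg_stat_activity a
    JOIN pg_locks l ON l.pid = a.pid
    WHERE a.state IN ('active', 'idle in transaction')
    ORDER BY l.granted;                  -- ungranted (false) locks sort first

    SELECT pg_cancel_backend(12345);     -- gentle: cancel the current query
    SELECT pg_terminate_backend(12345);  -- firm: end the whole session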
Now let's talk about configuration a little bit. Configuration is all contained in the postgresql.conf file, and that lives within PGDATA; that's another reason you need to know where the PGDATA folder is. For Debian-family systems it's actually under /etc, so depending on the distribution of Linux you're using, you may need to look outside of PGDATA. There's also a file called postgresql.auto.conf, and that is something humans should not be editing, because the ALTER SYSTEM command edits that file for us. So if you're in psql and you want to change your configuration, you can do ALTER SYSTEM SET whatever-parameter TO whatever-value; that's a very convenient way to change the configuration. However, it's limited to certain users, so if you actually have file access to the postgresql.conf file, that's probably better.

Now, what is the current state of the configuration? Sometimes people edit the postgresql.conf file and then forget to reload or restart the database, so what's actually loaded in Postgres doesn't match what's in postgresql.conf, and that can cause a lot of confusion. What you want to do is get into psql, run SHOW ALL and just print the whole table; that is the currently loaded configuration, the current values.

Some of the configuration can be changed without restarting the system, so without a downtime, and I'm going to go over a few of those with you. To get a list of which parameters those are, you can run this query: SELECT name, setting FROM pg_settings WHERE context IN ('sighup', 'user'). That gives you a list of all the things you can change without requiring a downtime, and for anything in the user context all you have to do is SET parameter TO value and off it goes. I guess the easiest example is work_mem: work_mem is set globally in your conf file, but it can be overridden at the session level, because you might want extra work_mem for sorting or joins on one particular query; in that case you would say SET work_mem TO '100MB', or whatever. If you want to set it globally, you say ALTER SYSTEM SET work_mem and then send a SIGHUP to the postmaster process, or run pg_reload_conf(). The other option is to find the PID of the postmaster and just do a kill -HUP on it.

All right, so some useful configuration parameters. I talked about work_mem; I'm just going to quickly go over a few more. Some people will wonder why I don't talk about shared_buffers and max_connections; these are the two most important parameters in your database, but they might not be useful to you if you're not a DBA, because changing them requires a downtime, and you don't want a downtime, so we're not going into those in depth. search_path, like I mentioned: it defaults to your username plus public, and if you want to add to it you say SET search_path TO a comma-delimited list of all the namespaces you want to be able to look in; you can also change that in postgresql.conf and reload the database, and then you'll have a larger set of namespaces visible by default. work_mem, like I mentioned earlier, is the memory allocated to sorting and hashing; if you change it in postgresql.conf it affects other users. Let's say you have a machine with 64 gigabytes of RAM and you set work_mem to 1 gigabyte: every session can use 1 gigabyte, so if you have 100 sessions using the database you're going to run into memory issues and you might have a crash.
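A sketch of the commands just described (the values are only examples):

    -- Which parameters can be changed without a restart?
    SELECT name, setting, context
    FROM pg_settings
    WHERE context IN ('sighup', 'user')
    ORDER BY name;

    -- Per-session override for one heavy report:
    SET work_mem = '100MB';

    -- Global change, then tell the postmaster to re-read its configuration:
    ALTER SYSTEM SET work_mem = '32MB';
    SELECT pg_reload_conf();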
So set work_mem low globally, kind of the average, and then on a per-session basis set work_mem manually to whatever you need for your queries. maintenance_work_mem could also be useful if you're trying to deal with a performance issue; it's similar, it just sets the memory your vacuum has to work with. (Sorry, oh, I'm good, yeah, it's okay, it's underneath the podium, thank you, thanks.)

Some other parameters you can change without having to restart the database are the log parameters, which give you more visibility into what gets printed into the logs. I'm not going to go into them too much, but log_line_prefix gives you more information about the session that caused a log entry, log_checkpoints might show you whether you're checkpointing too much and causing IO issues, and log_connections and log_disconnections tell you when a session started and ended, which is useful if you need to do some auditing. log_autovacuum_min_duration and log_hostname are also pretty useful depending on your use case. I have a separate talk on this, but these are pretty useful when you, as a developer or a DBA, are trying to figure out what is causing performance issues with your application.

One thing to note is that these logs are not WAL logs; I think some people get a little confused by that, so I'll go into it real quick. What are WAL logs? WAL stands for write-ahead log, and it's a journaling system: every modification to a database table is written to the WAL, and at checkpoint time the data gets permanently flushed to disk. These WAL files live in PGDATA/pg_wal. If you somehow see the name pg_xlog, which is what pg_wal used to be called, you may want to do yourself a favor and talk to your sysadmin about getting your database upgraded, because that directory was renamed back in version 10, and we're at 16 now, so pg_xlog is definitely very old. WAL is a journal, and it basically provides a means of disaster recovery: at a checkpoint, the changes described by the WAL so far are flushed to disk into PGDATA/base, and from that point there's a marker that tells you where the last checkpoint was; all the new information gets written to the WAL, and if you crash, Postgres basically says, I know that what's in base matches the last checkpoint, so I can start at the checkpoint and replay everything, and you recover up to right before the crash. So you can see why this is a very, very important folder. Needless to say, do not delete these files. I say this because our customers have done it before, and those customers will remain nameless, but I think people do get confused: oh, it's a log file, so I don't need it. No, it's a disaster recovery log, so you don't want to touch any of the files in there.

If you actually go to your PGDATA folder (here it's /var/lib/postgresql/<version>/main, this is an Ubuntu machine), you'll see there's a bunch of stuff: the base folder, where all the data actually lives, a bunch of other directories, and the pg_wal folder. You don't want to touch that folder. If you go into it, you'll see files very neatly named with a bunch of zeros; those are the WAL files themselves.
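If you're curious which of those files the server is writing to right now, you can ask from psql instead of poking around in the directory:

    -- Current write-ahead log position and the segment file it falls in
    -- (read-only functions; pg_current_wal_lsn() only works on a primary).
    SELECT pg_current_wal_lsn(),
           pg_walfile_name(pg_current_wal_lsn());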
Each WAL segment is 16 megabytes by default; if you compile Postgres differently you might get a different size, but 16 is the standard. I'm showing this to you now so that whenever you see it, you know: I'd better get out of this directory.

So we talked about configuration and WAL logs; now we'll move on to authentication. This is a really short topic, but I think it's important for you to know. Authentication information is stored in pg_hba.conf; HBA stands for host-based authentication. It allows you to slice and dice, in a very fine-grained way, who gets access: which hosts get access, which database they get access to, how they authenticate, whether or not they use SSL, and so on. Any changes to it can be applied with a HUP or a reload. Here's an example of what a pg_hba.conf might look like; this is the default, and a real one wouldn't use the password method, you would use some other authentication method. It's read top-down, and the first line where your connection fully matches all the parameters is the one that will be used for connecting to the database. So for example, I type psql and connect to my localhost: all users and all databases are accessible through this line, and the password is validated. I connect from my laptop to the Postgres running on my laptop; it's a host connection, not a unix socket, it wants a database that is on my system, and any user can connect through this line, so the server fetches the password information out of the database, compares my password with it, and if that works I can log into psql and start running queries. You can limit this: only certain subnets can connect to the database, only certain users, only certain databases for a particular user, things like that. I won't go into too many more examples here, but this is a way for you to control who even gets to try to connect to the database.
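You can also read the file's parsed contents from inside the database, which is handy for spotting typos (a sketch; the view needs appropriate privileges):

    -- Each row is one rule from pg_hba.conf, in the order they are matched;
    -- a non-null "error" column means that line could not be parsed.
    SELECT line_number, type, database, user_name, address, auth_method, error
    FROM pg_hba_file_rules;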
All right, vacuuming. A lot of the customers that came to us came with performance issues, wondering why their database was not performing well, and a lot of the time it turns out they were trying to run a bunch of queries while a vacuum was running. What vacuuming does is help prevent bloat. An update or delete on a table doesn't actually delete or update that particular row in place: it flags the row as deleted, and an update then inserts another row. So you can end up with a lot of these deleted, dead rows plus a lot of new rows, which takes up more space on disk, and you get bloat. The problem is that if you need to do a sequential scan on the table, you have to scan past those dead tuples as well, and that's what slows your database down. So Postgres offers this thing called vacuum: it goes through and flags those deleted rows as reusable for future inserts and updates, so that instead of continuing to tack more rows onto the end of a file, it finds reusable dead space and inserts there. If you vacuum enough, you can keep your file size to a minimum and just keep reusing the space that has been freed up.

The nice thing is there's also a feature called autovacuum, which will periodically wake up and vacuum the table so you don't have to do it manually; that way you always have dead rows flagged as reusable. But the problem is that a lot of the customers we deal with are banks and large organizations, their tables are really, really big, and the vacuum takes a very long time. In those situations you see them running, say, an ETL query trying to aggregate some data, at the same time as the vacuum, and it creates this situation where the server gets very slow, overloaded with IO. Usually I recommend just waiting for those vacuums to finish, but in some situations we do kill the vacuum, do some things to prevent it from waking up again, let the ETL finish, and then go back and vacuum. Or, in some situations, you might discover that autovacuum_vacuum_cost_delay is set too high; sometimes you can just set it to zero, which makes the vacuum finish faster, and then set it back to what it was before. So I guess, as a non-DBA, if you're running into a performance issue involving a vacuum, you want to double-check that autovacuum_vacuum_cost_delay, or vacuum_cost_delay, is set low.

The question was: if a delete flags the row as deleted, why isn't it automatically flagged as reusable? The reason is multi-version concurrency control: with MVCC, other sessions might still be able to see the row you deleted. In order to handle multiple users, there needs to be a mechanism to flag the row as deleted while those older sessions, from a different transaction, can still see it. You have to wait for all of that to finish, and then the vacuum process comes through and marks it as reusable. So sometimes we kill those vacuums off to let other traffic go through, but that's just a band-aid over the situation, a stopgap, not the actual solution; you have to let the vacuum finish, and one way to make it finish faster is by changing the cost delays.
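For that cost-delay trick, something along these lines (a sketch; it needs superuser, and whether an already-running autovacuum worker picks the change up immediately can vary):

    -- Is throttling slowing the vacuum down?
    SHOW autovacuum_vacuum_cost_delay;
    SHOW vacuum_cost_delay;

    -- Temporarily remove the autovacuum throttle, then restore the old value.
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;
    SELECT pg_reload_conf();
    -- ...after the vacuum has caught up...
    ALTER SYSTEM RESET autovacuum_vacuum_cost_delay;
    SELECT pg_reload_conf();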
All right, so we talked about vacuum maintenance and we talked about configuration; now we want to be able to take backups. You have this database, you don't know what condition it's in, and you want to be able to take a backup so you can restore it. In Postgres, the way we do these backups is a program called pg_dump, and what it does is a plain-text dump of the database: when you pg_dump your database, what you end up with is a text file with a bunch of CREATE TABLEs, INSERTs and whatnot to recreate the database, and it's completely human-readable. You can pass arguments to pg_dump to limit which tables or which namespaces get dumped, and you can also dump a compressed or binary version if you want to save space; that has the added benefit of allowing you to multi-process the restore, so if you can afford to use the binary format, it's actually preferable. The nice thing about pg_dump is that it's less likely to copy corruption along, and I'll get into that right now.

There are three types of corruption. There's the corruption that happens on disk: say there's a faulty driver, or the drive heads have problems, and the data on disk gets corrupted. There's corruption caused by stray DML commands: maybe a coworker deleted an entire table, or did an update without a WHERE clause, or didn't use a transaction; those corruptions are caused by humans, or by bad programs. And the third type of corruption is caused by a PostgreSQL bug, which for most users is not going to happen; it's very rare to hit a PostgreSQL bug just using it on your laptop or in a development environment. Because pg_dump prints out human-readable SQL, disk corruption or corruption from a PostgreSQL bug doesn't get pulled into that file. But it's definitely slower, because you're converting binary data into text.

If you want a faster backup, you can use something called pg_basebackup, and what that does is take the entire PGDATA directory and copy it. That basically includes all the indexes, all the constraints, all the dead tuples, just everything that's in there, and sometimes you want that because it's faster; but the risk you run is that you copy any existing corruption along with it. So my usual recommendation is to have a little of both: a plain-text dump, plus a quick copy like a nightly pg_basebackup, in case you need to restore right away. And the last thing I was going to say: you've got to test those backups. We've run into situations where people dump their database, never test it, and discover the dump was empty because they dumped the wrong database, or there was a typo, or something like that. So test your backups and make sure they work.

Question? Okay, good question. pg_dump will not copy the WAL, but pg_basebackup will: when you do a pg_basebackup it copies everything in that PGDATA folder, but it only copies the WAL up to that moment. So if you want to be able to restore from a later point in time, you're going to want to harvest those WAL files as well. There are other mechanisms that I won't be able to get into in this talk, but there are ways to continuously archive the WAL so it's always ready for you to use in the future. All right, that's a good question.

All right, you probably feel like you're drinking from a fire hydrant, a lot of stuff. I'm going to talk about monitoring real quick. I guess this is kind of an extension of the logging I was talking about earlier, and again I have a logging talk on this, so I won't go into too much detail. If you set log_line_prefix (by default I think it's just the timestamp and the process ID, which isn't very useful if you need to do a lot of correlation between web apps, applications, users and so on), then with a prefix like this every single line will have a timestamp, process ID, transaction ID, user name, database name, application name, and the host IP address or hostname. Having that extra information in your log line prefix lets you find more clues about what might be causing a problem, or what might have caused a problem in the past. The other one is log_min_duration_statement: any query that takes longer than this parameter (it's in milliseconds, so if you set it to 1000, any query that takes more than a second) gets printed to the log, so you'll know which query it was that took a long time.
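For example, a hypothetical pair of settings along those lines (the escape letters are standard; pick whichever fields you actually need):

    -- Timestamp, PID, transaction id, user@database, application name, client host:
    ALTER SYSTEM SET log_line_prefix = '%m [%p] %x %u@%d %a %h ';
    -- Log any statement that runs longer than one second:
    ALTER SYSTEM SET log_min_duration_statement = '1s';
    SELECT pg_reload_conf();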
So let's say your database gets really slow for about an hour every night and you're wondering why; with log_min_duration_statement you see, oh, this query that ran from 3am to 4am took a whole hour, and now you know which query it was and can deal with it accordingly.

All right, some other logging parameters. log_statement will print the query before it starts to execute; this is different from log_min_duration_statement, which prints the statement and its run time after it's done, whereas log_statement prints the query before it runs. The reason this might be useful is that if your database crashes, the query never finished; if you have log_statement set and then it crashes, you'll know which statement was running when the crash happened, so that can be useful in some situations. log_min_error_statement: let's say you have an app, someone wrote a query with some kind of syntax error, and you're wondering why the app isn't working; with log_min_error_statement it will print the syntax error and then the query that had the syntax error, so you can go back to the developer and say, this is the query, it might need fixing. You can set different levels (warning, error, fatal or panic); I can't go into that too much here, but it's the severity of the error. A syntax error would just be an error; if you set the level to panic, only queries that caused the database to crash get printed, but if you set it to warning, you get more, both the queries that crashed and the ones with syntax errors. log_duration only logs the duration, which isn't that useful; I would recommend using pg_stat_statements if you're trying to figure out averages of how long queries take. log_connections shows you when each session begins, so you can track what a user did when they logged in.

Okay, so we're going to talk a little bit about performance: EXPLAIN and EXPLAIN ANALYZE. If you're familiar with Oracle, you might have used explain before; it basically tells you that this query will use this particular query plan. EXPLAIN ANALYZE actually runs the query and shows you the statistics behind the joins and the sorting, and how long things took. There's also this thing called auto_explain; as a developer this is actually very useful, if you can get auto_explain installed on your database, especially if you're using an ORM. With an ORM you specify a lot of objects and how they interact with each other, and when it writes to the database you don't know what kind of query gets generated. In at least a couple of instances with some of our customers, we had a query and said, this query runs fine, we run it by hand and it runs fine, but once we run it through the ORM it doesn't, because the ORM is casting certain data types the wrong way. auto_explain will definitely help you uncover that kind of issue.
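A minimal sketch of trying auto_explain in a single session (the module ships with PostgreSQL, LOAD generally needs superuser, and 500ms is just an example threshold; to catch queries coming from an application or ORM you would typically preload it for everyone via session_preload_libraries or shared_preload_libraries instead):

    -- Load the module for this session and log the real plan of anything slow.
    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = '500ms';
    SET auto_explain.log_analyze = on;
    -- Now run the query you are investigating in this session and check the server log.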
Real quick, the difference between EXPLAIN and EXPLAIN ANALYZE. You can see this is pgbench, which is a benchmarking tool that comes with Postgres; I initialized it and I'm running this query, select star from pgbench accounts joined to branches where aid is less than 100,000. You can see it's going to do two sequential scans, then a nested loop to join them, and return all the data. Notice that these costs are all you get, and the costs are basically the estimates the query planner uses to decide what kind of scan, what kind of loop, what kind of join it's going to use. Once you do EXPLAIN ANALYZE on the same query, you get this additional output, which shows you the time each step took and the actual total time at the end. That is very useful for finding the bottleneck in a particular query; usually your query will be a lot more complex than this, you'll get a lot more rows in the output, and you'll find, oh, this is the part that took the most time, maybe I need an index there.

So, make sure you use the right data types. Postgres supports a lot of different data types, and using the right one helps the query planner make the best decisions for you. So don't make everything text; I've seen people do that. You might also get confused between JSONB and JSON: these are both JSON types in Postgres, but JSON is just a text representation, basically the text data type relabeled as JSON. If you want to use JSON, use JSONB; it allows additional features for indexing and searching, you can actually do a lot more, and there are JSONB operators that make working with JSON a lot faster. It's also very important to have the proper indexes on your tables, so the scans can be done a lot faster.

Will EXPLAIN run the query? Right, it's a dry run, it doesn't actually run the query itself, so if you do EXPLAIN DELETE something it won't actually delete; but if you do EXPLAIN ANALYZE DELETE it will actually delete, so be careful.

Just an example here: I do EXPLAIN ANALYZE on this query and I see this sequential scan took 45 milliseconds. Then I create an index on this column and run it again, and it's a lot faster, because now, instead of searching the entire table for everything where bid equals 1, it just scans through the index, which (just for the purposes of explanation) is basically all the values plus their locations on disk. So if I find all the values in my index where bid equals 1, I know exactly where to grab them from disk, and it's a lot faster.
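Roughly, that experiment looks like this (a sketch using the tables pgbench creates after "pgbench -i"; with a tiny scale factor the planner may still prefer a sequential scan, so treat the comments as what you would expect on a bigger data set):

    EXPLAIN ANALYZE
    SELECT * FROM pgbench_accounts WHERE bid = 1;   -- sequential scan

    CREATE INDEX ON pgbench_accounts (bid);

    EXPLAIN ANALYZE
    SELECT * FROM pgbench_accounts WHERE bid = 1;   -- index/bitmap scan once the index helps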
All right, my last slide here: what not to do. Don't delete anything, but also don't kill -9 any Postgres process, not even a single backend; I highly recommend against it. kill -9 on a Postgres process essentially simulates a crash: it causes the entire instance to go into recovery mode, and once you start it up again it does a bunch of scans and takes a while before it's available, so you'll have a downtime. So don't kill -9.

Idle transactions: as a practice, always commit or roll back your transactions, and don't leave them open, because of that example I showed you earlier: somebody starts a transaction and then walks away, and data becomes unusable; it holds other people up. Look in pg_stat_activity for idle-in-transaction sessions. Don't worry about plain "idle": idle is normal, it just means the session isn't running a query; but "idle in transaction" is bad. And you want to cross-reference that with pg_locks to figure out which queries are being held up.

Don't drop anything, don't delete anything; don't lose something and be unable to get it back. Rename things instead. Indexes, I think you can't just disable them, but again, you don't want to drop anything, just rename things. And don't delete stuff from PGDATA, especially pg_wal. If you're curious what commands are available in psql, there's backslash-h and backslash-question-mark, and those might be helpful to you.

And finally, where to find help. There are a lot of different places where you can get help if you don't have a lot of DBA experience: we have a Slack, we have the mailing lists (which are a bit slower to get answers from, because they're mailing lists), there's IRC, there's the wiki, and the docs are really, really good; I learned most of my DBA experience through the documentation, actually, so use the docs. And the obligatory plug for EDB support: my company offers support for customers at different tiers, so if you want, you can get support with us. All these links up here are available on the Linktree, linktree "postgres help", so you can use those as a resource. Okay, so that's it in terms of what I wanted to share with you; I hope you found it useful, and I just want to wish you a happy Pi Day.

Can you go over WAL logs again? Is it just recovery of the logs, or recovery of the database? Sorry, recovery of the database: do the WAL logs just recover the logs within the database, or do they recover the database itself? So, the question is about WAL logs and recovery. When you do a pg_basebackup you get a snapshot of that directory; this could be before or after a checkpoint, and actually I think pg_basebackup issues a checkpoint for you, so it issues a checkpoint and grabs all the files. As it's grabbing the files, you could have produced more WAL files, right? So at the end of the pg_basebackup you have this snapshot from the moment you pressed enter, and what you're going to want are all the WAL files produced after you pressed enter. If you take those two things, the pg_basebackup and all those WAL files, to another machine, you unzip or untar that folder so it's there, and then you copy the WAL files into pg_wal; when you start the server up, it will say, here was the last checkpoint, it will notice the WAL files in that folder and say, hey, there are WAL files here, let's replay them, and it will replay them all the way to the end. At that point you can log into the database and do whatever you need to do. That's called point-in-time recovery, so if you want to look up the documentation on that, look for PITR or point-in-time recovery, and you'll definitely get a lot more examples and use cases for it.
Question: you said there were two different backup commands, and the first one was text-based — what does that look like? Is it a block-by-block or page-by-page representation of the entire database? No, it's text commands. If you do a pg_dump on a database, what you end up with is a bunch of SET commands for session settings, then CREATE TABLE statements, and then INSERTs for each row: it essentially recreates all the queries needed to rebuild the database.

Question about restores: do you have to load the schema before you restore the dump, or does the dump take care of the schema? That's a good question. pg_dump dumps both the DDL — all the table creation and so on — and the DML together. If you want, you can run pg_dump with schema-only and get just the DDL, and run it with data-only and get a separate file with just the data. If you do it all in one and load it into an empty database, it will create the tables for you; if you separate them, you can create the tables first and load the data later, so you have options there.

Does it also preserve user data, like passwords and user credentials? Those live outside the database: when you do pg_dump you usually pass it a database name, so you dump only that database. If you want to dump the user information as well, there's pg_dumpall, which dumps everything across all databases, and I believe there's a flag to dump only the global information. My memory fails me on exactly how the passwords get dumped, but user information does get dumped too. All right, thank you very much.

Testing, testing, we are good. Hello folks, let's welcome Chris Travers, who is talking about nulls in Postgres today. Thank you very much. Before I get started, I'd just like to see a show of hands: how many of you have seen Bruce Momjian's talk on nulls? Anybody? One. Do you feel like you understand nulls afterwards? How many of you feel like you understand nulls? A few of you. Hopefully no hands will go up if I ask that at the end; put it this way, by the time I'm done here you will probably understand that you don't understand nulls. That's why this talk is called "Everything You Probably Never Wanted to Know About Nulls." We're going to talk about why nulls are confusing, where they come from, what a mathematically based null would look like if we had one, and why we don't have one — or actually we kind of do have them, just not where we think we do.

About me: I've been using Postgres for 25 years. There may be one or two people at these conferences with more Postgres experience than I have, but not very many. I've worked as a database administrator, a software developer, a consultant, and an engineering manager of database teams. I've even contributed code to Postgres itself: a small bug fix for a race condition between parallel plan startup and replication. So that's me.
I come to these conferences, I like to speak, and I like to push Postgres in directions that nobody else tends to push it. So first of all: we do have mathematically defined nulls in databases — two of them — but we never call these nulls. The first is the null set. You can think of a null set as an empty table: a set with zero members. The other mathematically defined null is the null tuple, which is like a row with zero fields; we're going to come back to it, because the null tuple behaves in ways that defy logic in Postgres, and probably in other databases too. So we have these two things, we don't call them nulls, and we call something else a null that we didn't pull from mathematics at all. If we did use mathematically defined nulls, those two are what we would call nulls.

However, this guy deserves the credit, or the blame: Sir Charles Antony Richard Hoare — Tony Hoare. He has called null references his billion-dollar mistake. He was one of the major designers of the ALGOL languages, and in ALGOL W, a successor to ALGOL 60, one of the things the product people had decided was that they needed Fortran compatibility. But they had all this type checking, and Fortran didn't look at data as records in the same way, so they needed to make it work somehow, and he came up with the idea of a null reference. It was designed to make that compatibility just work. If you look up his talk about null references being a billion-dollar mistake, he actually goes into the fact that there were other ways of doing it, but this was the way he did it and everybody just settled on it, and he estimates that null references have caused at least a billion dollars of damage, roughly, since they were introduced. We grabbed nulls in databases from these null references, so I think we need to carry this forward: if you look at all the trouble nulls have caused in databases, it's got to be ten times what it is in application development, so we ought to be able to add a zero to Tony Hoare's estimate of the damage of his own invention.

If you've worked with nulls at all, you know about this thing called three-valued logic. The basic problem we have in databases is that we built these things from relational math, but then we have to make them work in the real world. So we say: relational math works on sets, but we have duplicate entries — fine, let's ditch sets and define a new entity called bags to deal with that. How do we handle missing information? Let's just throw a null reference in there: we didn't have the data, so it's a null reference. We did this, but then people start asking, what does this mean? Well, we don't know what it means, so let's say it means unknown. So we end up with three-valued logic, where null compared to anything is null. But because that doesn't work very well, we have the IS operator for booleans: IS TRUE, IS FALSE, IS NULL never return null. NULL IS TRUE is false, and NULL IS FALSE is false. If null were fully unknown, that wouldn't work. So we've created these other operators, because the claim that null means "unknown"
is simply false in practice: null doesn't really mean unknown; we just sometimes decide that it means one thing or another. So one of the things we all have to figure out is how to choose the right operator: IS TRUE, IS FALSE and friends apply to booleans only, while IS DISTINCT FROM and IS NOT DISTINCT FROM apply to any data type. If you can figure out why the SQL standards committee decided to do it that way, you're welcome to explain it to me; I haven't been able to figure it out. With almost anything else, if there's a null anywhere in the expression, it evaluates to null. This is all relatively straightforward until we get to more advanced nulls, because, unwilling to stop here, we've decided to pile more insanity on top.

So let's talk about some advanced fun with nulls. How many of you think it's possible to store a value that is null in a field that is declared NOT NULL? Anybody think they can do this? In fact it's quite possible, and this starts to get into Postgres internals quite a lot, but as we start to understand the problem, we'll see that null is so heavily overloaded that we can get into situations that really, really don't make sense. Here's a good example. Create a type foo with two fields in it, an int and a text. Create a table, call it bar, with one field of that composite foo type, and say we don't want nulls in there, so we declare it NOT NULL. Now we insert into bar a foo value where both fields are null, and Postgres will happily do this for you. Then we realize we have a problem, we run ALTER TABLE bar ADD CHECK (the column IS NOT NULL), and we get an error: the check constraint fails, because the entry we just added is null. How does this work? If you have any ideas, just raise your hand; anybody think they have any idea why this works?
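Here's the demo just described as a runnable sketch; the type and table names follow the talk, while the field names inside foo are my own guess:

    -- A composite type, and a table whose column is declared NOT NULL.
    CREATE TYPE foo AS (a int, b text);
    CREATE TABLE bar (f foo NOT NULL);

    -- This succeeds: at the column level the stored composite value is "not null",
    -- even though every field inside it is NULL.
    INSERT INTO bar VALUES (ROW(NULL, NULL));

    -- This fails: a CHECK constraint evaluates a SQL expression, and
    -- ROW(NULL, NULL) IS NOT NULL evaluates to false for the row we just added.
    ALTER TABLE bar ADD CHECK (f IS NOT NULL);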
To understand this, we have to understand how Postgres actually stores nulls. A Postgres data page basically has a page header, then item pointers growing down from the top and row data growing up from the bottom, and they meet in the middle. But let me back up a moment. One of the early approaches to storing nulls is what Oracle does; anybody here have experience with Oracle's null handling? Oracle had this great idea: we have to deal with nulls somehow, so let's store them as impossible values. If we have a number and we need to store a null, we store it as a negative zero, because we have no other reason to store a negative zero, so negative zero means null. Anybody want to guess what a null text string is in Oracle? Yes — it's an empty string. So in Oracle, empty string equals null is true, which also leads to fun with concatenation handling and a whole bunch of other things. And Oracle, being the first major relational database on the market, got to make all of these mistakes first; then of course people depended on the behavior and they couldn't fix it. They sort of fixed it by adding a new data type with different null handling, just so they could encourage people to use NVARCHAR2 instead of VARCHAR2. A lot of people looked at Oracle's implementation and said, that sucks — because it does.

So what Postgres does instead is keep a bitmap in the row header that tells you which fields are null. In our example, is the bitmap bit for that column marked null? No, it's not. The NOT NULL constraint on a column — not a check constraint, check is different — all it does is look at whether that bitmap bit would be set. So if I tried to insert an actual NULL instead of a row value whose fields are both null, it wouldn't insert. But when I insert this value, Postgres inserts a new tuple header inside the field itself and says: this field isn't null, but all the fields within it are null. Then we alter the table and add a check constraint, and a check constraint evaluates a SQL expression; because that entry evaluates to null, the check constraint fails. So you start to see that if you're using composite types anywhere, understanding this becomes really important, because there is no way to enforce, on creation of a type, that a field within that type is not null. I actually reported this as a bug back in 8.4, I think, and I was told it couldn't be fixed, because if you did, you would break everything that happens with outer joins.

Question: what about an array of nulls? An array of nulls is not null; arrays behave differently than tuples. And no, it gets even worse than that, because you might think: wait, we can create domains, and domains can have a NOT NULL constraint. That lets you enforce it when you're putting data in, and that's what I reported back in 8.4, but the problem is that if you have a domain that can't be null and you ever query it in a left join, the result can still be null. So there really is no way, at type creation time, to guarantee the type is never null. Another question: if foo were not a composite type, could you still do this with ROW? The answer is no, because ROW only works for a composite type; that's just the syntax for inserting a composite value.

So that's the first of the more advanced problems. One of the other problems people run into — and we're going to look at a few variations on it — is anti-joins. Anybody here not familiar with the term anti-join? I'll explain it. An anti-join, in its simplest form, is a NOT IN clause: we want to select rows from this table where a join condition against that table is not met. So the simple thing is to use NOT IN. Show of hands: how many of you understand the problem with a simple SELECT like this using NOT IN when you have nulls? Only a few of you, so only a few of you have probably been bitten by it so far. If the column in B has any nulls in it anywhere, this query will return exactly zero rows; it will return a null set. Why?
Because NOT IN will treat the null as unknown: we don't know whether this value matches it or not, therefore it might, and so we can't return anything. So this form is bad; it will bite you, and people usually get bitten by it at least once in their database careers. The minimal fix is just to add WHERE some_val IS NOT NULL to the subquery. This works, and it fixes the proximal problem, but the issue is that if you standardize on this format, sooner or later you're going to forget the IS NOT NULL and be right back in the same trap. And the really pernicious part is that when you test it, you might not have nulls in your data; later you might, and then suddenly your query breaks and you don't know why. That's why this particular fix is bad.

NOT EXISTS is a little better, but it has its own problems. NOT EXISTS basically runs the subquery as what is effectively a lateral join and then discards the cases where the join condition is met. The problem with NOT EXISTS is a bit more abstract: it's a perfectly general tool, that subquery could be anything, and as a result there's very little you can do to make sure the planner can choose the best plan. If all you're doing is an anti-join, NOT EXISTS is difficult for the planner to deal with; I have very rarely seen NOT EXISTS perform anywhere near as well as I thought it should. So NOT EXISTS is better — it gets you out of this trap — but it can create a bunch of performance problems in the meantime. To the question of why: again, it gets treated as a lateral join plus a discard, so whatever is evaluated in that loop is hard for the planner to fold into the overall plan. And yes, I've found NOT EXISTS, at least on Postgres, usually performs slower than NOT IN; but there's a faster version I'll get to in a moment.

The best approach — sorry, the slide dropped the a.* — is SELECT a.* FROM a LEFT JOIN b USING (some_val) WHERE the b side IS NULL. I usually find this performs way better than NOT EXISTS, and it usually performs better than NOT IN as well. The reason is that the left join path is much better optimized, and consequently it's a lot easier for the planner to compose. It's a little more verbose, and it's also more limited: it only works for anti-joins, and you can't do the more complex logic you could express with NOT EXISTS. But precisely because it's more limited, the planner can make better assumptions. So my general recommendation is to write anti-joins as left joins where the null-extended, right-hand side is null: it will simply do the left join and discard every row where the join condition was met, and it does so as a left join rather than as a lateral join. So that's the initial piece. I have a few more examples I'd like to get to; are there questions at this point, or should I just continue on to a few more unexpected pieces?
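For reference, here are the three anti-join spellings just discussed, side by side; a, b and some_val are placeholder names:

    -- 1. NOT IN: silently returns zero rows if b.some_val contains any NULL.
    SELECT * FROM a
    WHERE some_val NOT IN (SELECT some_val FROM b);

    -- 2. NOT EXISTS: correct in the presence of NULLs, but often harder
    --    for the planner to optimize well.
    SELECT * FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.some_val = a.some_val);

    -- 3. LEFT JOIN ... IS NULL: the form recommended in the talk.
    SELECT a.*
    FROM a
    LEFT JOIN b USING (some_val)
    WHERE b.some_val IS NULL;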
Okay, at this point I'm going to pull up a terminal; this is probably too small for everybody to see, so let me zoom in. Since many of you have not seen Bruce's talk: show of hands, how many of you think SELECT ROW() IS NULL — the zero-field row constructor — will return true? How many think it will return false? The "false" people have it... let's see. Oh, it returned true. Now, how many of you think ROW() IS NOT NULL will return true? Nobody. How many think it will return false? It returns true also. The null tuple is both null and not null: this is Schrödinger's tuple. If you understand the rules for how IS NULL gets evaluated against a row type, this actually kind of makes sense, but it also really doesn't, and it's a good example of the lack of logic behind nulls per se.

Now, if I put a comma in here, so I have a tuple with two entries... hmm, that's not what I expected. Let me just create a type here — foo2, then foo3; I should probably clean these up between talks. I'm not sure why it's not accepting the comma; I'm probably mistyping something silly. The general rule is that a row type is null if and only if all of its fields are null. There we go: with every field null, this is null, and it is also not "not null." So in the case of a null row: if every field is null, the row is null, and if every field is not null, the row is not null. But it shouldn't be reporting false on both of these. The rule, as I understood it, was that if you have any non-null field then the row type is not null, but what we're seeing here is that a partially null row is neither null nor not null. And now it behaves as I expected before. So what we're actually seeing is that there are values which are neither null nor not null, and there are values where, only when every field is not null, the whole thing is not null. Which means that if you have a tuple with zero fields, it is vacuously true both that every field is null and that every field is not null, so it is null and it is not null at the same time; whereas a row type with one null field evaluates to neither — it is not "not null," and it is also not null. This is actually a really good example of why I usually test almost every weird case of nulls before I demonstrate them; I seem to have missed one today and actually learned something up here. But it's also a really good example of how these rules often don't make sense.

And this is interesting too: a row where both fields are null is null, but an array of nulls is not null. Again, there's no consistency, and the reason there's no consistency is that all a null normally means is "no data here" — it's just a null reference — and then we decided to start asking whether that nothing is null or not null.
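The cases from this demo can be reproduced with a few one-liners; the commented results follow the rule as described (all fields null means the row IS NULL, all fields non-null means IS NOT NULL, mixed rows are neither), but as the live demo itself showed, it's worth re-checking on your own version:

    -- The zero-field row is both NULL and NOT NULL ("Schrödinger's tuple").
    SELECT ROW() IS NULL, ROW() IS NOT NULL;                       -- t, t

    -- Every field null: the row IS NULL, and it is not NOT NULL.
    SELECT ROW(NULL, NULL) IS NULL, ROW(NULL, NULL) IS NOT NULL;   -- t, f

    -- Mixed fields: the row is neither NULL nor NOT NULL.
    SELECT ROW(1, NULL) IS NULL, ROW(1, NULL) IS NOT NULL;         -- f, f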
And now we have made a mess of things that we are never, ever going to be able to get out of. Question: what about an array? If we do an array of nulls, IS NULL is still false; an array, even a zero-element array, is also not null. Arrays are only null if they are explicitly null; any other array value is not null. Row types, on the other hand, have extremely inconsistent logic and lead you into cases where things may be not null and also null, and other cases where they may be not "not null" and at the same time not null. These are the pieces that really highlight the difficulties. One of the cool things you can do is leverage this when you're writing anti-joins, like I mentioned before: you can say we only want the cases where we actually have a match on the whole row. But now I'm starting to wonder about that, because even though I use it all the time, this row handling is weird and not what I expected, so I'm going to have to go back and double-check. Just a second, let me try one more thing... that is still the same... maybe it's about casting... nope. I'm stumped by this; this was not expected for me.

So we've talked about how nulls are stored, we've talked about three-valued logic, and we've talked about the history and the fact that these are null references that don't actually necessarily mean anything. Are there any questions at this point? I know I'm quite a bit early; I'm trying to think whether there are other interesting examples I can go over.

Question: when do you actually use the array type? When is it useful — why not just create more columns in the table? What I usually suggest is that arrays are useful in two specific kinds of cases. The first case is when it's useful to represent a single value as a series of sub-values. As a completely contrived example — and I'll give a more specific one afterwards — you could break an IPv4 address down into four octets and store it as a four-member array. That sort of thing can be helpful if you need to process one piece of it, like checking whether one of those members has a particular value. A real-world case: when I worked on a big life sciences database, we would generate BLAST reports — BLAST stands for Basic Local Alignment Search Tool, I believe. The idea is that you have protein or genetic sequences and you're looking for similar sections across different ones, possibly with offsets, so we would run the tool that does these searches and store the top ten results in an array. We could have stored it in a more normalized form, but with the size of the data we were dealing with, storing it in an array simply made performance a lot better, because instead of finding rows that might be on different pages, you get the whole result set at once out of a single page, and that helped a lot in that case over time.
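A couple of quick illustrations of the points above — the array null behavior and the contrived IPv4-as-octets idea; the table and column names are made up for the example:

    -- Arrays are only NULL when they are explicitly NULL.
    SELECT ARRAY[NULL]::int[] IS NOT NULL;   -- t: an array of nulls is not null
    SELECT '{}'::int[]        IS NOT NULL;   -- t: a zero-element array is not null
    SELECT NULL::int[]        IS NULL;       -- t: only an explicit NULL is null

    -- Contrived example: an IPv4 address stored as four octets.
    CREATE TABLE hosts (name text, octets int[]);
    INSERT INTO hosts VALUES ('gateway', ARRAY[192, 168, 0, 1]);
    SELECT name FROM hosts WHERE octets[1] = 192 AND octets[2] = 168;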
In this particular case we didn't index the individual entries. If you wanted to do that, it would be possible but a lot more difficult: you'd probably write a function that extracts whatever you want to index out of each entry, turns that into another array, and then you'd build a GIN index on the output of that function. It's possible, but we were only looking up the BLAST reports by ID, and we dealt with deletion anomalies by doing a further expansion and join to remove things that had previously been deleted.

From the audience: I've used arrays in many places, especially for tags. For example, in an e-commerce site, each product had a bunch of different tags and attributes — say, for a candle shop: red, small, six by four, and so on — and that allowed writing a very efficient query to find all the items that have the tags red and small, because it's indexable, so it was very effective. The same thing works for tags on blog posts, when you want to filter the result set by attributes. With the IP example you know there are four elements, but with tags you don't know how many you'll have.

Right — with the IP example, in actual practice it probably doesn't make that much difference, but one thing you could do is create, for example, a domain that says we have four entries here, and then create functions on that domain that present it as an IP address, so you can do things with it. If you store it in separate columns, there are ways to do something similar with inheritance — they're ugly — but you could still attach functions to the record type to present it however you'd like. I wouldn't actually recommend doing it that way, because Postgres has a built-in IP address type; as I said, it's a contrived example. The general point is: if you have something that's structured and you need to present its substructures, arrays are a good option.

Another thing to keep in mind is that SQL arrays are not like programming-language arrays in one really important respect: SQL arrays are strictly rectangular. If you have a two-dimensional array in SQL, it is not like an array in your favorite programming language — I think Fortran arrays may be comparable, but I can't think of another language where they are. What this means is that you can't have the first row with four elements and the second with three; everything has to have the same number of elements. In other words, SQL arrays are actually matrices, so if there's any data you want to store or model as a matrix, an array is a really good data type for it. An example I pushed for at a previous place I worked: we would get financial risk matrices from an upstream provider, and that would have been a perfect case for a SQL array, simply because the data type matches what you're getting and there isn't really any normalization you could do to it. It would be strictly rectangular, usually something like 5x5, and if I remember right the entries were all floating-point values — 0.2, 0.3, 0.4, and so on. That would let you look at, say, the severity over time of a particular kind of incident likely to affect certain kinds of markets. I don't really remember where we got those from, but because the data was a matrix and was processed as a matrix, it made perfect sense to store it as a SQL array.
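Going back to the tagging example from the audience discussion a moment ago, here's a hypothetical version of it; the GIN index is what makes the containment query indexable:

    CREATE TABLE products (
        id   bigint PRIMARY KEY,
        name text,
        tags text[]
    );
    CREATE INDEX products_tags_idx ON products USING gin (tags);

    INSERT INTO products VALUES (1, 'candle', ARRAY['red', 'small', 'scented']);

    -- "Everything tagged both red and small": @> is the array containment operator.
    SELECT id, name FROM products WHERE tags @> ARRAY['red', 'small'];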
Yes, exactly — but JSONB arrays are not like SQL arrays. JSONB arrays are JavaScript arrays: they can be completely heterogeneous, they're one-dimensional, and if you want two dimensions, you have elements that are themselves arrays. So you can have a JSONB array that goes object, object, array, object, value, and so on; it looks more like our tuple or record types, except you can't store a bare record type without any type information in Postgres, because there would be nowhere to decode the record header from. So JSONB arrays are definitely also useful, but their use cases are completely different from SQL arrays. If you're storing actual array data, I would use a SQL array. Where I would probably reach for JSONB is something like a large log management system, where the incoming data may be relatively unstructured, and putting it into a JSONB column makes perfect sense. The question was about compression: yes, compression helps too, although interestingly, at a previous place I worked, the compression we could get on a JSONB column wasn't anywhere near what we could get by putting the data on a compressed file system. Since that was an almost exclusively append-only workload, we just put everything on ZFS with compression; that way we could ingest data faster, simply by trading CPU cycles for IOPS.

From the audience: actually, I can show you ways of converting JSON to relational form and back, because there's a function that converts a record set to JSON and back again. Most of the time I hear customers saying they need JSON because they don't know what type of data they'll receive; in real life that's usually overrated — they know at least roughly what they'll receive, and it's faster and more economical to convert it to relational form when it comes in from the application and convert it back on the way out. If you're interested I'll show you. (No, I can convince everyone — I'm just the moderator.)

So the question was about choosing between JSONB and plain JSON. That's a slightly different topic, but there are actually several cases where things that are supported in JSON are not supported in JSONB, and if you're hitting those cases and you don't need the GIN indexing, it makes sense to use JSON instead of JSONB. I'll give you the two top examples off the top of my head. The first: JSON as a specification allows duplicate keys. If you happen to be using duplicate keys, JSONB will not work for you. You probably should not have duplicate keys, and if you're using them for something where you genuinely need them, there are probably better ways of doing it, but the spec allows it and plain JSON preserves it.
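A quick illustration of that first difference:

    -- json keeps the input text as-is, duplicate keys included.
    SELECT '{"a": 1, "a": 2}'::json;    -- {"a": 1, "a": 2}

    -- jsonb parses and normalizes the value: the last duplicate key wins.
    SELECT '{"a": 1, "a": 2}'::jsonb;   -- {"a": 2}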
The second example is actually a really interesting issue with how things are stored internally. The JSON specification allows Unicode escapes inside identifiers and value strings, but JSONB, when you store it, stores actual strings. So if you have embedded null characters — which might happen, for example, if you're trying to smuggle actual binary data through — JSONB will not work for you: if you use the Unicode escape \u0000, you will get an error. Let me try to remember how this works... okay, so we have this, and this one throws the error. So yes — and don't ask me how I happen to know this; the only way anybody knows it is by stepping on it, unless you just happen to be at this talk and think, huh, that's really interesting, I'm going to remember that. The fundamental problem is that strings in Postgres are stored internally, in memory, as null-terminated strings, so if it were to convert that escape, you could end up with a null terminator in the middle of your string, which causes problems; the safety check is there to keep that from happening. There isn't really a way of handling it, unless you redesign your JSON documents to store those values base64-encoded or something, which generally speaking is not the best way forward. So those are the two cases where JSON works and JSONB doesn't. There probably is a solution; if you want to hear my solution you will not like it, and it will not work on fully managed Postgres services: figure out how you actually want to represent this on disk and in memory and write your own type — the code is mostly there in Postgres for you to modify — but most people aren't going to want to do that. Can you try the same thing but cast it as text? Yes, because this is a specific aspect of how JSON is supposed to encode Unicode characters. But that's a bit of a digression; are there any other questions on the null side? One more time, then: if you're using composite types and you want to make sure you don't end up with a null value, you add a CHECK (... IS NOT NULL) constraint; you don't just rely on the NOT NULL in the column definition. That's how you have to handle it. All right — Chris promised that everybody would be stumped by nulls by the end of the talk, and I think he kept his promise. Thank you, Chris.

Hello folks, up next is Postgres memory internals — well, memory management internals. Please welcome Krishna Kumar. Hello everybody, welcome to Postgres memory management internals. I am Krishna Kumar; you can all call me KK. I'll start with a bit of introduction about myself and how this talk came about. I'm the engineering manager for the Postgres open source contributors and committers team at Microsoft. I started roughly 22 years back, born into Linux, doing embedded systems porting on things like MIPS and SS4 boards — I don't know how many of you are aware of those boards, but I know support for them was taken out of Postgres some time back, so that tells you how long ago this was. From there I joined HP and worked in high-performance computing and scalable file systems for a while, then joined a hybrid cloud storage startup that got acquired by Microsoft; that's how I landed at Microsoft. I continued doing storage work and slowly moved into
the Kubernetes side of things for some time, doing a lot of open source work around authentication, storage and related areas, and that's how I ended up on the Kubernetes side. In the last three to four years I was part of the team that built the managed cloud service, Azure Database for PostgreSQL flexible server, so I helped build that managed offering. Around the middle of last year I joined as the engineering manager for the contributors and committers team, and a lot of my job is the people I interact with: folks who have commit privileges to Postgres and contribute a lot of code, like Thomas Munro, David Rowley, Daniel Gustafsson, Andres Freund, Melanie, Bilal and Melih. I interact with all of them on a day-to-day basis as part of my job, so there are a lot of things to learn from them, and this talk is my attempt to share those learnings with all of you. And as much as this is a sharing session, I'd like to learn from all of you as well.

Here's roughly how the talk is structured. We'll quickly touch on the architecture, just in the context of memory. We'll talk about how Postgres uses the shared aspect of memory — how it's shared between processes — and then go into a bit of detail on the types of shared memory Postgres has. Then we'll look at the local side of it: how Postgres, even though it's written in C, offers a sort of library that gives a managed-language feeling when you write code in Postgres. We'll talk a bit about the kernel interactions Postgres has — we won't go very deep, but we'll go through some of those aspects. Then we'll talk about how to debug some of these things; not so much specific issues, but enabling all of you to be aware of the tools and how to use them. We'll touch on the configuration settings that matter — again, not a comprehensive list, more about which pieces interconnect with memory. Then I'll talk about projects: some of these are out on the mailing lists, some have patches, some don't, and some are simply close to my heart, so that if people want to join and collaborate they can engage. And, as with anything in the open source world, nothing stands on its own; there's a village behind all of this, and these are learnings from a lot of people, so I want to say thanks to all of them at the end.

We'll start with the usual, obligatory observation: for a long, long time the promise has been that Postgres will at some point become thread-based, but at this point in time Postgres is still process-based. There's a lot of momentum — at least over the last year we've started seeing more focus on making this happen — but it hasn't really happened, despite promises going back to the 1990s at least. So at the end of the day Postgres is a process-based architecture: the postmaster spins up the rest of the processes, and everything out there runs as its own process. When new connections come in, more processes get created, and there are default background processes as well; I've picked a few and put them on the slide. A lot of the memory management in Postgres is based on this architectural fundamental: if we had threads, some of this wouldn't be needed; since we have processes, some of it is designed a certain way. So that's how this
context is important. The structure of this talk follows it: the shared memory aspects — different processes sharing the same memory — then the local memory aspects, and then the kernel memory aspects. Here's a pmap output from one of the Postgres processes, just to give you a glimpse of how shared memory and local memory appear. We can think of the shared memory as "static," not in the typical C sense: static here means that at startup you've already allocated this memory, and then you continue to use it between the processes. The "dynamic" side of things is when, after execution is already under way — say you have a query that Postgres wants to parallelize — it needs to spin up new processes and attach new shared memory to them. That's all I mean by static and dynamic here; I'm just giving them names. In the picture, which I'll go into in detail later, this is roughly what you'll see if you run pmap: one part is the static kind that's declared at the beginning, and the other is attached as needed as we go.

On the static shared memory side, we'll talk about how it's created at startup, and a bit about how locking is done in this framework, because, as you can imagine, locking would look totally different if we were using threads. We'll go through some examples: primarily we'll dig into what the buffer pool is and how it's managed, and what enhancements we could potentially make in that area; we'll briefly look at SLRUs and WAL buffers, but mostly the buffer pool.

At some point in time Postgres used System V shared memory for all of this. Over the course of time we've gotten to the point where only a small piece of System V shared memory is still used, for making sure there aren't two postmasters running against the same data directory or two startups happening at once; outside of that, everything has become mmap-based anonymous allocation. As you can see in this address space, there's the mmapped shared memory: typically you'd take a file and memory-map it, but in this case there's no backing file — we mmap anonymous memory with the shared flag, and it's attached to the primary process, the postmaster. From then on, any time a process forks from it, that process gets this address space too, which means it has access to the shared memory and can use those areas. That's the way shared memory is created and the way fork hands it down. You have other choices if you want, but this mmap mechanism is the default and most of the time it's what we end up using. One thing I want to touch on briefly: typical allocations use regular memory, but if you have a lot of
memory and huge pages enabled in your system, you can pass those flags — I'll talk about the configuration in a bit — to make sure this shared memory is allocated from huge pages as well. So that's the creation part, and how the child processes and everything else get access to it.

As promised, let's look at locking for a bit. Because this is a shared area, a lot of back-and-forth happens between these processes, and locking, as you can imagine, is something we really need when we're writing code in this space. Just to set the context: this is not the row locking or transaction locking users see; I'm talking about the lightweight locking you use when you're coding inside Postgres. Lightweight locks are implemented in a fairly straightforward fashion, although it gets complicated as you get into the nitty-gritty. First, atomic operations are possible on the shared memory, so you use those to try to take the lock. If that attempt fails, then, as shown on the slide, the process first puts itself into a queue, tries to lock again, and if that still doesn't work, it waits. Currently Postgres uses semaphores for this, so all the waiters sleep on a semaphore, and when an unlock happens we wake the rest of them up to see if they can acquire the lock. We use semaphores today, but there are proposals about whether we could use more modern mechanisms like futexes. One thing I missed mentioning earlier: this whole talk is about how Postgres works on Linux, not on other operating systems; that's why I'm talking about futexes and similar things. Semaphores on Linux are themselves implemented on top of the futex system call nowadays, so there are trade-offs, but in general the mechanism depends on atomic operations and on semaphores, and there's a wake-up step in the locking. That's how the locks that Postgres processes use to communicate with each other are designed. A bit of detail: these don't have sophisticated things like deadlock detection, so when you're dealing with them as a programmer, you have to handle that yourself. There are shared and exclusive modes available, and in the literature this kind of thing is often called a latch, but I'm calling it a lock because of the names of the functions; and a lot of this is for short-lived situations, the lightweight locking scenario.

So we've covered the shared memory aspect — how it's created — and we've talked about locking a bit. Now let's go into one of the examples that is front and center in a lot of usage of this shared memory. Postgres, as you can imagine, needs to be performant, so anything we read from disk we want to be able to cache, and we want to operate efficiently; which means we have something called the buffer pool in Postgres. This buffer pool is implemented on top of the statically declared shared memory, and the entire buffer pool allocation is carved into blocks of a fixed size, with descriptors defined on top of them, which are just metadata for each of these blocks.
A lot of the time, a buffer pool wants to keep both the most recently used pages and the most frequently used pages; it's a give-and-take situation where you have to balance those goals. At the end of the day, though, Postgres currently uses a page replacement algorithm that dates back to, I think, somewhere around the 1960s: the clock sweep. Every time we need to evict a page from the buffer pool, we sweep around the buffer — the ring doesn't literally exist, but we walk the descriptors — and keep decrementing the usage count of each buffer; if a buffer's count reaches zero, we can evict it, because nobody has touched it recently. That's the basic logic, and there are a bunch of discussions about whether we can improve this part, but right now we still use the clock sweep algorithm. One challenge is that Postgres has activities that could trash the entire buffer pool, vacuum and sequential scans being two of them: if we start a big sequential scan, we could fill up the whole buffer pool and wreck the caching for the rest of the I/O patterns. For those cases we use something called a buffer access strategy, where a fixed set of blocks is the only set the operation is allowed to touch; so when vacuum comes along and does its reads, it goes through this strategy, and instead of uncontrollably trashing the entire buffer pool, it gets a fixed-size ring of buffers to operate in. That's how Postgres tends to make the buffer pool scan-resistant. At some point in time a more sophisticated algorithm, perhaps something like CAR, which has scan resistance built in, could help here as well. But the overall theme to take away is that the shared memory here is used for buffering contents from disk, and there's a page replacement algorithm built into it.

Just to give a picture, this is what the buffer pool contents look like. One thing I didn't quite touch on is that there's also a buffer hash table: whenever we need a particular buffer, we need a fast way to look it up. If you look at the buffer pool view, the relfilenode and the handful of IDs after the buffer ID are what get used as the key into that buffer hash table, which gives us a very fast lookup mechanism: we look up the hash table and figure out which block to access for the operation we're doing, so that's an optimization we have as well. The rest of it — this is from a table I recently populated — shows fields like whether the buffer block is dirty, the buffer IDs, and a bunch of the other values required to maintain the buffer pool. The buffer pool is one of many things that use this shared memory; SLRUs are another, and they again use a page replacement algorithm. There has been a lot of discussion in the past about whether the SLRUs could move into the buffer pool itself, because they solve the same problem in a different way — can we enhance things so that a single algorithm covers both usages?
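As a side note, you can peek at the buffer pool from SQL with the pg_buffercache extension, which comes up again later in the talk. Here's a rough sketch, joining on relfilenode much like the buffer hash table key described above (the join glosses over a few edge cases, so treat it as illustrative):

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- Which relations in the current database occupy the most buffers,
    -- and how many of those buffers are dirty?
    SELECT c.relname,
           count(*)                        AS buffers,
           count(*) FILTER (WHERE isdirty) AS dirty_buffers
    FROM pg_buffercache b
    JOIN pg_class c ON c.relfilenode = b.relfilenode
    WHERE b.reldatabase = (SELECT oid FROM pg_database
                           WHERE datname = current_database())
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;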
Another thing — just laying it out, these are the other SLRUs that exist and that we keep in shared memory as well. Before I move on to the dynamic part of it, any questions? Unfortunately I don't know the answer to that, but I'll get back to you, because I know someone who really knows this: Thomas Munro knows it — not that he's working on exactly that, but he knows all of these details. On the other question: I don't know that part; I was just using a regular user session — no, this isn't an extension. I do know it's available, I just don't know in which version it appeared; I was using the head of the tree.

So let's move on to the dynamic shared memory side of things. There are scenarios where we really need dynamic shared memory, parallel query execution being one of them: we haven't allocated that memory at startup, so we can't rely on the fork magic; we need some way to allocate new memory and then attach it to the processes involved. There are two flavors of dynamic shared memory, DSM and DSA. DSM, dynamic shared memory, is the one that came first, and DSA was built on top of it for more sophisticated usage — when you want multiple DSM segments and want to manage the allocations within them better, that's where DSA comes into the picture. In the DSM scenario, as on the right side of the slide, this is how the operations actually happen: a segment gets created with an ID inside DSM, mapped into the process, and then that ID is passed to the worker processes that get spawned, so they can attach it into their address space as well. The secret sauce here is really that shared memory attach. What I'm showing here is not something recommended for normal use, just a demonstration: force parallel query on and run a simple select so it spins off some workers. The segment gets opened and mmapped, and later, when the query is over, it gets unmapped and then unlinked; that shared memory segment is what gets mmapped so the processes can talk to each other. One of the salient problems with dynamic shared memory is that the mapped address will be different in different processes; previously, with the fork-inherited mapping, we could rely on the address being the same, but here we need extra mechanisms to figure out where in the mapping a given piece of memory lives.
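A rough way to reproduce the kind of demo described above is to coax the planner into a parallel plan, so the leader and workers have to set up a dynamic shared memory segment. The cost settings here only exist to force parallelism for the experiment — not production advice — and pgbench_accounts is just a conveniently large table:

    -- Make parallel plans look essentially free to the planner.
    SET parallel_setup_cost = 0;
    SET parallel_tuple_cost = 0;
    SET max_parallel_workers_per_gather = 4;

    -- A large aggregate will now normally run with parallel workers,
    -- which communicate with the leader through DSM.
    EXPLAIN ANALYZE
    SELECT count(*) FROM pgbench_accounts;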
In both DSA and DSM this works out fine, but DSA has a friendlier way of working with segments and allocations. Both were developed, I believe, around the time we were getting parallel queries; DSA came along with parallel hash joins. So that's the way dynamic shared memory is dealt with.

We've now covered shared memory, the static and the dynamic parts, and gone through an example; next let's talk about how local memory operates. Local memory is what you'd think of as malloc in a typical application: you need objects allocated in memory to get work done. Postgres offers an infrastructure for this called memory contexts. A memory context has APIs along the lines of alloc, realloc, reset and delete, and it's an interface-style model: there's the memory context abstraction, and then there are implementations of it. The typical implementation is the allocation set, AllocSet, which is what gets used in most cases; then there are specialized implementations that folks have developed over time to improve memory management. Slab is one of them, for when you're allocating a large number of equally sized objects; I believe logical replication uses that quite significantly. One important aspect that comes in very handy is that there's something called the current memory context, which always points at the context you're working in; I'll explain on the next slide why that matters.

The memory contexts themselves are built with a hierarchy in mind: the right side of the slide shows the top memory context, which is what every process starts with, and from there you get child memory contexts, and children of those, down multiple levels. At any point in the code you don't have to pass the memory context around as a separate function parameter; whenever you're in a particular function and want to know the current memory context, it's there. They're all tied together hierarchically, so from a parent you can get to the children allocated beneath it. The way this comes in handy is that you don't have to manage memory piece by piece: if an error happens and you need to clean up, you can just free the parent context and the children get freed along with it. This is all available inside the Postgres core code, so a lot of the time you don't have to think about memory management; the malloc-level details of how the memory is handled, and whether a piece should actually be released, live inside the memory context implementation. You can call the free functions, but the implementation determines what actually needs to happen.

I also want to touch on how errors are handled. This isn't exception handling in the sense of other languages, but when an error occurs, Postgres uses a setjmp-style mechanism: if control jumps back up the stack to a certain point, the memory still gets freed, because hooks are in place so that when the error is caught,
the handler makes sure we free up the memory there as well. The last point, which I mentioned earlier, is another technique we use: when we start allocating, we build up the contents of a new memory context but don't attach it to the parent context until we reach a point where we know we can move forward. What that means is that errors are easy to handle: if something goes wrong, we can just free the entire context instead of freeing things piece by piece, and at the end, once we're safe and steady, we set the parent context. So that's how local memory is managed. That's the local memory context part of it; before we move on, any questions? Yes — in a test environment there's an option to enable asserts; for example, when I'm changing code I enable them by default, and then when you test, these kinds of problems get caught, assuming someone has put an assert there. It's a bit context-specific — we'd have to look at the particular case — but there is an option to enable it while configuring the build (with meson, which is what we mostly use nowadays), and then you run with that.

Because we have processes, as with anything on Linux, there's a certain amount of memory overhead every process incurs: page tables, memory for the stack, things like that. But what I'd really like to talk about is the concept of double buffering that typically shows up in the Postgres world. When we talked about the buffer pool, the idea was that we'd have a cache for every piece of content we deal with from storage. But most of the time Postgres runs on top of a file system, so when we read from storage — reading a file, for example — the data also gets cached in the kernel's page cache. That means we can potentially have the same copy of a page in memory twice: in the page cache inside the kernel and in the buffer pool inside Postgres. Sometimes that's useful, sometimes it isn't, but I wanted to bring the topic up so I can show some ways of debugging it. fincore is one of the tools you can use to see what's sitting in the kernel page cache. In this example I'm looking at a table called my_large_table: I filled it with a large amount of recently written data and then ran fincore against the underlying file — when you look at the file system layout, the relfilenode of the table tells you which file to look at — and you can see that the pages are cached appropriately inside the kernel page cache. Now, for a few years there has been work going on to get Postgres to a place where it has asynchronous I/O and direct I/O built in; both are things we really need eventually for a better-performing Postgres, and a lot of the group at Microsoft works on this — Andres, Thomas and Melanie, for example, and David too. When that direct I/O support arrives, what you'll start to see is that this kernel-side page caching — the pages shown on the right, from the kernel — will no longer
When that direct IO comes, what you will start to see is that this page caching goes away: the pages in the kernel cache, shown on the right, will no longer be there. I was trying to show a screenshot of that but didn't get a chance to put it in; right now, in Postgres 16, you can enable direct IO in debug mode and you will see that we no longer cache those pages inside the kernel. There are performance aspects to this, so it cannot be used directly yet, and that's where Postgres development is taking us through some project milestones to a place where we have asynchronous IO and direct IO built in, and then we will solve some of these performance challenges with the buffer pool as well. So that concludes the types of memory we talked about: shared, private and local, and kernel memory consumption.

Now a bit about configuration. A lot of talks here already cover this, so I am not going to be comprehensive; I'll just touch on some things that are really interesting and curious when it comes to memory management. shared_buffers is the first one; it determines the size of the buffer pool. Make it too big and you might consume memory in a way that leaves the kernel without memory for its own page caching, so there needs to be a balance, that's the one thing to remember, but some tweaking here can help in a lot of scenarios with very IO-bound workloads. Then there is work_mem, which is nominally the internal limit here, but the fact of the matter is that it's really not a hard limit in most cases, because every process that kicks in, say parallel workers, or every user session, or every partition involved, thinks work_mem is its own limit. That means if you add three or four of them, all four would each be using up to work_mem. In a simple configuration just setting work_mem kind of works, but you have to be aware of these corner cases to correctly reason about how much memory is actually being consumed and what the real limits are. huge_pages is yet another setting; the options are on, off and try, and with try it will try to allocate huge pages if they are available, and if not it moves on and does the regular allocation. The important thing to remember is that huge pages may need to be reserved in advance, which makes sure that by the time Postgres comes up it gets access to the huge pages.

The next setting is the overcommit setting. By default, on a regular Linux machine, this setting is not on, and the Postgres documentation suggests turning it on. The thing to remember is that in a typical Linux system, when you do a malloc and then write to that memory, that is when the pages actually get faulted in and allocated; you can malloc a large amount, up to your virtual memory limits, while your physical memory is still untouched, and you'll be fine. But with the strict overcommit setting that is typically recommended for Postgres systems, every time you do a malloc the memory gets reserved by the kernel. What that means is that Postgres can expect the behavior where, when we are running out of memory, malloc returns an error rather than us hitting an out-of-memory fault later. That is the expectation: more graceful behavior.
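As a quick reference for the settings just discussed, here is a hedged sketch; every number is an illustrative placeholder I've chosen, not a recommendation from the talk, and would need to be sized for the actual machine.

```bash
# postgresql.conf fragment (placeholder values)
#   shared_buffers = 8GB     -- buffer pool; leave room for the kernel page cache
#   work_mem = 64MB          -- per sort/hash step, per process -- not a global cap
#   huge_pages = try         -- on | off | try; 'try' falls back to regular pages

# work_mem multiplies: e.g. 100 sessions x 3 parallel workers x 2 sort nodes
# x 64MB is roughly 37GB in the worst case, far more than 64MB.

# Kernel side, as root: reserve huge pages before Postgres starts, and use the
# strict overcommit mode the Postgres docs recommend, so malloc fails with an
# error instead of the OOM killer striking later.
sysctl -w vm.nr_hugepages=4500      # ~8.8GB of 2MB pages, sized to cover shared memory
sysctl -w vm.overcommit_memory=2
```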
Somewhere, I think, folks had the expectation that when a new user session connects and is not able to allocate memory, it should just be killed, and things like that, but typically if you run into this situation the whole instance goes bad. The good part of strict overcommit is that we don't end up in scenarios where we have to do recovery; it becomes a more graceful shutdown, because you get a proper error return from malloc. So that's the story behind the setting. But it's something to be careful with when you are dealing with many, many instances of Postgres in a cloud environment; if you are setting this, there are consequences, especially on low-end SKUs and the like, so please be aware of this setting. Yep, yep, no magic. Yeah, but work_mem is tricky, that's the message I want to get across. Typically when we configure it we expect it to behave fine, but there are scenarios where the calculation is off; multiplying by max_connections alone is not enough, actually. I would say a typical user doesn't have to worry about it, but it matters for sophisticated setups, say with a lot of parallel queries going on.

Yeah, there you go: a bit of backend debugging, things we can use to figure out where our memory actually went. Here are a couple of things we can use. One, as I was explaining earlier, is the shared memory allocations view, pg_shmem_allocations. I've put that on the right; there you can see all the shared memory allocations (it got snipped at the end), and from it you can understand what is allocated statically and what dynamically, which helps you get a sense of it. pg_backend_memory_contexts shows the local memory contexts. The thing to remember is that it shows the contexts for your own connection, the current backend; you can also ask for another process, say if you want to figure out its memory consumption, but that output will be written to the server log. Combining these two, you should be able to figure out what's going on in the system in general, because allocations happen in these two areas. This still doesn't cover the kernel side of things, you need some more magic for that, but it should be good enough for debugging. Then some tools and extensions: pg_buffercache is the extension I was demonstrating earlier, which tells you the contents of the buffer cache; pg_prewarm is another tool people typically use to pre-load tables into the buffer pool; and pmap is a tool you can use to map an entire process, shown on the right here, and figure out what is allocated locally versus in shared memory and so on. These are tools which, when you end up in situations where things are not going well, are life-saving at the end of the day.

Now that we have covered the fundamentals, let's talk a bit about some projects. Some of these are ideas which are floating around; some have consensus, some don't and are quite contentious in the community. I just wanted to lay out some topics around this.
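Here is a small sketch pulling together the debugging helpers just mentioned. The backend PID 12345 and the table name are placeholders I've made up for illustration.

```bash
# Shared memory: named allocations carved out of the shared segment
psql -c "SELECT name, allocated_size
         FROM pg_shmem_allocations
         ORDER BY allocated_size DESC LIMIT 10;"

# Local memory: memory contexts of the current backend
psql -c "SELECT name, parent, total_bytes
         FROM pg_backend_memory_contexts
         ORDER BY total_bytes DESC LIMIT 10;"

# Ask another backend (PID 12345 is a placeholder) to dump its contexts to the server log
psql -c "SELECT pg_log_backend_memory_contexts(12345);"

# Pre-load a table into the buffer pool with pg_prewarm
psql -c "CREATE EXTENSION IF NOT EXISTS pg_prewarm;"
psql -c "SELECT pg_prewarm('my_large_table');"

# Kernel view of one backend's address space: shared vs. private mappings
pmap -x 12345
```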
There is a patch coming up in PG 17 which we did as an internal project, with a group, and Thomas helped guide it; the idea is to get to a place where we can invalidate the buffer cache, so that this capability is part of core Postgres and you don't need an extension: you can call this invalidate-buffer-cache function and get the buffer cache flushed or otherwise dealt with. This is something to use when you have to debug something, and it's also a building block for some of the other things we will talk about. The next topic is the memory shrink and expand scenario. Right now everything is static: the buffer pool is static, and if you want a different size later you still need to restart, throw away your state, and rebuild the whole thing. The place we could reach is that Postgres just stays on and you can expand or shrink this memory; that's the idea behind it. We could use it in, say, a serverless environment where we decrease the memory usage as user activity drops off and increase it again as users come back. Right now Postgres doesn't have such things, it's all very static, so that's potentially one area we could work on. Overbooking is another scenario, which comes up in a multi-tenant setting where you want to be able to put multiple instances of Postgres onto one, let's say, compute node; memory that one instance frees up can be given to another instance, and so on. That involves the kernel and the hypervisor, the whole stack, but it's another area where something like memory shrink and expand could be useful, and if anyone is interested in this space, please let's talk; I'm very, very thrilled to talk about this and get it into Postgres. Memory accounting and limiting is a bit of a tricky situation: we really don't have a good way to limit Postgres memory, in the sense of saying this query should not use beyond a certain amount of memory, or some more sophisticated limiting where we can manage users' expectations or wait until memory is available while other users are using the instance. Right now that facility is not there; there are patches on the mailing list where discussions are going on, especially around accounting and limiting, but they have not yet concluded, so we are still in the early stages. SLRU was another topic I touched on earlier: could we fold the SLRUs into the buffer pool and have one single way to manage everything that is backed in memory? There are also scenarios where, when we execute a query, we end up with caches that we don't free up promptly, and that's another area we could work on. There's a lot of performance improvement possible in allocation speeds, and every version gets some; PG 17 has a bunch of palloc allocation speed improvements, and more memory allocation speed improvements are coming. And we could do more sophisticated buffer pool replacement algorithms. Those are the project ideas I want to put out into the world and have folks look at, so let's collaborate on this.
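I'm not certain this is the exact patch the talk refers to, but one buffer-eviction facility that did land for PG 17 is pg_buffercache_evict() in the pg_buffercache extension; a minimal sketch of using it, with the table name as a placeholder and strictly as a debugging aid, could look like this.

```bash
psql -c "CREATE EXTENSION IF NOT EXISTS pg_buffercache;"
# Evict every cached buffer belonging to one table (debugging aid, not for production)
psql -c "
  SELECT pg_buffercache_evict(bufferid)
  FROM pg_buffercache
  WHERE relfilenode = pg_relation_filenode('my_large_table');"
```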
I just want to thank, first of all, all of you for coming to the session; it's been a long time since I've given a talk, so thank you for the participation. I want to thank our group inside Microsoft, which consists of both committers and contributors; we have a group that focuses solely on making improvements in Postgres, adding features and improvements, not only in specific areas: we do things like performance improvements, we do things like AIO and DIO, the whole spectrum of things. I get to have fun with my group here, and a lot of this talk has come from that, so thanks to them. Special thanks to Thomas Munro, David Rowley and Andres, because they helped me gain the knowledge and brainstorm on some of these aspects, and Teresa from Microsoft helped me build the graphics, so thanks to all of them. Thank you, and that's about it from the session.

So, on the double buffering: do we encounter the double buffering with huge pages as well? Yes, I believe so. Okay, thank you. The page cache still has the copy when we read it in, I believe so. Okay. So, just a quick comment and then a quick question. The comment is that one thing I wanted to mention about enable-cassert is that it also enables zeroing out each of your allocation sets, your memory contexts, when it clears them, and that allows you to turn problems into crashes. But my question was, you mentioned huge pages; have you seen any particular cases, any particular parameters, where huge pages make a particularly big difference to performance, or is that something that's maybe outside of your area of work? I mean, we do recommend it in general. I have not dealt with a specific workload that hinged on it, but in the past I've read about workloads out there that benefit from it, just not from dealing with it directly. Yeah, the older databases, the incumbents. Okay. So, Krishna Kumar, as he was getting up on the stage, told me that this is the first time in 18 years that he's giving a talk, and I told him that it's just like riding a bike, and I think he proved that. Another round of applause for him. Thank you.