Hi everyone! I want to welcome you all to "Mostly Mistaken and Ignored PostgreSQL Parameters While Optimizing a PostgreSQL Database." My name is Lindsay Hooper, I'm one of the PostgreSQL conference organizers and your moderator for this webinar. Before I hand off to Avi, I just want to give you a brief intro about what we're going to be talking about today. In this webinar, Avi is going to detail a list of important PostgreSQL parameters that can drastically improve performance when properly set, discuss mathematical formulas and extensions that help expose diagnostic data for tuning, and walk through the top mistakes made while tuning PostgreSQL parameters and how to set them correctly.

I'm here with Avi, Tech Lead, Global Services at Percona. Prior to joining Percona, Avi worked for OpenSCG for two years as a database architect, and at Dell as a tech lead for almost ten years. He has vast experience in technologies like Oracle, PostgreSQL, MySQL, and MongoDB. He's an avid Python and Golang developer, he has co-authored a PostgreSQL book, Beginning PostgreSQL on the Cloud, and has another PostgreSQL book in progress. His areas of expertise are PostgreSQL training, consulting, and migrations. So with that, I'm going to hand it off to Avi. Take it away, and enjoy.

Thank you, Lindsay. You can see my screen, right, Lindsay? Correct. Okay, just wanted to confirm. All right, thank you for the introduction, and hello everybody, and thank you for joining this webinar today. As Lindsay said, I started my career as an Oracle database administrator, and along the way I've also worked with MySQL and MongoDB, so I have some experience with those databases as well, and I started working on Postgres in parallel. These days I'm also working on migrating databases away from Oracle, the database I started my career with, to Postgres.

This talk is about the parameters you have to tune in your Postgres database. If you haven't tuned them, or you're not aware of the right way to tune them, this talk should hopefully help you understand how to do it. And as Lindsay said, I'm working for Percona as a tech lead, so please feel free to follow me on LinkedIn and we could have a nice chat.

All right, some points to note. Most of the performance issues we see today, in the majority of client environments, are due to lack of tuning. In many Oracle-to-Postgres migrations, during the initial stages, when you migrate PL/SQL code to PL/pgSQL, or you migrate some tables and their data and start running queries, the performance team starts complaining that performance is not that great on Postgres. What has really happened in the background is that the important parameters were never tuned. So don't just migrate the data; along with it, you should also tune the important parameters. And you should tune them the way Postgres needs to be tuned, not the way Oracle does, because there are several architectural differences between Oracle and Postgres, even if some commands or syntax appear similar.
We could go under the assumption that a few things stay the same between Oracle and Postgres, but we need to be aware that a lot of things differ between databases. Be it Oracle or any other database, you cannot compare it directly with Postgres; you need to know how to tune Postgres as Postgres. Like I said, most performance issues are due to lack of parameter tuning. If you leave everything at the defaults, assuming Postgres works fine with the default parameters, that may be a mistake. For a really minimalistic server configuration, the defaults may be good enough. For example, shared_buffers, the Postgres memory area, is set to 128 megabytes by default, and that can be really tiny for a large production database.

I also see a lot of mistakes related to autovacuum. People assume that disabling autovacuum improves performance, but you may be making a really big mistake there. If you disable autovacuum, you need to know everything autovacuum is doing in the background, and you need to be able to do all of it yourself. It is really, really difficult to replicate everything autovacuum does; you need to be an expert to do that. Instead, you should let Postgres do its job while you tune the autovacuum settings the way they need to be tuned.

Sometimes, while loading data or for temporary purposes, we may create tables and set them to UNLOGGED, and that's fine. But you really don't want to do that in your production databases just to increase performance. Setting a table to UNLOGGED avoids WAL writes, so write-ahead logs are not generated for it, but you should know that unlogged tables do not get replicated to your standby database. And if you ever need to perform a point-in-time recovery using a recent backup, you don't have the write-ahead logs containing the changes to those tables, because they were unlogged. A lot of damage can happen for that reason.

At the same time, I also see the assumption that putting the Postgres database in archive mode, archiving the write-ahead logs, hurts performance, which is generally not right. You need to archive your write-ahead logs safely, because your standby, your streaming replica, and even your point-in-time recovery all depend on them. So never go and disable the durability that Postgres offers by default, and never turn off a feature that could really help you during a disaster.

Postgres allows instance-level, database-level, user-level, session-level, and transaction-level settings. Some parameters you set globally take effect everywhere, for every user, every session, every query, but you don't have to do that. A lot of parameters can be set at the database level, user level, or session level, so you need to know how to limit the overall impact of a setting to a transaction or a session.
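As a minimal sketch of those different scopes (the database and role names here are hypothetical, purely for illustration):

    -- Instance-wide (written to postgresql.auto.conf; reload or restart depending on the parameter)
    ALTER SYSTEM SET work_mem = '8MB';
    SELECT pg_reload_conf();

    -- Per database
    ALTER DATABASE appdb SET work_mem = '16MB';

    -- Per user/role
    ALTER ROLE reporting_user SET work_mem = '64MB';

    -- Current session only
    SET work_mem = '256MB';

    -- Single transaction only
    BEGIN;
    SET LOCAL work_mem = '256MB';
    -- ... run the one heavy query here ...
    COMMIT;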
Now let's quickly look at an architecture diagram of Postgres before we get started with the parameters, because all of the parameters we discuss today map back to this diagram. In the diagram you see processes, including background processes, some of the memory components, and the storage.

Among the processes you see the postmaster, which, for the Oracle folks, can be thought of as a listener: it listens for connections first and forks a backend process for each connection, because Postgres uses one backend process per connection. So if your application connects to a Postgres database, the postmaster forks a backend process for that connection, and with many connections you end up with that many backend processes. You also have utility background processes, similar to the background processes running in other databases, and each of them has a specific purpose. For example, the checkpointer is the process responsible for performing checkpoints; there's the background writer, the archiver, the WAL writer, the logger, autovacuum, and several more, but we won't go into all of these background processes in detail today.

You also have work_mem, maintenance_work_mem, WAL buffers, and shared buffers, which all come under memory because they all use RAM. And you have data files and write-ahead logs; if you archive the write-ahead logs you call them archived WALs, and you may store them on some more durable remote storage, possibly in the cloud. You also have globals, tablespaces, and a lot of other files. All of these come under storage.

Keeping that architecture diagram in mind, let's start discussing the mistaken parameters. The first mistaken parameter is shared_buffers. What does this parameter mean? Every time you create a Postgres instance, or on an existing instance, you set a value for the shared_buffers parameter. You might set it to 40% of the total RAM, or 25%, or just some random value. But how do you set it the right way? shared_buffers is the amount of RAM allocated to the shared buffers memory area, and it contains the pages that are being modified or read. Say you run a query like SELECT * FROM employee WHERE id = 10. That record, or tuple, is stored in a page, which is 8 kilobytes if you're using the default block size. If that page is not found in memory, meaning shared buffers, it needs to be fetched from disk and loaded into memory. Likewise, if a write happens, if you update, delete, or insert a record, that's a write to a page, which is nothing but dirtying or modifying a page, and even that write happens in shared buffers.
Postgres uses a least-recently-used style algorithm to evict pages from the shared buffers area. So how do you set it correctly? Like in other databases, this is the amount of RAM allocated to Postgres-specific memory, used to store modified or read buffers. An important point you need to know is that Postgres does not do direct I/O. What does that mean? Is it double caching? Follow along, and I can repeat if required.

Take the earlier example: when you do SELECT * FROM employee WHERE id = 10, that tuple is in a page, which of course lives on disk. If that page was previously loaded and is still in shared buffers, it doesn't need to be read from disk again. Consider that scenario one: the page is already in shared buffers, and the user request is served immediately. But if the page containing the requested tuple is not found in shared buffers, Postgres asks the operating system to read that page from disk. The operating system fetches the page into its own cache, and from there it is copied into shared buffers. In databases like Oracle or MySQL, the database fetches the page from the data files itself into its own cache, the InnoDB buffer pool in MySQL or the database buffer cache in Oracle's SGA, but Postgres instead asks the operating system to fetch the page from disk, so the OS reads it into its cache and it is then copied to shared buffers.

For this reason, you actually want to give plenty of preference to the OS cache. Why? Say you have a 100 GB database and 64 GB of RAM, and your hot data, the active data set that all your application connections and transactions are working on, is larger than the total RAM. If you set shared_buffers to 50% of the RAM, 32 GB, that 32 GB can accommodate a lot of pages, but the likelihood of the same pages also sitting in the operating system cache is pretty high, so you're not letting the server cache more unique data. Because of this double caching, if you instead reduce shared_buffers to around 25% of 64 GB, which is 16 GB, you're giving the operating system more room to cache more, and effectively letting the server as a whole cache a larger number of pages, which is an advantage. That's why you often see the magic number of 8 GB, or the rule of thumb that 25% of total RAM should be assigned to shared_buffers, which is a good place to start.
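As a starting-point sketch only, assuming a dedicated 64 GB server (the sizing is illustrative and has to be validated with benchmarking, as discussed next):

    # postgresql.conf -- a starting point, not a final answer
    shared_buffers = 16GB      # ~25% of a 64 GB server; changing it requires a restart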
However, you also need to make sure you do your benchmarking against your application load, and your performance testing, because sometimes even 75% or 80% of the total RAM can work well, if your active data set fits in that area. Say you have 64 GB of RAM again and you're able to set shared_buffers to 48 GB, considering the activity happening in your database and the sort memory you need; you may get great performance with that as well, because your active data set fits in the shared buffers area. But you need to do that testing. Do not just assume that 25% of RAM or the 8 GB magic number is the only value to start with; benchmark and work your way up to a bigger number if it helps.

To see what's in your shared buffers, you can use the pg_buffercache extension, which is available in the contrib modules. It tells you how much of a table is actually stored in shared buffers. You also have the pg_stat_statements extension, which shows how many blocks were served from memory (shared_blks_hit) and how many were read from disk or the OS cache (shared_blks_read), so you know your cache hit percentage. You also have the pg_statio views, which give you even more information. I would, of course, start with pg_stat_statements, which is one of the default extensions you'll observe in most production environments, because it exposes great information about which SQL is taking the most time to complete and which queries fetch more pages from disk than from memory; that tells you what to tune, or whether a session-level setting could help. The pg_statio views I mentioned show which tables and indexes are mostly served from cache. And you can always run EXPLAIN (ANALYZE, BUFFERS) with the query; that gives you block-level information and the cost at each node, which can also help you determine shared buffers sizing in some cases.
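Here's a rough sketch of the kind of diagnostics being described, assuming the stock contrib extensions are installed; the employee table is just the running example from the talk:

    -- Top relations cached in shared_buffers (requires: CREATE EXTENSION pg_buffercache;)
    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c
      ON b.relfilenode = pg_relation_filenode(c.oid)
     AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;

    -- Statements reading the most blocks from disk/OS cache (requires pg_stat_statements)
    SELECT query, shared_blks_hit, shared_blks_read,
           round(100.0 * shared_blks_hit
                 / nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_pct
    FROM pg_stat_statements
    ORDER BY shared_blks_read DESC
    LIMIT 10;

    -- Block-level detail for a single query
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM employee WHERE id = 10;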
The next mistaken parameter: work_mem. We may have heard about work_mem a lot, but if it's new to you, I'll explain it. This is the amount of memory allocated for each sort operation; in simple terms, when you do an ORDER BY or a DISTINCT it has to perform a sort, and it's also used for merge joins and hash-based processing. It defaults to four megabytes, and a query can do more than one sort; there can be parallel sort operations, and each sort gets that much RAM allocated. A complex query may require several sorts in parallel. So how do you set it correctly, and is 4 MB fine? In some environments you increase work_mem and you see the query run much faster. Why?

Because you need to decide whether you give preference to temp files or to memory: anything written to disk for sorting performs far worse than a sort happening in memory. So if you allocate more memory, you can get much better performance. However, can you set a global work_mem of 100 megabytes or 500 megabytes? When you do that, remember that every sort, in every query, across all those parallel connections gets that much memory, and a single query may be doing parallel sorts. Whatever value you set is granted to every sort: set it to 100 MB and each sort is allowed 100 MB. For that reason, you can set it at the session level instead, only for that session. To determine whether a session-level setting would help a query, run EXPLAIN (ANALYZE, BUFFERS) and check whether sorts are spilling writes to disk; then increase work_mem accordingly. A session-level work_mem allocates that much RAM only for that session, which helps you avoid out-of-memory situations.

One hint I would give: if you know how many active concurrent connections you have during peak times, then (active concurrent connections at peak) times (average sort operations per connection, since a query can do parallel sorts) times work_mem should be much, much less than total RAM minus the shared_buffers you have assigned minus maintenance_work_mem times autovacuum_max_workers. I'll explain maintenance_work_mem and autovacuum_max_workers in a bit; once we go through those parameters, this formula will be very clear.

effective_cache_size. This parameter is used only for estimation purposes by the planner; it's an assumption about the cache available to each query. The default is 4 GB, and when it's set to such a low value, the optimizer may prefer sequential scans over index scans. So if you see queries showing a sequential scan in the execution plan where an index could have been used, go check whether effective_cache_size is set to a bigger value. If you have fast disks and your indexes are usually cached, you can set it to 70% of RAM or even more. And remember that it does not reserve any memory; you're just telling the optimizer that roughly that much RAM is available for caching, and it's only used for estimation. So there is no harm in increasing it to a bigger percentage; 4 GB on a server with a lot of RAM may be very, very small.
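To pull the work_mem sanity check and the effective_cache_size advice together, here is a minimal sketch; the sizes and the orders table are hypothetical and only illustrate the arithmetic in the formula above:

    -- Rough sanity check (illustrative numbers only):
    --   200 active connections x 2 sorts per connection x 8MB work_mem  ~ 3.2 GB
    -- which should stay well below
    --   RAM - shared_buffers - (maintenance_work_mem x autovacuum_max_workers)
    --   e.g. 64GB - 16GB - (1GB x 3)  ~ 45 GB of headroom

    -- Raise work_mem only where a specific query needs it:
    SET work_mem = '256MB';
    EXPLAIN (ANALYZE, BUFFERS)
        SELECT * FROM orders ORDER BY created_at;   -- check that "Sort Method: external merge" goes away
    RESET work_mem;

    -- Tell the planner how much cache is realistically available (no memory is reserved):
    ALTER SYSTEM SET effective_cache_size = '48GB';
    SELECT pg_reload_conf();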
autovacuum. Again, as I discussed on the first slide, this should never be set to off; it defaults to on. This is the process responsible for spawning the worker processes that take care of vacuum or analyze. A vacuum has a lot of responsibilities; it's not just cleaning up dead tuples. It's also responsible for freezing the transaction IDs of tuples, updating the free space map, and it even updates the table statistics used by the planner, through analyze. How do you set it correctly? Just never set it to off, unless you know exactly why you're doing that and you understand the complications; leave the default on. Turning it off requires you to write your own manual vacuum logic, which is of course difficult to get right. If anything, try manual vacuuming on top of autovacuum rather than setting it to off.

track_counts. Again, it defaults to on. We discussed a moment ago that autovacuum is responsible for running vacuum or analyze on tables. But to determine which table should be next in the queue for a vacuum or an analyze, so that a worker process can be assigned to it, autovacuum depends on statistics such as how many inserts, updates, and deletes have happened, and those are only tracked when track_counts is on. So never set this to off; it disables part of autovacuum's functionality.

autovacuum_vacuum_scale_factor and autovacuum_vacuum_threshold. Now we're getting into the internals of vacuum, and I hope the important concepts here come through clearly. When does an autovacuum vacuum start on a table? A table falls into the queue when the following condition is satisfied. As we discussed, when track_counts is on, inserts, updates, and deletes are tracked. If the number of updates and deletes on a table reaches autovacuum_vacuum_scale_factor times the total number of tuples, plus autovacuum_vacuum_threshold, the table falls into the queue. The scale factor defaults to 0.2, meaning 20% of the table's records; it's simply the fraction of the table that goes into the formula. The threshold defaults to 50, the minimum number of obsolete records needed to trigger an autovacuum. For example, with a scale factor of 0.2, 1,000 tuples in the table, and a threshold of 50: 0.2 times 1,000 plus 50 is 250. So for every 250 updates and deletes, this table becomes a candidate for an autovacuum vacuum.

Likewise, autovacuum_analyze_scale_factor and autovacuum_analyze_threshold. What do these parameters mean? An autovacuum analyze is what updates the statistics of a table, and those statistics are used by the optimizer to prepare the best, lowest-cost execution plan, so they should be updated often. When does a table fall into the autovacuum queue for an analyze? When the total number of inserts plus updates plus deletes on the table reaches the analyze scale factor times the total number of tuples, plus the analyze threshold. The analyze scale factor is similar to the vacuum scale factor, but it defaults to 0.1; you could change it to 0.01, and I do see people making that kind of modification, we'll see that in a while, but by default it's 10% of the table's records. The threshold again defaults to 50, just like the vacuum threshold. So if the analyze scale factor is 0.1, the table has 1,000 tuples, and the threshold is 50, then for every 150 inserts plus updates plus deletes the table becomes a candidate for an autovacuum analyze. That is how a table falls into the queue for a vacuum or an analyze.
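If you want to watch this in practice, a simple monitoring query along these lines, using the standard pg_stat_user_tables statistics view, shows the dead-tuple counts and when autovacuum and autoanalyze last touched each table:

    -- Dead tuples and last (auto)vacuum / (auto)analyze per table
    SELECT relname, n_live_tup, n_dead_tup,
           last_vacuum, last_autovacuum, last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;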
Now, let's get into some more important settings. The parameters we just saw are global settings, so are global settings alone good enough, and how do you set this correctly? Global settings alone may not be appropriate, because they ignore the size of the table: whether a table is tiny or huge, the same settings apply. Consider two tables, one with 10 records and one with a million records. The frequency at which a vacuum or an analyze happens will be much higher for the 10-record table, because 0.2 times 10 plus 50 is reached far sooner than 0.2 times a million plus 50. It takes a long while for the million-record table to fall into the queue, while vacuums keep happening on the tiny table. In that case you need to set table-level autovacuum settings for selected tables. You could even say: for this table I don't want to rely on the scale factor at all, I'd rather set just a vacuum threshold, so that after more than 1,000, or 10,000, updates and deletes a vacuum runs on this table. You need to rely on table-level autovacuum settings there.

We've talked about tables falling into the queue for autovacuum vacuum and analyze, but how many autovacuums can run at the same time? That depends on the parameter autovacuum_max_workers, which is set to three by default, so Postgres will not automatically run more than that many autovacuum workers at once.
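As a sketch of what such table-level overrides can look like (the table name and numbers are hypothetical; the storage parameters themselves are the standard per-table autovacuum options):

    -- Ignore the scale factor for this big table and vacuum after a fixed number of dead rows
    ALTER TABLE big_events SET (
        autovacuum_vacuum_scale_factor  = 0,
        autovacuum_vacuum_threshold     = 10000,
        autovacuum_analyze_scale_factor = 0.01
    );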
So now let's talk about autovacuum I/O, and about autovacuum_max_workers in a bit more detail. First, the autovacuum I/O parameters. What is the cost of autovacuum? Assume the default block size: autovacuum reads 8-kilobyte pages from a table, and it also modifies, or writes to, the pages that contain dead tuples. So autovacuum involves both read and write I/O. Now the parameters. autovacuum_vacuum_cost_limit is the total cost budget autovacuum can reach, combined across all autovacuum jobs; we said there are three workers by default, and this limit is shared by all of them. When the cost limit is reached, how long should autovacuum sleep? That is decided by the cost delay. And since autovacuum does both reads and writes: the cost of reading a page that is already in shared buffers is vacuum_cost_page_hit, the cost of fetching a page that is not in shared buffers is vacuum_cost_page_miss, and the cost of writing to a page where dead tuples are found is vacuum_cost_page_dirty.

Now look at the default values for these autovacuum I/O parameters. autovacuum_vacuum_cost_limit defaults to -1, which means it falls back to vacuum_cost_limit, which is 200 units. For the cost delay, once that 200-unit budget is used up, autovacuum sleeps for 20 milliseconds and then wakes up. The cost of reading a page from shared buffers is 1, the cost of reading a page not in shared buffers is 10, and the cost of modifying a page where dead tuples are found is 20.

So what can happen in one second? One second is 1,000 milliseconds, and assuming no extra latency from the disk, an autovacuum worker can sleep and wake up 50 times, because 50 times the 20-millisecond cost delay is 1,000 milliseconds. Now the read and write limits. If all the pages with dead tuples are found in shared buffers, each wake-up can read 200 pages (200 divided by a page-hit cost of 1), so in one second that's 50 times 200 pages, which at 8 KB per page is about 78.13 megabytes per second of read I/O. If the pages are not found in shared buffers, it's 200 divided by a page-miss cost of 10, so autovacuum can read about 7.81 megabytes per second. And for writes, with vacuum_cost_page_dirty at 20, autovacuum can dirty at most about 3.9 megabytes per second. That is for all three default autovacuum workers together: combined, they can read roughly 78.13 MB per second from shared buffers, or 7.81 MB per second from disk, or write about 3.9 MB per second.

So does just increasing autovacuum_max_workers help? Most of the time we increase it to five, six, or eight so that more autovacuums can run, but that alone can damage things, because the cost budget is shared across all the workers. If you increase the workers, you also need to increase the cost limit, in the same proportion, and you need to consider whether your disk can take the extra I/O: ideally you have a fast disk and can dedicate a portion of its throughput just to autovacuum, choosing a disk that can do more I/O. Likewise, look at the total number of databases you have and size the cost limit with all of that in mind. If you increase the cost limit to 2,000, that means roughly 781 megabytes of reads per second from shared buffers. Keep these approximations in mind when deciding what autovacuum settings you put in place to fine-tune your database. And remember, all of this I/O throughput is shared across all the autovacuum processes; it is not per autovacuum worker.
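A hedged sketch of scaling the workers and the shared cost budget together, with the arithmetic spelled out; the numbers are illustrative, not recommendations, and the 20 ms cost delay shown is the default discussed in this talk (newer PostgreSQL releases lowered the autovacuum default to 2 ms):

    # postgresql.conf -- illustrative only
    autovacuum_max_workers       = 6        # default 3; requires restart
    autovacuum_vacuum_cost_limit = 2000     # default -1, i.e. falls back to vacuum_cost_limit = 200
    autovacuum_vacuum_cost_delay = 20ms

    # Rough ceiling with these numbers, shared by all workers combined:
    #   50 wake-ups/s x (2000 / page_hit cost 1)    x 8 kB ~ 781 MB/s read from shared_buffers
    #   50 wake-ups/s x (2000 / page_miss cost 10)  x 8 kB ~  78 MB/s read from disk / OS cache
    #   50 wake-ups/s x (2000 / page_dirty cost 20) x 8 kB ~  39 MB/s dirtied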
And finally, maintenance_work_mem. This is the amount of memory used by maintenance operations such as CREATE INDEX and ALTER TABLE ... ADD FOREIGN KEY, and it matters especially when you've taken a logical dump with pg_dump and want to restore it with pg_restore, because the restore also has to create indexes; that restore time improves when you increase maintenance_work_mem.

Likewise for autovacuum: when autovacuum_work_mem is set to -1, it defaults to the maintenance_work_mem amount of memory for each autovacuum worker. That doesn't mean all of it can always be used; there is a hard limit in the source code, and from a discussion with community members some time back I recall it being 1 GB, though I'd need to go and check that. But consider that this is also the memory you're handing to autovacuum when autovacuum_work_mem is left at -1. You could instead set autovacuum_work_mem to, say, 512 megabytes, limiting each autovacuum worker's memory to that much; otherwise, at -1, it simply defaults to maintenance_work_mem. So you need to set both appropriately. And as we discussed at the beginning, the number of concurrent maintenance operations during peak times, things like REINDEX and similar operations that use maintenance_work_mem, multiplied by maintenance_work_mem, should be much less than total RAM minus shared_buffers minus work_mem times max concurrent connections times n, where n is the average number of parallel sort operations per connection. You can go over these slides again once they're shared; it makes more sense once you digest it.

We'll also discuss a few ignored parameters. random_page_cost. What does this parameter mean? It's a frequently ignored setting. It sets the planner's cost for a non-sequentially fetched disk page; index access in particular is treated as non-sequential fetching. The default cost is 4, while seq_page_cost is set to 1. That affects the planner's decisions, because the query planner, the optimizer, may assume that using an index costs more than it really does. Recently I've seen a situation where simply setting random_page_cost almost equal to seq_page_cost solved a lot of issues, because almost all the queries were looping endlessly over sequential scans instead of using index scans. An index is usually cached in memory too, because it's tiny compared to the whole table, so an index may be less frequently preferred by the planner when this value is high. A value like 4 made sense when you had very slow disks, the old spinning hard disks, not the solid-state or flash storage you generally see in database servers today. So make sure you don't ignore this: check what it's set to, and check pg_stat_user_tables to see whether there are far more sequential scans than index scans across your tables and whether that really makes sense. If it doesn't, this parameter could be doing the damage, because SSDs are the default nowadays, and it can help to set it the same as seq_page_cost.
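A small sketch of that check and the corresponding change; the 1.1 value is a common SSD-era choice, shown here only as an illustration:

    -- Tables with far more sequential scans than index scans
    SELECT relname, seq_scan, idx_scan
    FROM pg_stat_user_tables
    ORDER BY seq_scan DESC
    LIMIT 10;

    -- On SSD-backed storage, bring random_page_cost close to seq_page_cost
    ALTER SYSTEM SET random_page_cost = 1.1;
    SELECT pg_reload_conf();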
wal_compression. While doing a benchmark, when we set wal_compression to on, the server wrote about half the amount of WAL segments to disk. This matters especially when you're storing the WAL segments on the same disk as the data directory, which often happens on AWS or any cloud when you deploy your servers: if you use the same disk for the write-ahead logs and the data, you're writing a lot more, and if your write IOPS are high, you should see whether wal_compression can be turned on at the cost of some CPU. It compresses the full-page images written to the WAL. We'll talk about full pages with the next parameter, but briefly: when a checkpoint happens, the first modification to each page after that checkpoint is written to the write-ahead log as a full-page write, and every subsequent change to that page is just a change record, a normal WAL record, not a full page. Those full-page writes are what get compressed.

How do you set this correctly? Storing the full-page image is what guarantees the page can be correctly restored, so you should not disable full_page_writes unless you know exactly why you're doing it and you're confident about it. Instead, you can set wal_compression to on so that it reduces I/O at the expense of CPU. You also need to check that your streaming replica, the standby, keeps catching up, because it has to decompress the WAL while applying it. In one of our benchmarks, about 3.3 terabytes of WAL over 10 hours decreased to 1.9 terabytes with compression on, so it can really help.

checkpoint_timeout. It defaults to five minutes; a checkpoint is forced every checkpoint_timeout amount of time. And frequent checkpointing, as we've heard many times on blogs and in discussions, can be a problem. Why? Because after a checkpoint, the first change to each page is a full-page write, so a lot of full-page writes can happen if you keep checkpoint_timeout as short as five minutes. Say you set it to 30 minutes instead: now, when multiple changes happen to the same page, they don't all become full-page writes in the write-ahead log, so you're reducing the number of full-page writes and the overall write volume; with short checkpoint intervals, more WAL volume gets written.

How do you set it correctly? You could set it to 15 minutes, 30 minutes, or one hour; it depends on your workload, and you need to test it. But be aware that a bigger checkpoint_timeout can increase crash recovery time if your database server crashes, because it has to replay the committed transactions for a larger interval to bring the database back to a consistent state. A shutdown may also take more time, because 15, 30, or 60 minutes' worth of WAL files need to be synced; you can run a manual CHECKPOINT first and then issue the shutdown, which will be faster.
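A minimal postgresql.conf sketch of these two settings; the values are illustrative and need to be validated against your own workload and replica lag:

    # postgresql.conf -- illustrative only
    wal_compression    = on       # compress full-page images; trades CPU for write I/O
    checkpoint_timeout = 30min    # fewer checkpoints -> fewer full-page writes, but longer crash recovery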
checkpoint_completion_target. It defaults to 0.5. The checkpoint is spread across checkpoint_completion_target times checkpoint_timeout amount of time. So if it's set to 0.5 and your checkpoint_timeout is 30 minutes, the checkpoint has 15 minutes to finish all the work it has to do. You could increase that and let the checkpoint spread out more: set it to 0.8 or 0.9 and the work happens over, say, 27 minutes. It becomes a lazier background activity and is less I/O intensive when you give it a bigger window.

And finally, max_wal_size. It's not just the checkpoint_timeout of 15 or 30 minutes or an hour: a checkpoint is also triggered when max_wal_size worth of WAL, which defaults to 1 GB, is about to be reached. Whichever comes first, checkpoint_timeout or max_wal_size, triggers the checkpoint. How do you set it correctly? You can set checkpoint_warning to a value somewhat smaller than checkpoint_timeout. Say checkpoint_timeout is 30 minutes; you could set a checkpoint_warning of 25 minutes, and if checkpoints are happening more often than that, a warning is written to the log file. If that happens often, you know your max_wal_size is not helping you and you need to increase it, and you should also monitor how much WAL is really being written to disk. And although this is not strictly a requirement for setting max_wal_size to a bigger value, always try, wherever you have the opportunity, to place your WAL segments on a different disk from the data directory, so you distribute the IOPS and you're not writing WAL segments to the same disk as the data. Frequent checkpoints indicate that max_wal_size is too small.

What else should we be looking at? Look at OS metrics: is your pg_wal directory, the write-ahead log directory, busy, or is a lot of logging information being written, using up IOPS on the same disk? Look at CPU throttling, cache drops, swapping; watch all of these OS metrics. At the same time, check whether your application is using a connection pooler, especially when you go beyond a few hundred concurrent transactions per second. Check whether your cache and active data set can fit in memory and whether shared_buffers can be adjusted, whether you need to fine-tune, or whether your server is simply undersized and needs an upgrade. Look at sequential scans versus index scans and find the queries you need to tune. Also check which indexes are unused or never scanned: you can go ahead and drop those, because writing to unused indexes is again a performance bottleneck, adding overhead, including the extra data that gets WAL-logged.
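A sketch of that unused-index check, using the standard pg_stat_user_indexes statistics view; review each candidate before dropping anything, since statistics resets and rarely-used unique indexes can mislead:

    -- Indexes that have never been scanned since the stats were last reset
    SELECT schemaname, relname, indexrelname, idx_scan,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY pg_relation_size(indexrelid) DESC;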
Also look at bloated tables and see how you can remove that bloat, so the tables stay healthy and your queries run over a limited amount of data. Partitioning bigger tables is another important aspect: you can use the newer declarative partitioning features, and see whether you can move tables between tablespaces spanning different disks so that you distribute the IOPS. All of these really help, beyond just the parameter tuning.

Now let me go through the questions. I see a few here. "Are the shared buffer preferences opposite from MySQL, or even the way Oracle works?" It's a little different; I wouldn't say it's totally opposite.

"How much tuning is needed for Postgres hosted on RDS?" There is a way to choose your RDS instance, a standard one or a tuned one, so it depends on which instance type you pick. RDS does come with some tuned settings, but they won't be perfect for every workload; you may have to revisit them, because nothing can be that automated. They may be good, but you still need to review them, and you need to know how you created the RDS instance.

"Is it safe to turn on pg_stat_statements on a very heavily loaded DB?" Yes. pg_stat_statements does not create unbounded entries per query; there is a limit on how many statements it keeps, and the counters simply accumulate until you do a reset. I've seen the pg_stat_statements extension enabled on truly critical production databases with thousands of TPS. That said, test it first against a performance instance and then enable it; it does require a restart, but it usually doesn't cause any problems.

"How do you check the OS cache? Is that possible?" Yes. At the operating system level there are third-party tools, some free to download, that show what is cached in memory. Once, to present and demonstrate that easily, I used something like vmtouch, I think.

"In PG 11 or 12, is the allocation of work_mem dynamic or static?" It is dynamic. You can also set it at the session level: just SET work_mem to 512 megabytes or 1 GB for that session, and once you disconnect, it's gone. So it's dynamic and scoped to your session.

"If I have 1 GB set and have 100 parallel sessions?" Yes, if you have 1 GB of work_mem set globally, that's dangerous. You may need 100 times 1 GB, or 100 times 1 GB times n if every session is performing parallel sorts, that is, if the queries in those sessions are doing parallel sorts, so your server may run out of memory. It depends on how much RAM you've got; of course, you may have more than 700 GB of RAM if you've gone with an R5 metal server, say, which gives you 768 GB, and then you could do it. But you need to be very careful about setting a global 1 GB. It should rather be a session-level setting, and you need to know why you need 1 GB, how you can avoid or limit it, or whether it's only for something really specific.
Okay, and yes, Bruce, thank you, Bruce, you did respond to that one: shared buffers is always allocated in full at server start.

Tejas has a question: he faced an issue where autovacuum was running every 30 minutes on a table, and he was restarting every 30 minutes and running a manual vacuum to make it stop. Is there a setting that should be tweaked so autovacuum doesn't quit the unfinished table? I'm not sure of the Postgres version you've got, or whether there is some background job scheduled in your environment that identifies tables with a certain number of dead tuples and automatically kicks off a vacuum on them; that could be one of the causes. Postgres does skip the pages it can't lock, and the behavior also depends on the Postgres version you're working on. You're on PG 11, your I/O was consistently high even during low usage, and the query start time for that table kept changing every 30 minutes, present the whole day. It sounds like we should talk about the specific issue you're running into separately, so we can keep this conversation generic for the moment. But do look at all the settings we discussed, and if you think autovacuum is causing that I/O, check whether any of the autovacuum I/O settings are aggressively set to high values; I've seen that happen.

And about the indexes, yes, you can use CREATE INDEX CONCURRENTLY. Can the indexes be vacuumed? When you create an index concurrently, that is effectively rebuilding the entire index, so you don't need to vacuum it again; maybe set up a maintenance schedule for that table instead. You don't have to recreate indexes every 30 minutes unless there is a specific reason to.

Is autovacuum useful on a database that has no deletes, updates, or inserts? It could effectively be inactive; it can be on, but if no transactions are running in the database it has no work to do, unless transactions are still being created and the transaction ID counter keeps increasing, in which case it may have to run some aggressive freezing. Otherwise, I don't see a reason.

I think all of the questions are answered, and at the same time, you can always reach out to me on LinkedIn. So, Lindsay, over to you. Absolutely. Great job, Avi. Thank you so much for spending the time with us today. To all our attendees, thank you so much for spending a piece of your day with us, and we hope to see you on future webinars. Have a great day.