All right, thank you, Brad. So we're here to talk about identifying slow queries and fixing them. If you're here to talk about something else, sorry, we're not gonna let you out anymore, just FYI, and the doors have closed. First off, I'm Stephen Frost, as many of you probably know: CTO at Crunchy Data, committer and major contributor, I do lots of stuff and whatnot. When I'm not here giving talks I tend to be hacking on Postgres or helping out our clients with whatever.

I just wanted to hit on the quick community slide; I like to always put this out there. Follow Planet PostgreSQL. It's very cool, lots of really good information, lots of things about slow queries. So if you're interested in what other people are doing to solve their challenges with certain types of queries and whatnot, a lot of that ends up on Planet. The other one is: go join PostgreSQL US. It's now very straightforward to go through and join, so I recommend everybody do so.

All right. First we're gonna talk about the different ways of finding slow queries. There are three main ways of doing that. The first thing you have to do is enable logging, and then you can go look at your logs. That's kind of painful and time-consuming, so you want to use something to do log analysis; a great one is pgBadger, and we'll talk a little bit about that. And then there's a way nowadays to view query statistics inside of pg_stat_statements, so we'll talk about what pg_stat_statements is and go through some of the different fields that are available through it. After we do all of the finding of slow queries, we'll go through and talk about exactly what queries to look out for and how you can improve those queries.

So the first thing that you want to care about a great deal is logging.
These are the logging-related configuration options that you're gonna want to be changing inside of your Postgres config in order to get logging set up. I'm gonna go through each one of these; some of them I'll hit pretty quickly. If you do have any questions, feel free to ask as we go. A lot of these are pretty self-explanatory, but definitely ask questions as we go through.

The first one is log_min_duration_statement. This is kind of your key one. This is the logging parameter that allows you to log queries that have taken over a certain amount of time to run. If you set it to zero, then it logs every statement. This is a lot better than using log_statement = all together with log_duration, and the reason is that if you use log_statement and log_duration, you end up with the duration and the SQL statement on two independent lines in the log file. You don't want that; it makes analysis a real pain in the butt. log_min_duration_statement puts them on the same line, as I show here, so you can see the duration followed by the actual statement.

One thing you'll realize with this, of course, is that the duration and the statement aren't there until after the statement has completed. The statement runs to completion, so this won't show you a currently running query; it's not until that query finishes that we can put out the duration along with the actual statement.

One other thing to note: in some environments, particularly really transactional ones, very fast, lots and lots of queries coming in, you do run into a difficulty here, which is that if you set log_min_duration_statement to zero, you can basically end up with a big bottleneck, because all that traffic has to go through the logging collector, and that can end up taking a lot of time and
slowing down the overall system. So you don't necessarily want to set it to zero, but if you don't set it low enough, then you aren't going to see those queries that take longer than you want. That's where pg_stat_statements in particular can be really handy, and we'll talk more about that later. I just wanted to hit on that: don't rush out to your million-transaction-a-second database and set this to zero, because you're going to end up slowing the whole system down, and it can end up being a problem.

All right, the next one is log_line_prefix. Our default log_line_prefix is absolutely horrible; it doesn't give you any information that you need. This is what I tend to recommend as a log_line_prefix, and it's also something that tools like pgBadger are expecting; they work pretty well with this particular one. This is actually also in the docs for pgBadger, so you can certainly go look at it there. I'm not going to hit on all of the different things that it includes, but the important things are things like the timestamp, and also the process ID, so you know which process this is associated with. Obviously, in a busy system you're gonna have lots of independent Postgres backend processes running, each one of them logging into the same file, and you want to be able to differentiate them from each other. Here you can see an example at the bottom of what that ends up looking like; pretty straightforward.

So, checkpoints. Something people don't often realize when it comes to Postgres performance is that checkpoints can have an impact on your performance, particularly if you're checkpointing very, very frequently because you're running out of xlog segments, or WAL segments. So this is something you want to enable: log_checkpoints. You can see when checkpoints are happening, how long they're taking, why they're happening. Really important to
consider that whenever you're looking at the performance of a particular Postgres instance. What's really helpful here is that you also get a bunch of statistics about what happened during the checkpoint: how long it took to run certain operations, like how long it took to sync all the different files, and how long we spent actually writing things out to the kernel. Writing things to the kernel is one thing; syncing them is another. Syncing them is actually getting them out onto reliable storage.

One of the things you want to look out for here is that you don't want checkpoints starting because of WAL, because of running out of xlog; you want checkpoints to be happening due to time. You'll see here it says "checkpoint starting: immediate force wait"; that's just because I ran a CHECKPOINT command for illustration, which is fine, it works, whatever. But what you really want to see is "checkpoint starting:" and then you want to see it say "time". You want it to be based on timing rather than on anything else, really.

Connection logging is pretty straightforward; obviously you want to know when connections started and stopped. Another thing you want to care about a great deal when it comes to Postgres performance in general is making sure that you don't have constant reconnections, because Postgres ends up being kind of slow when it comes to that, frankly. It takes a lot of effort to set up a new connection. So you want to be using some kind of connection pooler in an ideal world, or have a really good application that behaves very well when it comes to minimizing the number of reconnects that it does. This also gives you some information about what happened on each connection. Anyway: log_lock_waits.
This is also really important: queries can end up being slow because of locking. log_lock_waits is really, really cool here because of how it ties into the deadlock timeout. Whenever a query runs and tries to acquire a lock, if we have waited on that lock for more than a second, which is the default deadlock_timeout, we're going to run a deadlock detection routine that basically asks: is there any deadlock in the system? We want to identify any deadlock that exists, and if a deadlock exists, then we're going to have to do something to resolve it. But the big thing here is that when that deadlock timeout fires, with log_lock_waits on we also log all of the information about the lock that the current process is holding or waiting on, and who it's waiting on. This becomes very helpful for going through and identifying: okay, these are the lock waits, these are the processes involved in this particular locking situation.

One of the problems you run into with queries is that they end up blocking. You can go look at your Postgres log to see if a query ended up blocking for some reason, and this goes all the way down to row-level locks and things that are blocking due to those. This will not tell you information about internal locks inside of Postgres, but those should be very short-held locks; you shouldn't have to worry about them in general. Only on very large, highly transactional systems, where you've got lots and lots of different processes running, would you possibly have contention over those locks.
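To make this concrete, here's a sketch of turning on lock-wait logging and then looking at who's blocking whom while it's happening; the pg_blocking_pids() function is available in 9.6 and up, and this particular query is my illustration, not something from the slides.

```sql
-- Log any lock wait that lasts longer than deadlock_timeout (default 1s)
ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();

-- While a query is stuck, inspect blocking live (9.6+):
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```

The log line you get from log_lock_waits includes the lock type, the relation, and the PIDs involved, which pairs nicely with this live view.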
By internal locks I mean things like the relation extension lock and so on.

Temp files: another big one is logging of your temp files. log_temp_files = 0 means that every time a temp file is created, we're going to log it. This makes a big difference, because when you have queries that run for a long period of time, oftentimes those end up being cases where temp files are getting created, because the query is doing some kind of big on-disk sort, as a great example. That can be an indication that maybe you need to change a parameter like work_mem, which we'll talk about later, in order to not use up all these temp files. Using temp files means that we're writing stuff out to the kernel, and that's a lot more work than just writing something into memory somewhere.

log_autovacuum_min_duration is about logging autovacuum activity. Sometimes what you run into is that something's happening on the system, and it's vacuum running: vacuum sucking up resources, making the rest of the system slow down, potentially. You don't want to disable autovacuum; I strongly recommend against disabling autovacuum.
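Pulling the logging parameters from this section together, here's one possible starting point; the thresholds are illustrative, not recommendations for any particular workload, and ALTER SYSTEM requires 9.4 or later.

```sql
-- Logging settings discussed above; tune the values for your own system.
ALTER SYSTEM SET log_min_duration_statement = '250ms'; -- not 0 on busy systems
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET log_temp_files = 0;               -- log every temp file
ALTER SYSTEM SET log_autovacuum_min_duration = 0;  -- log all autovacuum runs
SELECT pg_reload_conf();
```

All of these take effect on reload; no restart needed.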
Rather than disabling it, what you want to do is tune autovacuum so that it is not impactful on your environment and on your ongoing front-end queries. The way you can identify whether it's having an impact is to log it: see what it's doing, see how long it takes to do things, and then you can start stepping things back as needed. There are tuning parameters that you can set for autovacuum and for vacuum to say, don't go so fast, slow down. You can tune it so that an autovacuum run takes some amount of time, but that's okay, because it just needs to scan through the table. This got greatly improved in very recent versions of Postgres: in 9.6 and up, the visibility map has a bit that basically says we don't need to go vacuum a certain page because it's all frozen. So that's gotten a great deal better in 9.6 and above.

All right, so let's start talking about pgBadger. pgBadger is very straightforward: all you have to do is install it, and once it's installed you can run it on a particular log file and get some pretty fancy reports out of it. I think at this point
I had a... that's not it... this is what I get. I was gonna pull up a website to show you the reports, but I don't think it's gonna work here because of the setup that I've got. So tell you what, I think we're gonna step past that, start talking about pg_stat_statements, and I'll come back to it. Basically what pgBadger does, just to give everybody an understanding, is that it scans through your log file and generates an HTML file. Very straightforward: you have a nice HTML file that you can go read through to understand how your system is working and what it's doing. I'll come back to that, though, and show you what it looks like.

So, pg_stat_statements. This is something that people should really care about, because it's really, really neat. In order to install it, you have to set it up in shared_preload_libraries: you set shared_preload_libraries to include pg_stat_statements and then restart. You can't just reload; you have to actually restart Postgres. After you've done that, you go into whichever database you want to set up pg_stat_statements for and run CREATE EXTENSION pg_stat_statements. What that will do is create the extension, and it'll create a view called pg_stat_statements inside of your current database that has all of this really useful information. For every query that's run, we're going to be able to see the user that ran it, which database, and a query identifier.
Now, the query identifier is not something that you should necessarily assume is going to always be the same across multiple versions of Postgres. Generally speaking, I don't think we've actually changed it yet; it should be stable across a given major version of Postgres, but it could change between major versions. I don't think it has yet, but be forewarned that it could happen, so whether you want to depend on it, or how you want to depend on it, is up to you. Something to consider. What it does allow you to do, though, is roll-ups: because you have that query identifier, you can look at a given query and track over time how its performance changes.

Also included in here is the actual query text itself, in a normalized form. This is one of the things that's really cool about pg_stat_statements: you'll see the query, but any constants that were inside of the query, that may have come from your application, are removed and replaced with placeholders, essentially. That's what allows us to aggregate individual query calls into a single row inside of the view. So the calls column: every single time we call that particular query, we're going to increment that number, whether or not the constants have changed. The constants can change, but as long as the query structure itself is the same, it's going to be aggregated into this same row inside of pg_stat_statements.

Obviously, we also have total_time, which is pretty important; you can take total_time and divide it by calls to get your overall average. But we also track the min, max, mean, and standard deviation, and the number of rows that came back from the query. Anybody have any questions about how pg_stat_statements works? It's a very, very useful tool for identifying which queries are causing problems and being slow.

The other stuff that it tracks is block hits. Shared block hits
are cases where we went to look for a block and it happened to be in our shared buffer cache; those are shared hits. Local block hits are cases where it was some kind of local block for us: temp tables, stuff like that. And they've got temp files in here as well, explicitly. Then we have block read time and block write time, which give you the overall amount of time we had to spend doing block reads and block writes. This is interesting information if you're looking at things on a block level. Obviously you want your shared hits to be up as much as you can, because that means it's all in shared buffers, which is generally going to be faster. But, you know, maybe it doesn't always have to be in shared buffers; it just depends on what the query is doing. If it's a very large query scanning a lot of stuff, you don't necessarily want all of your shared buffers to get blown out by it, like by a large sequential scan, and we have mechanisms in place to avoid that.

So here's an example. Here you can see the query ID, which, like I said, is like a hash of the query itself, and here you can see the query text, and, like I was saying, there are placeholders. Those question marks are the placeholders that are put in place of constants; the constants are gone, we don't have any constants in this query text, just placeholders. And you can see what the calls are, the total time, and all of that stuff. Pretty straightforward. One of the big things that kind of stands out here is the max time, right?
So the max time on this one was 142, which should be milliseconds, and 142 milliseconds is quite a while, particularly for such a simple query. What this indicates is that at some point something was happening on the system that caused this particular query to take an excessively large amount of time. What that's probably due to is that this particular query ended up having to do a lot of work, like a lot of I/O, because the background writer wasn't able to keep up. So we're actually having to modify this block and push that block, and the write-ahead log records associated with it, out to disk ourselves, and that means we have to do that sync in the foreground, where the actual process is running, instead of letting the background processes handle it. That sync can end up being expensive, particularly when the I/O subsystem is very busy.

Here's another one. This is a straight SELECT, and here you can see that the min time, max time, and mean time are all very consistent and very small, which is fantastic. We like this; this is the kind of thing we really like to see, and part of why is that it's just a simple SELECT query. That's what we like to see.

All right: understanding why queries are slow.
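Before digging into the causes, here's a sketch of how you might pull the worst offenders out of pg_stat_statements; the column names are the real ones from the view (note that total_time and max_time were renamed total_exec_time and max_exec_time in PostgreSQL 13).

```sql
-- Ten most expensive queries by cumulative execution time (PG 9.5-12 names)
SELECT queryid,
       calls,
       round(total_time::numeric, 2)           AS total_ms,
       round((total_time / calls)::numeric, 2) AS avg_ms,
       round(max_time::numeric, 2)             AS max_ms,
       left(query, 60)                         AS query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```

Sorting by total_time finds the queries costing you the most overall; sorting by max_time or mean_time instead surfaces the occasional outliers like the 142 ms example above.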
Obviously, Postgres configuration issues can be part of it. Or you can have possible bloat in your environment; that tends to not be an issue unless you have bad query plans, and we'll talk about that, because that's the other big thing that you can run into: query plans that end up causing the query to be slow.

First of all, Postgres configuration. These are the principal configuration parameters that I would argue you want to be thinking about when you're looking at query tuning. So, how many people are familiar with all of these? All right, we've got a few, okay. How many people know what work_mem is? Okay, that's better, that's better. So let's talk through a couple of these. I'm not gonna hit on them too hard, but what I want people to do is, ideally, ask questions if I'm going too fast. If there's anything in here that doesn't make sense, please speak up and ask me, so we can make sure everybody has an understanding of what's going on.

work_mem is the amount of memory that Postgres believes it can use on a per-node basis inside of a query plan.
So, for example, if Postgres needs to do a sort, because you did a SELECT with an ORDER BY, we have to sort the data, and Postgres is going to use up to work_mem amount of memory to perform that sort. If the sort is going to require gigabytes of space, well, your work_mem is not going to be set to gigabytes, typically speaking, and what's going to happen is that we do an on-disk sort. That tends to be very painful, so you want to avoid really big sorts. But there are a lot of cases where maybe you want to sort a hundred, or a thousand, or maybe ten thousand records, and you end up at this threshold: work_mem is set to four megabytes, or sixteen megabytes, or whatever you have it configured to, and your sort ends up being just a little bit bigger than that, or at least Postgres thinks it's going to be a little bigger than that. In that case we're going to end up doing some amount of disk I/O, because we're going to push at least some of that out to disk, since we can't fit it all into memory, and that may be a case where you want to consider upping work_mem instead.

You can identify that by looking at your query logs, your Postgres logs, with log_temp_files on. You can see cases where those queries end up writing things out to temp files, and you ask: okay, why is that query doing that? Should it be? Can I change that? One of the ways you can change it is to increase work_mem up to the point where you no longer see that happening. Now, the downside of increasing work_mem is that your queries use more memory. If you have lots and lots of concurrent connections and they're all using a lot of memory, then you potentially run the box out of memory, and that's not good either, right?
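You can see this spill happen directly with EXPLAIN ANALYZE; a sketch, where the table name is hypothetical and the "Sort Method" lines show the kind of output you'd typically get, not exact numbers.

```sql
-- With a small work_mem, a big sort typically spills to disk:
SET work_mem = '4MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY created_at;
--   look for: Sort Method: external merge  Disk: ...kB

-- With more memory, the same sort typically stays in memory:
SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY created_at;
--   look for: Sort Method: quicksort  Memory: ...kB
RESET work_mem;
```

The "external merge Disk" line in the plan corresponds to the temp files that log_temp_files would report.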
So there needs to be a little bit of tuning back and forth. What I typically recommend people do is set work_mem higher for specific queries that you know require more memory. If there's a particular query that you've analyzed, and you know it runs a lot faster with, say, 64 megabytes of work_mem, then you can set that on the fly, on just that one connection: you say SET work_mem = '64MB', run the query, and then do a RESET work_mem, and it'll reset back to the default. So you just have that adjustment for that one query.

I'm gonna run through the rest of these pretty quickly. maintenance_work_mem, for example, is just how much memory we use for building an index and other kinds of maintenance tasks. effective_cache_size is how much of the database Postgres can assume is in memory. A lot of what Postgres has to do is decide whether to do a sequential scan or an index scan. An index scan is all random access, so you don't necessarily want to do an index scan if you're going to be going through the entire contents of a table; you want to do a sequential scan instead. Now, there's a balance there of, well, how much of the table?
That balance is impacted by how much of the table is probably already in memory, and that's what effective_cache_size affects: the larger the effective_cache_size, the more likely something is to already be in memory, and that's going to make Postgres think that random access isn't as expensive, because there's a chance that the random access will actually hit something that's in cache already. If it's in cache already, then it's not very expensive; it gets expensive when we actually have to go all the way out to disk and pull it back. Everybody following along? Note that effective_cache_size is never actually allocated, so you can set it to, say, half the memory in the box. If your system generally seems to use about half its memory for disk cache, you can set it to that; it's not going to hurt anything, and we're never going to try to actually allocate that memory.

shared_buffers is another really important one. shared_buffers is how much memory Postgres has for its own cache. Where effective_cache_size is more about the Linux kernel's disk cache, shared_buffers will actually be allocated by Postgres and is how much memory Postgres is able to use for its own disk cache. You don't want to set it too big, and you don't want to set it too small; it really takes some tuning to figure out what the right answer is. In an ideal world, you want to set shared_buffers to a bit larger than what you would call your working set. The working set is different for every platform and every application, and figuring out what it is and how big it is requires some analysis. But that's where you want to set shared_buffers, generally speaking. You can get away with setting it to a few gigs, and that's fine for a lot of different platforms and applications.

checkpoint_segments, and min_wal_size and max_wal_size, all just reflect how much disk space Postgres can use for the write-ahead log. That's
relatively straightforward. checkpoint_timeout is how frequently we do a checkpoint, and checkpoint_completion_target is how quickly we want to try to complete that checkpoint. I'm not going to hit on those too much, but if you have questions, feel free to ask.

All right: dead tuples and bloat. Another problem that you can run into is that you end up with a lot of bloat in your tables, and bloat in the tables is bad. The way that we address bloat is by doing a vacuum: we scan through all of the tables and mark any areas in those tables that are now reusable. That's the whole purpose of vacuum. That's why you want to make sure that vacuum and autovacuum are running on your system: if you don't run vacuum, then we never mark any space as reusable. Even though you delete a whole bunch of records, that space is not reusable until a vacuum is run; once the vacuum has run, that space is reusable. That's how you can end up with a lot of bloat in your tables: vacuum isn't being run frequently enough, and you're doing a lot of updates and deletes. Remember, every time you do an update in Postgres, it's actually an insert and a delete. That's something else to be aware of.

Another thing is that indexes can be bloated too. Indexes particularly get bloated when you end up with random or very wide key-space usage. UUIDs are a great example of a really bad thing for indexes, because what ends up happening is that completely random data goes into that index, and it ends up spread out all over the place. If you're doing a lot of updates and inserts and deletes with UUIDs, that ends up being pretty expensive and can lead to bloat in your indexes. It's not something you necessarily want; well, I don't want it.
I think it's pretty bad. check_postgres is pretty useful; I definitely recommend using something like check_postgres to be able to do this kind of monitoring and analysis. It's got some interesting tools for identifying bloat. Some of the other tools, like pgstattuple, are great for actually getting real information. The check_postgres bloat check is kind of a guess: "we think there might be bloat here," based on some statistical information that we gather. It's a very fast check, but it's a guess. So use it to identify places where you should go look to see if there's an issue, but then actually do things like a proper vacuum and look at the stats from that, or use something like pgstattuple and look at the stats from that, to see how much actual bloat you have in your table and in the indexes associated with it.

In order to actually get rid of all bloat, you have to either CLUSTER or do a VACUUM FULL, typically speaking. There are some corner cases, but we're not even gonna get into that; generally speaking, if you really want to get rid of all bloat, you have to do a CLUSTER or a VACUUM FULL. Not all bloat is bad, though. Bloat can be good, because it means you have space in the table where you can actually insert new records, coming either from a regular insert or from an update, and that's valuable. That way you don't have to extend the table. Extending the table is expensive and requires a heavier lock, and if you have a lot of independent processes trying to write into the same table, that gets particularly bad, because that lock can have a lot of contention around it. Newer versions of Postgres are better about that and are able to extend by more than just 8k at a time; older versions had a bigger issue with it. But it's still a heavier lock that you probably don't want to take unless you have to. So some bloat is good, at the end of the day.

All right, so now let's talk
about how things work in Postgres. Whenever you're running a query, you have a couple of different ways of getting the data out. You can sequentially scan through every record; that's called a seq scan operation, which is bulk. Or you can do what's called a skipping scan, a bitmap scan. This is a case where we build a bitmap of locations in the table that we want to visit, and then we start at the beginning, and everywhere that we said we wanted to visit a location, we go there and we scan. Then we'll potentially have large sections where we don't need to scan the data, so we skip: we just do a seek and go to the next point in the file, or in the relation, where we think there's data of interest to us. That's what a bitmap scan does, and that's something you can get if you have an index. You have to have an index in order to be able to do a bitmap scan; if you don't have an index, we're not gonna be able to do a bitmap scan, we're gonna have to do a sequential scan through the entire table.

You can also use an index to pull just specific records out; that's an index scan. And you can have an in-order index scan, which pulls out basically all of the records based on the ordering of the index, and that can be very fast, although you generally want to do that in cases where you're only pulling out a subset of the table. You don't generally want to do that for the whole table, unless you want the entire table sorted, because it's a lot of random access, and random access tends to be expensive, especially if you're going out to disk. One of the other things about an index scan is that it generally requires going back and forth between accessing the index and then going back to the heap.
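The scan types above are easy to see for yourself with EXPLAIN; a sketch with a hypothetical table, where the comments describe the plans the planner will typically choose (the actual choice depends on the table's statistics).

```sql
-- Hypothetical table and index to illustrate the scan types:
CREATE TABLE events (id bigint, kind int, payload text);
CREATE INDEX events_kind_idx ON events (kind);

EXPLAIN SELECT * FROM events;
--   typically: Seq Scan on events

EXPLAIN SELECT * FROM events WHERE kind = 7;
--   typically: Bitmap Heap Scan over a Bitmap Index Scan for a moderate
--   number of matches, or a plain Index Scan for very few matches
```

Running EXPLAIN on your own queries like this is the quickest way to check which of these access paths you're actually getting.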
There are special cases where we can do what's called an index-only scan. If all of the columns that you're pulling out in your query are in the index, say you've got an index across three columns and that's all you're pulling back in your SELECT, you're just selecting those three individual columns, then we can potentially do an index-only scan. That means we're just going to go to the index, pull the values out of the index, and then check what's called the visibility map, to see if it says that the particular page the index is pointing to is visible to everybody. If all the records on that page are visible to everybody, that means we don't have to actually bother visiting the heap, because we have all the data we need from the index, and we know the tuple has to be visible.

The visibility map is built by vacuum. So you want to be vacuuming your tables: if you don't vacuum your tables, you won't have a visibility map, and if you don't have a visibility map, we can't do index-only scans. That's another place where you definitely want to have vacuum running. If you have things like insert-only tables in particular, you want to make sure that you're vacuuming those, because you want them to have a visibility map for when you're querying them. Autovacuum won't necessarily think to go do that, so you've got to be watching for that and make sure you're actually getting those tables vacuumed as well. All the information about when tables were last vacuumed is available through the stats system; you can look at pg_stat_user_tables to see that.

All right. Now, how do we put things together inside of a query? We use joins. When you select from two tables and you do a join, that's where you're putting those two tables together.
There's only a few different ways to do that: a nested loop, a merge join, or a hash join. Typically speaking, those are the ones you're gonna see and the ones that are interesting. With a nested loop, you basically step through one side and then do some kind of lookup on the inner side — like you might have an index lookup on the inside, so for every record on the outer side we go do an index lookup. Typically, nested loops are not ideal. It depends — if you're pulling out ten records or something, fine. But if you're pulling out hundreds or thousands of records, generally speaking a nested loop is kind of painful. It works well for small sets, not for big sets. A merge join basically looks at both relations and sorts them, or uses an index to perform an in-order index traversal so that the data coming out is sorted, and then it just walks through them, merging the data together — a pretty straightforward merge sort. If there's no index available and it has to do a sort, that gets very expensive. So if you see a merge join with a sort happening underneath it, that can be an indication of a problem with your query, or possibly that you don't have work_mem set high enough. This is another case where work_mem is really important. When it comes to a hash join, we're only going to do a hash join if we think we can actually fit the entire hash table into memory. If we don't think we can fit the entire hash table into memory, Postgres kind of runs home to the merge join, because we know we can do sorts out on disk. Yes, Thomas? What's that? Hash join will do batching?
That's just bad — at least historically; I've seen batching go really bad. So yes, a hash join can do what's called batching: you can have segments of the hash table that come in and out from files, in order to do the join in batches. But I discourage people from looking at batching or wanting to do batching, because in my experience it ends up being worse. Typically you want to up work_mem instead — in my experience, if you can get work_mem high enough to do it in one batch, that's ideal. The other thing I've seen is that we go back to a merge join pretty quickly if we don't think we can hash join it. That's gonna happen. Now, the most recent versions of Postgres have made changes in this area, in particular around hashing and how we do hashing, that have made it better — so some of this may be applicable primarily to 9.5 and below. But be wary of merge joins is what I'm getting at, because they tend to be expensive. In general a hash join is going to be better, but you want to up work_mem enough that you can get a hash join — especially a hash join in one batch, because that's generally going to be the most performant way of doing a large join. All right — aggregation. There's a couple of different ways of doing aggregation. You can do what's called a group agg, which is a lot like a merge join: we're going to sort all the data and then feed the sorted data into the aggregate functions to do their aggregation. Or you can do a hash agg. That's another case where we're building a hash table, which means we have to be able to fit the whole thing in memory — but what we have to fit into memory is the result set. The result of the aggregation is what has to go into memory, which can at times be much smaller, and typically is much smaller, than the data set going into it.
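You can watch work_mem change the chosen strategy interactively. A sketch — the table and the sizes here are hypothetical, and the exact crossover points depend on your data and Postgres version:

```sql
-- With a small work_mem, a sort-based plan may spill to disk
SET work_mem = '1MB';
EXPLAIN ANALYZE
SELECT customer_id, sum(total)
FROM orders
GROUP BY customer_id;
-- Look for "Sort Method: external merge  Disk: ..." in the output,
-- or, for a hash join, "Batches:" greater than 1.

-- With more memory, expect a HashAggregate or an in-memory sort,
-- and for joins, a single-batch Hash Join
SET work_mem = '256MB';
EXPLAIN ANALYZE
SELECT customer_id, sum(total)
FROM orders
GROUP BY customer_id;

RESET work_mem;  -- SET only affects the current session
```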
So you don't necessarily have to set work_mem as high. The problem here is that Postgres has a really hard time figuring out how many records are going to result from an aggregation — we just don't know. I've set work_mem to stupid-high levels before, because Postgres's estimate of how big the hash table was going to be was way off: Postgres would think the hash table was going to end up being 16 gigabytes when I knew it was only going to be three, just because I know the data. And because of that, I set work_mem to 16 gigabytes, just for that one case where I'd analyzed it and knew what was going to happen, so that Postgres would do a hash agg underneath. And it will only actually use as much memory as it needs — it's not going to allocate all of work_mem, only the amount of memory it actually needs to do the work. So hash aggs are good, group aggs are bad. So what's the best plan? It all depends. The way we figure this out: we gather statistics using ANALYZE, or VACUUM ANALYZE, and they go into the pg_statistic table — autovacuum will typically run ANALYZE for you automatically. If you have bad stats, you end up with bad plans, and it sucks. So you want to use EXPLAIN ANALYZE to look at what your plans end up being, and also to compare Postgres's estimates against the actual resulting row counts and so on. And you can automate the collection of plans using auto_explain, which is pretty handy and pretty straightforward.
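A minimal auto_explain setup might look like the following — the thresholds here are placeholders you'd tune for your own workload:

```sql
-- In postgresql.conf (adding the library requires a restart):
--   shared_preload_libraries = 'auto_explain'
--   auto_explain.log_min_duration = '500ms'  -- log plans for queries slower than this
--   auto_explain.log_nested_statements = on  -- include statements inside functions

-- Or, for a single session, load it on the fly:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms';

-- auto_explain.log_analyze = on adds timing instrumentation to every
-- query as it runs, which has real overhead; leave it off by default.
```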
It's just like pg_stat_statements, where you set it up in shared_preload_libraries, and then you can set auto_explain.log_min_duration, and whether or not you want nested statements to be logged. Pretty straightforward. It is expensive if you turn on the log_analyze option, because that instruments every query as it runs in order to collect the EXPLAIN ANALYZE information, and that gets very expensive — so I wouldn't typically recommend enabling it, except when you have a really hard time getting an EXPLAIN plan for a particular query. All right. We have a number of different ways of analyzing plans. In terms of output, explain plans can produce XML, JSON, or YAML, and then you can analyze those in pgAdmin 3 and 4, or you can use explain.depesz.com, which is fantastic for this kind of analysis work. All right, so now let's talk about fixing them. Some really straightforward, low-hanging-fruit items: when you run EXPLAIN and you see a sequential scan, and you only get like one row back and you're not doing any kind of aggregation, that's a very good indication that you want an index. You want to be using an index in those kinds of cases. If you have a query plan where you're seeing sorting happening, or a merge join happening, you may want to increase work_mem up to the point where you get a hash join back instead, or something along those lines — or an in-memory sort. When Postgres does do a sort and you do an EXPLAIN ANALYZE, it'll actually tell you when that sort is having to use disk and when it's not. If you end up in cases where you have a nested loop, sometimes the problem can be statistics, right?
If you have a really large data set and Postgres isn't able to figure out that a query is actually going to return a lot more rows than it's estimating, sometimes that can be helped by increasing the statistics target. If you want to talk about statistics, you should talk to Thomas, because Thomas has been doing a lot of work around statistics — right, Thomas? That's a lot of work. One of the other things you can run into is deletes that are slow. A delete can end up being slow because Postgres doesn't require an index to be backing a foreign key cascade delete. So you have a primary key on a table, and you have another table with a foreign key reference to it. You delete something out of the primary key table — if there's no index on the column that's the foreign key in the referencing table, which there may not be because Postgres doesn't require it, that delete will end up having to do a sequential scan on that other table. This is something you want to watch out for in your systems: make sure you have an index on that referencing column. That's what you can be seeing when deletes end up being slow and you have cascading deletes happening with foreign keys. All right — prepared queries. If you see that the planning time is significant for your queries, this may be something you want to consider. I wouldn't necessarily run out and make everything prepared statements tomorrow — and this is different from using prepared statements for the purpose of preventing SQL injection, which I certainly encourage. But when you're actually using PREPARE in order to save a plan, you want to look and see: is planning time the reason this thing is slow?
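To make the foreign-key point concrete, a sketch with hypothetical tables — the index on the referencing column is the part Postgres won't create for you:

```sql
CREATE TABLE customers (
    id int PRIMARY KEY
);
CREATE TABLE orders (
    id int PRIMARY KEY,
    customer_id int REFERENCES customers (id) ON DELETE CASCADE
);

-- Postgres indexes customers.id (the primary key) automatically,
-- but NOT orders.customer_id.  Without the index below, every
-- DELETE FROM customers seq-scans orders to find rows to cascade to:
CREATE INDEX orders_customer_id_idx ON orders (customer_id);
```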
There are ways of getting at the planning time: recent versions of Postgres will explicitly tell you how much time was spent in planning when you do an EXPLAIN ANALYZE, and in PG 10 we gained the ability to get the planning time on a plain EXPLAIN, without ANALYZE, by turning SUMMARY on — it will tell you what the plan time is. One thing to watch out for with prepared queries and prepared plans is that for the first five times you run a prepared query, Postgres does this little check to see: okay, would it be sensible to replan this query? Would things end up being much better if we replanned this query entirely, rather than just using the generic plan that was created? So when you do a PREPARE, we create a generic plan. But when you actually pass in a value — when you run an EXECUTE — we're going to take that value and consider: is it possible that with this new information, with the specifics of this value, a specific plan using that value would be much faster? And if it is, we're going to run the specific plan. We'll do that for five executions, and if for five executions the generic plan ends up winning out, right?
Or if it isn't any different, then we'll just always use the generic plan from then on. There's this kind of magic number five in the backend that does this. So if you start seeing things where the first five times you run it, it's fine, and then after that the data changes and suddenly the query is horrible, you may want to re-prepare it — or just have some ability to re-prepare that query, because that's what's happening — or just always plan it explicitly every time. Okay, so let's talk about some really straightforward queries. A SELECT count(*) is very straightforward, but a lot of people end up doing it. Well, how do we make a SELECT count(*) fast? One of the ways is to have an index: if you have an index on that table, then a count(*) will actually be faster, because it'll use an index-only scan. We still have to go through all of the entries in the index, but at least that's typically going to be a lot smaller than going through all of the records in the table, in the heap. The downside: if you don't have a visibility map, we aren't going to be able to do an index-only scan, and if the visibility map isn't current, then even if we do an index-only scan, we end up doing a lot of going back to the heap to check the visibility information. So keep your visibility maps current, and that kind of SELECT count(*) is pretty fast because it's going to use the index. Of course, there are other approaches: you can go create a side table and update it with triggers and whatnot. That certainly works too; it's a different approach. Another one I see a lot is people doing SELECT * FROM table. Well — do you actually need all of those columns coming back?
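Going back to prepared statements for a second: the five-execution behavior is easy to see by hand. A sketch, with a hypothetical table — you can inspect the custom-versus-generic choice via EXPLAIN EXECUTE:

```sql
-- Prepare once; Postgres plans each of the first five EXECUTEs
-- with the actual parameter value, then may switch to a cached
-- generic plan if the custom plans never looked better.
PREPARE by_customer (int) AS
    SELECT * FROM orders WHERE customer_id = $1;

EXPLAIN EXECUTE by_customer(42);  -- a custom plan uses the literal 42
-- After five executions you may see "$1" in the plan instead of a
-- literal, indicating the generic plan has taken over.

-- On PG 12+ you can override the behavior per session, e.g.:
-- SET plan_cache_mode = force_custom_plan;
DEALLOCATE by_customer;
```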
Where this can be a really big problem is when we're talking about really wide columns. A great example is everybody who uses json or jsonb nowadays. That jsonb blob you have that's like 10K in size — the way Postgres handles that is we take that whole big blob, compress it, and store it in another table, because it's more efficient to work with it that way, and it keeps the primary heap pretty small. So in the primary heap we just have a small pointer that points over to the entry in the TOAST table where that big chunk actually lives. What this also means is that if you use SELECT * and you don't actually use that 10K blob of json that's coming back, you're spending a lot of time making Postgres go grab that data from the other table, decompress it, stream it back, and give it to you — and you don't do anything with it. Wasted effort. Don't do that. Generally speaking, I don't use SELECT * at all, because you want to return only the actual data you need to get your work done. And of course, make sure you're using WHERE clauses and conditionals so that you're only pulling back the set of data you want, because that's also how an index is going to be useful. Things like SELECT DISTINCT are something I see a lot, and I definitely tell people: watch out for SELECT DISTINCT. You generally don't want it, because it's usually an indication that you're missing a join condition somewhere. What Postgres is doing is a big Cartesian product join between those two tables, and then filtering the result down — probably by sorting, possibly by hashing — to get you to just the distinct set of tuples. We're doing a lot of work for nothing, right?
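On the SELECT * point, the fix is just to name the columns you need. A sketch with a hypothetical table:

```sql
CREATE TABLE events (
    id bigint PRIMARY KEY,
    created_at timestamptz,
    payload jsonb   -- large values get compressed and moved out to TOAST
);

-- Forces fetching and decompressing payload for every row returned,
-- even if the application never reads it:
SELECT * FROM events WHERE created_at > now() - interval '1 day';

-- Never touches the TOAST data, since payload isn't selected:
SELECT id, created_at
FROM events
WHERE created_at > now() - interval '1 day';
```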
The right answer is to fix the query. And typically, one of the ways to make sure you're not missing any join conditions is to use explicit JOIN syntax rather than comma joins — that "FROM a, b, c" style is comma-joining, and it's what gives you this Cartesian product thing. I recommend what's down at the very bottom here, the explicit JOIN syntax, so that you see exactly what your join conditions are between all of your tables and avoid the risk of a Cartesian product. All right, another case — more queries, lots of queries to talk about. This is things like SELECT * FROM x WHERE my_id IN (SELECT my_id FROM some_big_table), which can end up being a big set. That's typically not a good idea. Generally you want to turn that into a join, or use something like EXISTS or NOT EXISTS — those are generally going to be faster. So I typically say: don't use IN or NOT IN; use EXISTS. And the way you use EXISTS is with a subquery — a correlated subquery, specifically. If you're doing a NOT IN, you might be able to turn that into a left join. And Postgres will typically realize, when you use a NOT EXISTS, that what you mean is an anti-join, and it'll go implement that underneath for you. All right, CTEs. CTEs are great — I really encourage people to use them. The downside is that when you use a CTE, what we're going to do is run that query, take all the tuples from that query, and store them in a tuple store. Now, that can be in memory, right?
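A sketch of that NOT IN rewrite, with hypothetical tables:

```sql
-- Often slow, and it has surprising NULL semantics: if
-- big_table.my_id can be NULL, the NOT IN returns no rows at all.
SELECT *
FROM x
WHERE my_id NOT IN (SELECT my_id FROM big_table);

-- Correlated NOT EXISTS: Postgres recognizes this as an anti-join.
SELECT *
FROM x
WHERE NOT EXISTS (
    SELECT 1 FROM big_table b WHERE b.my_id = x.my_id
);

-- Equivalent left-join form:
SELECT x.*
FROM x
LEFT JOIN big_table b ON b.my_id = x.my_id
WHERE b.my_id IS NULL;
```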
If your work_mem is high enough, it's going to be in memory. But even so, the only way to then access the tuples in that CTE is by scanning them, so you don't want to end up with a large data set in there. Yes? Common table expression — when you use a WITH clause, you see how it says WITH here? That's a common table expression. They're actually a really great way to build queries. But you have to be wary: you don't want a large data set being returned from a CTE unless it's something where you're actually going to use all of those records. You don't want to return a whole bunch of stuff into a CTE and then filter it, because if you do that, we're going to run that CTE to ground, build up this big tuple store, and then scan it and filter it all out. You want to filter from the get-go, in the initial query inside the CTE.
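A sketch of the difference, with a hypothetical table. Note that on PG 12 and later, a non-recursive CTE referenced only once can be inlined automatically, and you can control this with MATERIALIZED / NOT MATERIALIZED — at the time of this talk, CTEs were always materialized:

```sql
-- Bad on older versions: the CTE materializes every row of events
-- into the tuple store, then the outer query throws most away.
WITH recent AS (
    SELECT * FROM events
)
SELECT id FROM recent
WHERE created_at > now() - interval '1 day';

-- Better: filter inside the CTE so the tuple store stays small.
WITH recent AS (
    SELECT id FROM events
    WHERE created_at > now() - interval '1 day'
)
SELECT id FROM recent;
```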
So what I typically do is use CTEs to build a query up from the top to the bottom — just make sure you realize how the CTEs are going to be accessed between them. If you want a really, really fast count(*), one way is to look in pg_class at the statistical information, at what's called reltuples. It is very, very fudgy — it's completely statistics-based, so it's not going to be exact. If you want something that's actually accurate and faster than counting, a trigger-based approach is the way to do that. All right, a little bit of review. Obviously there's a lot to tuning Postgres. Another big one is partial indexes. If you're not familiar with what a partial index is: it's where you say CREATE INDEX and then add a WHERE clause, and that WHERE clause creates an index on just the tuples that pass that conditional — meaning you get a much smaller index. People do indexes on boolean fields, right?
Not a good idea. A partial index on a boolean field, though — great idea. Just do a partial index on that boolean field for where it's true, and suddenly your index is tiny, because you're only looking at, say, the records that haven't been processed yet — and that index is tiny, very fast, and very efficient to work with. Don't put a regular index on a boolean field where you only ever care about one side of the boolean; let's put it that way. The other thing is that, generally speaking, you want to use indexes when you're pulling out a very small set of the data — something less than 5% of the table, typically. Don't create an index because you want to pull out half the table; if your query is pulling half the table, an index is probably not going to be helpful, and Postgres probably won't use it. One of the other things to do is go look through your stats and see which of your indexes are actually used. Some indexes you have to keep because they're backing constraints, but for the others, consider whether or not you really need them. Also, whenever you're creating partial or functional indexes, double-check your query plans to make sure they're actually going to get used — otherwise you have an index you're paying to maintain and update without getting anything from it. All right, questions — I know that was fast, I know that was a lot. Yeah? Yes... Yes... Yeah, I don't think we actually know what the application name is at the point that message is logged. I think we log that connection message earlier than we parse that information out, so I don't think we physically have it yet. It's a good question, but I don't think we know it. We might be able to change that. Okay — well, ask on -hackers, or talk to one of the consultants right here.
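To make the partial-index idea concrete, a sketch with a hypothetical queue-style table:

```sql
CREATE TABLE jobs (
    id bigint PRIMARY KEY,
    processed boolean NOT NULL DEFAULT false,
    payload text
);

-- Indexes only the unprocessed rows; if 99% of rows are processed,
-- this index is roughly 1% of the size of a full index.
CREATE INDEX jobs_unprocessed_idx ON jobs (id) WHERE NOT processed;

-- The query must repeat the predicate for the planner to use it:
SELECT id FROM jobs WHERE NOT processed ORDER BY id LIMIT 10;
```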
Yeah, that doesn't matter — you won't get an index-only scan, but it will use the index for just where that boolean, that active flag, is true. You will have to include the WHERE active condition inside your query statement, though. Yeah, sorry — we'll talk about it later, we're out of time. But yes, we should be able to use that: as long as the conditional says WHERE active, it should use the index. All right. Thank you.