There we go. Yes. I'm going to be talking today about Explain, which is PostgreSQL's primary tool for analyzing query planning and execution. It is an extremely powerful tool. It is also somewhat unintuitive, hence the presentation. So, first let's talk a little bit about what Explain is. Explain tells you a number of things about your queries. It tells you what the query planner planned on doing, how it planned on executing the query, and then, depending on what mode you run it in, gives you some statistics on what data it expected and what data it got, and can often tell you which step in the query took the longest. But Explain doesn't do everything for you. It doesn't, for example, answer the question: why didn't it use that index? That's not something you can get out of the Explain plan. Sometimes you can infer it from the Explain plan, but you can't get it out of the Explain plan directly. It doesn't tell you how to rewrite your queries to make them faster, and it doesn't say anything about other factors that may make the query slow. For example, you will sometimes find that the majority of your response time is spent outside the query itself. One of the things I do all the time when I do analysis is look at the Explain Analyze in Postgres, but also look at the response time as reported by the application. And you can have a query where Explain Analyze says it's being executed in 50 milliseconds, but the response time in the application is three and a half seconds. And that's actually because that time is being spent in data transmission or network problems or rendering in the application. That's not uncommon at all. And Explain won't tell you that. It will only show you the time spent inside the database. And it particularly won't tell you, for example, things like data transmission time. Because if you request 150,000 rows from the database, that is going to take a significant amount of time to go across the wire.
And that, again, does not get shown in Explain. So it doesn't tell you everything, but it does tell you a lot of things. Now, I did say that Explain is a little unintuitive to interpret. So in order to understand what Explain is showing you, I need to actually explain a little bit about how the Postgres query planner works. Because how the Explain output appears is really tied to how the query planner works. So what happens is that, by the time the query planner gets your query, the parser has already parsed it and rendered it into a canonical form. The query planner breaks that down into individual atomic units of work that are called, appropriately, nodes, and then comes up with every possible different way it could execute each one of those nodes. Then it chains those nodes together, which will involve discarding some possible execution plans because they can't be chained together. And then it will take those chains and estimate a cost of executing each particular chain. And whichever one has the lowest cost is the one it will pick, and that's the one that it will display to you. I'm told — this was before my time, back at UC Berkeley — that our query planner was originally written in Lisp. If you look at the code you can kind of believe it: it's very recursive and uses a lot of lists of things. So you really have to kind of understand the node concept to see what Explain is showing you. The basic idea is that every node has two kinds of inputs. One input will be one or more row sets. In the case of joins it will be more than one row set; in the case of other kinds of operations it will be a single row set. And a row set is a synthetic set of data — it might be a table, and it might be data coming from another query node. The other input is the parameters that go into that node, which define what the node is supposed to do and how it's executed.
And then what will come out of it is another row set. And that's what the query planner is showing you. So for example, if you had a sort node — you're doing a sort on a column, right? What's going to go into that is some scanned rows, and then a list of sort columns and how you want to sort on them, or sort expressions. And what's going to come out of that is a sorted row set. And that's going to be the text that Explain gives you. Now, I've also mentioned here several times the term cost. So Postgres has what's known in the database world as a cost-based optimizer. As in, we try to estimate the cost — which is generally sort of an estimation of time, because what people care the most about is response time, right? It's generally sort of an estimation of the time that it's going to take to execute the query — and pick, again, the plan with the lowest cost. Because everything else being equal, you want the query plan that's going to give the shortest response time. But it's important to understand that these cost units are completely relative. They are not meant to represent anything external to the query planner itself. As a matter of fact, costs between different queries don't even necessarily correspond. You can pretty much guarantee that a query that has a cost of 10 million is going to take longer to execute than a query that has a cost of 1,000. But for two completely different queries, one with a cost of 1 million and one with a cost of 1.5 million, you can't make any assumptions about which is actually going to execute faster. The only thing cost is supposed to be comparable for is that, for the same query, the higher-costing plan should execute slower. And in cases when it doesn't, those are cases where we want to tweak the query planner or the statistics or other information in order to make the costs correspond better with reality.
Now that being said, Postgres has this parameter called seq_page_cost, short for sequential page cost, that people do like to manipulate in order to try to get that cost number to correspond more closely to, say, milliseconds of execution. But it's still an estimate, and it's still not scalably comparable, so you can only get extremely approximate results that way. I regard that as a waste of time, because you're trying to make one thing represent another thing that it doesn't. So let's actually look at some of this. Here we have a very simple query with a very simple query plan. These are all out of my sort of fake public library database called libdata that I use for several projects on my GitHub. And we're doing something very simple, right? We're looking up a patron by their indexed library card number. And Postgres can execute that in a really simple way: it's going to look on the primary key index and it's going to retrieve our one row. Now, the most important thing — like, the very first time you jump on Postgres IRC, say this query is slow, and post an explain plan, what's the first response you're going to get? How many people have done this? The first response you get is: please run an explain analyze. Because here's the thing: explain by itself shows you what the query planner planned to do. Explain analyze shows you how it came out. That is, you know, I expected to retrieve 1,000 rows from this but I actually retrieved 10,000 rows; I expected this to cost X, and this is how much time it actually took. So explain analyze actually shows you that. Now, it's important to know that explain analyze does actually run the query, because without running the query we can't find these things out. That's critically important if you're trying to analyze a write query of some kind, like an update or insert or whatever. It will actually do the update or insert. Explain analyze will. Explain will not.
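That write-query case is worth showing concretely. Here's a minimal sketch of the transaction-wrapped pattern — the table and column names (loans, due_date, loan_id) are my stand-ins in the spirit of the libdata examples, not from the slides:

```sql
-- EXPLAIN ANALYZE really executes this UPDATE, so wrap it in a
-- transaction and roll it back once you have the plan output.
BEGIN;

EXPLAIN ANALYZE
UPDATE loans
   SET due_date = due_date + interval '7 days'
 WHERE loan_id = 1001;

ROLLBACK;  -- undoes the UPDATE; you keep only the plan output
```
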
Explain analyze will actually do the update or insert. So if you're going to do that, it's important to do it inside a transaction so you can roll it back. So here's an explain analyze, and you can see the extra information that we have here, right? We have our index scan, and we still have our cost information and that sort of thing — and I'm going to go over this in detail in a minute. And then we suddenly have these actual time figures and actual rows. And then in more recent versions of Postgres — this is from 9.4 — you get other information, like how long it took to plan the query. Now, one of the things you'll find in very simple queries like this is that the planning time is actually greater than the query execution time. That's not uncommon in queries where you're retrieving one row based on an index, where the entire execution time of the query is measured in tenths of a millisecond. And right here we see that it actually took us more than twice as long to plan the query as it did to execute it. So you get a lot of good information that way. Now, you pretty much always want to run explain analyze, because with explain by itself you're going to get what the query planner considers the most reasonable plan — and most of the time the query planner is right. One of the big curses, from a Postgres development standpoint, one of the big problems with the query planner, is that it is right so often. That makes us very reluctant to mess with it, because if we start messing around with the internals of the query planner, what we're going to initially get is a lot of really bad plans until we retune it. So it's right most of the time, and for that reason just looking at the explain plan by itself doesn't necessarily tell you anything — you want to explain analyze. Now, there are a few reasons why you wouldn't want to explain analyze.
Obviously the update case is one, if you can't wrap it in a transaction and roll it back, because of locking or whatever. The other thing is maybe the query is so slow it never finishes, right? In which case you can't run explain analyze, because it never finishes. Maybe you just want to check whether or not a particular index is going to be used, and the explain plan will tell you that. And maybe, if you've got partitioned data, you just want to check which partitions are going to be hit by a particular query. You want to make sure that the query is partition compliant, so you're only going to hit the partitions that match. And again, a plain explain plan will tell you that; you don't actually have to run explain analyze. The whole rest of the time, you're going to want to run explain analyze. Now, there are some other options for explain that are useful. One of the big ones, which got introduced in Postgres 9.2 I think, is buffers. If you add buffers, then it tells you information about how much data was retrieved from Postgres' dedicated cache versus how much had to come out of the file system. Now, because Postgres sits on top of the file system, it can't tell you how many things came out of the file system cache versus off of disk. You have to sort of infer that from how long it took. But at least giving you an idea of cached data versus read data gives you an idea of, hey, is this query really slow because we're waiting to read stuff out of the file system. Cost and timing are on by default — that's what you saw before — but you can actually turn them off. The one reason why you'd want to turn timing off is that on some platforms, such as Win2K Server, checking the system clock is really expensive. It is so expensive that it affects the query execution time when you're doing explain analyze, and thus gives you a really deceptive result. And so that's the reason to turn timing off. The other thing that we can actually do is change the format.
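As a sketch of those options together — the patrons table and card_no column are my assumed examples, and the card number is made up — the invocation would look something like this:

```sql
-- ANALYZE runs the query for real, BUFFERS adds cache-hit versus
-- read counts, and TIMING OFF skips the per-node clock calls on
-- platforms where reading the system clock is expensive.
EXPLAIN (ANALYZE ON, BUFFERS ON, TIMING OFF)
SELECT * FROM patrons WHERE card_no = 60094;
```
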
That is, what I've been showing you up until now is the default format, the default sort of line-oriented text format for explain and explain analyze. But we also offer explain and explain analyze output in YAML, XML, and JSON. And actually, when I'm doing real work I use the YAML format a lot, because it breaks out each different thing into a separate key-value field, and I find that much easier to read. Plus I can feed it into scripting tools and then actually look for certain patterns. And then there's verbose. Let me actually show you a few of those. So this is an example of buffers. Now, you notice that when you want to add these extra parameters, you have to use this extended format: explain, open parentheses, parameter and then on or off — parameter, value. And that's the extended format that we're adding for a lot of Postgres commands, because we decided a while ago that we wanted to stop extending the SQL parser syntax all the time, because that was becoming a real problem. So instead you put these in parentheses. And it might be a good idea to get used to doing that anyway, even though you don't technically need it with analyze and verbose. So here I'm doing a select from people, and then a JSON search — this is a JSON field, so I'm doing a JSON query looking up certain key values in the JSON field. And then I'm getting a lot of information, and here you can see what we're getting from buffers. So we're doing a bitmap heap scan on people right here. A bitmap heap scan means that we are reading data from — actually, this is a two-part thing. The first part is the bitmap index scan, which means that we are reading the index into memory and then transforming it into a bitmap so that it can be manipulated and matched up with other query criteria. And then it says buffers here, right?
As in, we hit one data page already in Postgres memory and then we had to read 39 of them, which is fine — that's less than a megabyte of data, not really an issue. But then we go up to the bitmap heap scan, and we're hitting 18,000 pages in memory and then reading 70,000 out of the file system. So if this query was slow, that would be a pretty good hint that it's probably pulling data off of disk and that's why it's slow. Now, here's a cut-off example of the YAML format. Obviously, since it's vertical, it scrolls all the way down, but you can see how, if you're looking for specific statistics, it's a lot easier to read this. It's a little harder to see the whole query, because you have to do a lot of scrolling. But it's a lot easier to read the parameters, because they're broken out into separate fields instead of being in a massive long line that wraps. However, I'm not using the YAML format for the slides because, being vertical, it just runs off the slide all the time. Okay, and then verbose. Verbose actually doesn't give you a lot of extra information. It's possible that in the future we might add extra things that display when you're in verbose mode, but right now what verbose mode mostly does is — remember I said there are row sets going into the node and row sets coming out of the node? It gives you all the column names for those row sets going in and out. Where that's really useful: someone else wrote a query, they handed you that query, and the query lacks table aliases. So you're like, hey, where did this F name field come from? What table is that from? That'll tell you. You can actually trace that particular column through the various levels of the query to see what table it originally came from, so you can see where it's missing an index, for example. As the name says, it obviously is verbose, so be prepared for lots of output, but it's useful that way.
So here we can see all of our output — all the output columns from each level of the query. And one of the things that I would say here is: hey, at this level of the query we're actually getting all of the columns in the fairly broad patrons table, even though we're only going to display four of them. Couldn't we add a filter criterion when we're searching the patrons table, and actually avoid manipulating some of that data? That sort of thing gets particularly critical when you have these giant tables that have 600 columns — anytime somebody does a select star, things get really slow. So that's one of the things that you get usefully out of verbose. So next we're going to talk about reading explain in detail. Before I go to that, do we have any questions about the options? I'm going to show you what each different clause means in just a minute. But anybody have any questions about the options for explain before we get to that? No, we're good? Okay, cool. So the first thing to understand about explain — and this is the hardest thing to sort of grok when you're first looking at it — is that it displays as an inverted tree. And what I mean by inverted tree is that the thing that's executed first is the one that's indented innermost. So in this particular query, which has a join in it, this bitmap index scan on loan patrons — we're joining the patrons table against the loan patrons table — is what got executed first. For certain definitions of first; here's the important thing. When Postgres is waiting on I/O, it will attempt to go work on other query nodes that it can execute in parallel. So you will actually have overlapping execution between nodes. No, we don't have full parallel query in Postgres yet — it's coming in 9.6.
So it's not really parallel — it's actually more of sort of an asynchronous request model, right? When we're waiting on other systems, we'll go on to other stuff. But anyway, this is executed first, and then second we have a heap scan on loans and an index scan on patrons being executed. And again, like I said, because we have some I/O going on there, those two things will probably be executed more or less concurrently. I mean, it's context switching by the particular processor, but it's more or less concurrent. And then when those get executed, that all gets passed up to the join, which is the nested loop join right here, and that gets executed last. So we have first, second, third. So that makes it hard to read, because first of all you have to understand that the innermost thing is getting executed first. And the innermost thing may not be on the bottom — sometimes we have other stuff on the bottom, and you'll see some hairier queries as examples. And the things that line up, margin-wise, are executed more or less concurrently. Now let's actually look at what's in those lines of explain. So here's one line of explain for an index scan on patrons_pkey, right? The patrons primary key. So we've got the index scan right here. The first part of it just tells us what operation we are performing in this node. It is an index scan node — the first two words tell us what kind of a node it is. And then the rest of it will be some sort of a text phrase telling us what the operational parameters of that node were. As in, for an index scan node: what index are we using? We're using the patrons_pkey index. What relation are we scanning? We're scanning patrons.
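Putting those pieces in one place, here is that explain line annotated — the plan line itself is the one from the slide; the marginal notes are mine, and I'll walk through the cost, rows, and width fields in a moment:

```sql
-- Index Scan using patrons_pkey on patrons  (cost=0.28..2.50 rows=1 width=24)
--
--   Index Scan          <- the node type
--   using patrons_pkey  <- operational parameter: which index
--   on patrons          <- operational parameter: which relation
--   cost=0.28..2.50     <- estimated startup cost .. estimated total cost
--   rows=1              <- estimated number of rows returned
--   width=24            <- estimated bytes per row
```
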
Now, depending on what kind of node it is, that text string is going to be structured completely differently — for a join node, the information that we need is completely different from an index scan node, right? The second thing here is that we're going to get some cost information. Now, one of the things that people always ask is: hey, why is this cost thing two numbers? Why is this a range here? What does that mean? Does that mean the cost is somewhere between these two numbers? Well, what we're actually showing you is that the first number is the estimated startup cost. That is, what is going to be the cost of launching this node at all? In the case of an index scan, what is the cost to get the first row from the index? And the other number is the estimated total cost for executing the node. So it's the second one you're generally concerned about. And then the next things we'll estimate for you are how many rows we expect to return, and the estimated width. We'll talk a little bit more about those in a second. So first of all, we're expecting a startup cost of 0.28 and a final cost of 2.5 — and again, these are arbitrary units. It's estimated to return one row, and that row is estimated to be approximately 24 bytes wide. That width becomes important if it's a really high number; otherwise you can pretty much ignore it. But if you're returning a whole lot of rows, or the rows are really wide, then just the sheer memory manipulation of moving those rows around takes a significant amount of time. Now, the other important thing to know about these costs is that the costs accumulate upwards in the query plan. So, here is our first query node, and it actually has a startup cost of 0, which probably means it's just a straight-up table scan, which has no startup cost.
We've got a startup cost of 0 and a final cost of 1.5, and then that accumulates up to the other nodes. And you notice here that the startup cost on this node is equal to the final cost on that one. That's not a coincidence. It's because what Postgres is telling us is that it can't actually start this step until that step is done — it needs the data from that one in order to do this one. Whereas this other node does not need that data, and hence the lower startup cost. And then this one, again, has an even higher cost, but you notice it's potentially starting before any of these are finished. You'll often see that pattern when we're doing a scan or a join or whatever, and that output is going into a sort: as the rows come in from the other node, if there is processor time available, Postgres will start sorting them. And then this is our final estimated cost for the query. Now, if you do an explain analyze, you're going to get an actual line as well. And the actual line has some other stuff. Once again, this time, instead of having a cost in arbitrary units, we have a time in milliseconds. So this is very fast: 0.015 milliseconds. Now, in the time here, what we're generally showing is, for the row set coming out of the node, what was the time to return the first row versus the time to return the last row. Which is similar to startup cost, but not exactly the same. Then, again, there's how many rows you actually returned. And then you've got this loops thing. The reason why the loops number is important is that there are certain things that Postgres will do, such as nested loop joins and CTE scans — we'll define those in a minute — where it might actually execute that same node multiple times in a loop. So the thing is, if you read this line, you'd think that executing that node took a total of 0.015 milliseconds, right?
But in this line, we actually executed that particular node 1258 times. So that time is the amount of time for one average execution of the loop. The actual amount of time taken by this node is 0.015 times 1258 — about 19 milliseconds. It's one of those things that trips people up when they look at it. And this number of rows is the average number of rows returned by each execution of the loop. So the loops thing trips people up all the time, because they look at it and they go: wait, why does this node start two seconds after this other node is finished? This other node finished after one millisecond, and this one only starts two seconds later. What's going on? Well, that node was a loop. It actually executed 10,000 times, and it took two seconds to do that. Now, the other thing that you'll get in certain kinds of nodes, such as scans or joins, is some conditions. For example, in an index scan, you'll often get filter conditions on the index, right? As in: hey, we're looking for this particular card number in the index. And that gives you some extra information about what's going on. It's particularly useful if, for example, you're asking, hey, why didn't we use an index for this? Well, look at the filter conditions and see if they correspond to any index you have. If the answer's no, then you already know how to fix that query. So I've thrown around the names of a lot of — well, actually, let me stop. We'll take questions real briefly, now that we've gone over the little parts. Anybody have questions on the individual little clauses before we go on to node types? That's a column name — that is a column name right here, and that is a value that I supplied. Yep, other questions? The width is the number of bytes in one row, yes. Yes, the question is, if there are 10 rows, is the width shown still per row — and the answer is yes. So, okay. Oh, pretty good, actually. So let's go over some node types.
You've got a whole bunch of different node types in Postgres. This is a slightly incomplete list of node types, particularly if you're using Postgres 9.5, because one of the things that we do in Postgres is add node types all the time. Certain Postgres extensions will add their own node types. The query planner is designed to be pluggable, and therefore, as long as you meet the node API for the query planner, people can actually add their own node types in an extension, and they do. So you may see things that aren't in this list. Plus, there were some added new in 9.5 that are not in this list, and other stuff. But these are the ones you'll see most of the time, because they're the oldest ones and the most common ones, right? Let's go on. First of all, seq scan is our obscure term for a full table scan. Seq is short for sequential, and it means that we're doing a sequential read from disk. So that is the cryptic Postgres term for a full table scan: seq scan. A seq scan is something that you want to see on a very small table, and you don't want to see on a really large table. Index scan means scan the index and look up individual tuples from the index, in the sort of normal index tree traversal way. And then there's index-only scan. So, one of the confusing things about Postgres — how many people were in Quinn's talk yesterday morning? One of the things Quinn went over is how Postgres does in-place concurrency, because rows are not deleted from Postgres immediately. They're marked as invalid for a certain range of transactions, and then garbage collected later. So at any given time, Postgres has to check what are called visibility rules to determine if you can see certain rows or not. And the visibility information is stored in the base table. However, starting in Postgres 9.2, we said: hey, if all of the data in this part of the table is really old, then we don't actually have to check visibility rules.
And so we added the index-only scan, and index-only scan is really nice if you're getting sorted output from a really big table, because it means all we do is retrieve data from the index, and we don't touch the base table at all. You won't see it as often as you would like to, but it's in there. Now, bitmap scan means scan the index and then build a bitmap of which rows correspond to our criteria. One of the primary reasons to do that is: I'm going to scan a second index on the same table, construct a bitmap from it, and then overlap the two bitmaps in order to see which rows correspond to complex criteria. This is how Postgres can combine two or more indexes on the same query, something that certain other open source databases can't do. Most of these kinds of scans will contain a filter condition. Obviously, sometimes you just select star from a table — particularly if it's a small table — and that's not going to have a filter condition. But most of the time you're going to be doing select columns from table where, and that where clause is going to become what's known as a filter condition, showing what you're getting. But do keep in mind this is a per-node thing, so sometimes the filter condition will be something that was generated by the query planner or executor based on criteria passed elsewhere in the query and sort of bubbled up. So here's an example of a bitmap scan, and this is why I mentioned visibility rules. We actually have two steps here. We've got the bitmap index scan, and this is where we scanned the index, looked for all of the items in the pages of the index that corresponded to that particular library card number, and constructed a bitmap. Now, the problem is that even after the bitmap index scan, we do have to check visibility rules, and in order to do that we have to do a bitmap heap scan — and heap is, again, another cryptic term for the base table.
So we do the bitmap heap scan to double-check that all of these rows pass our visibility rules before we proceed on to the next step. In this particular operation, you will often find that the bitmap heap scan takes longer than the bitmap index scan. That's because the base table is much larger than the index, and it's not ordered. Yeah? Right. Yes, so what's happening is we're getting all of the rows — the index does not have visibility information. So we're getting all of the rows in the index that have this card number, and then we're checking those rows, and only those rows, in the heap — in the table, that is. So, join types. Postgres supports lots of join types in order to have different strategies for executing each join. Nested loop join — nested loop is the join that's most intuitively understandable: for every row in table A, look up all the matching rows in table B, looping over them one at a time. This is the fastest join strategy when very small numbers of rows are involved, because there's almost no startup cost. It is a terrible strategy when there are a lot of rows involved, because it is completely serial and there's no batching. So if we're joining some big tables, Postgres is going to try to do some other things — like, for example, a hash join, right? There we actually make a hash map based on one table, and then shove rows from the other table into hash buckets based on which bucket they fall in, and then sync them up. Whether or not a hash join is used is often dependent on how you've set the parameter work_mem, because that defines the largest hash map you're allowed to have in a query operation. So if you think Postgres should be using a hash join and it's not, one of the things you can try is bumping work_mem. A merge join says: hey, I'm going to want sorted output from both of these.
You often see the query planner doing this when it knows you're going to be sorting afterwards and that sort corresponds to indexes that you have on both relations, right? So if I have an index on ID number here and an index on ID number there, and I'm going to be displaying them in ID number order, and they're both big tables — one of the things Postgres will do is say: hey, I'm just going to sort everything according to ID and match up everything that matches, like a big zipper. You don't see that as often as the other types, because you really need to have certain query conditions for it to be the efficient plan. It has a huge startup cost, but if you're joining a big row set, it's the fastest plan. More joins — because, like I said, we support lots of joins in Postgres. The lateral join is most often used against what are known as table expressions, that is, functions that return a table. You want to pass in criteria from one table — like, you do a scan on a table, and you want to take the result of a column from that scan and pass it as an input to the function. So what you do is what's called a lateral join. It's really funky syntax, and a lot of people can spend years in Postgres without using it, but when you do need it, it's really useful. And then it's a separate node type. A semi-join — this is how we execute EXISTS and IN clauses a lot of the time. As in: hey, we're only going to join against the rows that match in this table, and we're going to ignore the ones that don't match — we're going to ignore the null ones. And the opposite of that is the anti-join. When you see a NOT IN expression or a WHERE NOT EXISTS expression, we will often do an anti-join. So, some examples. Again, this is joins. Now we have a hash join here, right? And one of the things I mentioned are the conditions.
So in a join, the condition is how you find the matching rows between the two tables, or relations, or synthetic row sets, or query clauses, that you're joining. So in this case we're matching the stl source ID against source.id in the other table. And a merge join is the same, right here: merge join, we're matching this against that. I pulled these off of the explain steps, which I'll show you in a minute. And you can actually see some things happening. For example, you'll notice one of the reasons a merge join is being used here is that we have two giant, huge tables; we've got a row set of 30 million here. And you can kind of see how this filters out. So those are examples of some of these node types. More node types. These slides, by the way, are online if you want to review them later on; I don't expect you to memorize all these. Aggregates. One of the things we do in Postgres is aggregate, right? Average, median, standard deviation, all kinds of other things. A group aggregate, that's our basic aggregate. Again, it's the fastest way to do things on small row sets: just go ahead and add up the values as you pass over them. A hash aggregate says, okay, we've got a lot of data, so we're going to construct a hash table, put rows into hash buckets, and group them that way. Much faster if you're going to have a lot of rows; if your aggregation column is indexed, though, we may do a group aggregate instead. Window aggregates are used to support windowing clauses. How many people have used window clauses? Yeah. Those of you who haven't probably should; they're awesome. They allow you to do in one query something that would have taken two or three. I'm not representing some of the newer aggregate nodes in here, grouping sets, et cetera; sorry, that's 9.5, not 9.4.
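Those two aggregate strategies can be sketched in Python, using a plain sum as the aggregate. Illustrative only; the column names and data are invented.

```python
def group_aggregate(sorted_rows, key, value):
    """Assumes input already sorted on `key` (say, read in index order):
    emit each group as soon as the key changes, using almost no memory."""
    totals = {}
    current, running = None, 0
    for r in sorted_rows:
        if r[key] != current:
            if current is not None:
                totals[current] = running
            current, running = r[key], 0
        running += r[value]
    if current is not None:
        totals[current] = running
    return totals

def hash_aggregate(rows, key, value):
    """Handles unsorted input by bucketing on the key; the whole hash
    table has to fit in memory (work_mem, in Postgres terms)."""
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[value]
    return totals

rows = [{"cardno": "B", "amount": 5}, {"cardno": "A", "amount": 1},
        {"cardno": "B", "amount": 2}, {"cardno": "A", "amount": 3}]
print(hash_aggregate(rows, "cardno", "amount"))                 # {'B': 7, 'A': 4}
srt = sorted(rows, key=lambda r: r["cardno"])
print(group_aggregate(srt, "cardno", "amount"))                 # {'A': 4, 'B': 7}
```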
Those you'll now see as new node types. So, again, examples of that. Here's our hash aggregate. We've got our usual execution time and cost, and then the criteria information we get here is the key on which we are grouping, right? These are the three columns, card no, last name, first name, that I'm grouping by; we get that as our group key, and that gives us our condition information. And in the group aggregate we have the same thing. Other operations that you'll see expressed as nodes, although they don't correspond to other things: occasionally you'll see Unique. It's a lot less common now that we have other, less expensive ways to return a unique row set, but occasionally you'll see it as a way of deduplicating rows, and the primary things that produce it are DISTINCT and UNION, as opposed to UNION ALL. And by the way, that's a query tip for you: if you're doing a UNION query and you don't have a good reason to deduplicate, use UNION ALL, because it is much less expensive to execute. The deduplication is expensive. But most of the time you won't see Unique, because Postgres will do something like a hash aggregate instead to implement this under the hood. A Sort node should be obvious, right? We're going to do a sort. If you're using LIMIT or OFFSET, the node you'll see is the Limit node. Other operations: a CTE scan, that's WITH clauses. So if you have a WITH clause and then another query portion that queries that WITH clause, you'll see a CTE scan node; that's our reference to the WITH clause. A subquery scan is the same, only for subqueries. And this is usually in cases where the results of the subquery can't be folded into the main query. We try to do that whenever possible, because the thing Postgres can do fastest for combining results is some kind of join. So one of the things the query rewriter tries to do all the time is come up with a way to say, how can I turn this into a join?
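That UNION versus UNION ALL tip in miniature, as a Python sketch (not how Postgres implements it; here the dedup pass is a simple hash-based unique step):

```python
def union_all(a, b):
    """UNION ALL just concatenates the two inputs: no comparison work."""
    return a + b

def union(a, b):
    """UNION must also deduplicate -- the extra pass you pay for."""
    seen, out = set(), []
    for row in a + b:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

print(union_all([1, 2], [2, 3]))   # [1, 2, 2, 3]
print(union([1, 2], [2, 3]))       # [1, 2, 3] -- duplicates removed, at a cost
```

On real tables, that extra pass means hashing or sorting the entire combined result, which is why UNION ALL is so much cheaper when you don't need the dedup.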
So if you're seeing something with a subquery scan, it's because Postgres couldn't figure out how to turn it into a join. Materialize: that's our most expensive sort of subquery or WITH clause. It says, hey, in order to execute this sub-clause, I had to build a complete, constructed row set in memory, which is expensive. That's what Materialize means. So if you're seeing a Materialize step in your query, that can be a reason why it's slow. And then an Append: that's for things like UNION ALL and partitioning and other cases where we're appending row sets to each other vertically. So, a CTE scan example. We're going to have all kinds of things here: there's the CTE scan, and it refers to the name of the WITH clause, TOTAB, and then there's a filter outside the CTE scan, and in recent versions of Postgres you'll get this tremendously useful Rows Removed by Filter. Because if that Rows Removed by Filter is actually the majority of the rows that were returned, you start saying, hey, is there some way I could push that criterion down inside the WITH clause so that I'm not doing all this expensive filtering? So, having done all of that, I can take like one minute of questions on the node types before we start looking at why is my query slow. Any questions? Yeah? Your statistics. Postgres keeps statistics on... I think I talk about this in "why is my query slow," actually, so let's see. Let me talk about why is my query slow; if not, I will revisit it. So, why is my query slow?
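That Rows Removed by Filter situation can be shown in miniature: materialize a whole CTE-like intermediate result, then filter outside it, versus pushing the predicate down so the expensive intermediate row set is never built. A Python sketch with invented data, not Postgres behavior.

```python
# Invented "table": 1000 rows, 10% in region EU.
rows = [{"cardno": i, "region": "EU" if i % 10 == 0 else "US"}
        for i in range(1000)]

# Filter applied OUTSIDE the "CTE": all 1000 rows are materialized first,
# then 900 of them are thrown away -- that's Rows Removed by Filter.
cte_result = [dict(r) for r in rows]            # full intermediate row set
filtered = [r for r in cte_result if r["region"] == "EU"]
removed_by_filter = len(cte_result) - len(filtered)
print(removed_by_filter)   # 900 rows built only to be discarded

# Predicate pushed down: only the 100 matching rows ever materialize.
pushed = [dict(r) for r in rows if r["region"] == "EU"]
print(len(pushed))         # 100
```

Same answer either way; the pushed-down version just never pays for building and scanning the 900 doomed rows.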
So, one of the things you may have seen: you've noticed the background of my title slide here. Well, that background is from this wonderfully useful tool maintained by Hubert "depesz" Lubaczewski out of Poland, explain.depesz.com. It's an online website that lets you paste in an explain plan's text; it will parse it out and attempt to find things in your query that are common patterns for slowness, like a huge difference between the estimated number of rows returned and the actual number of rows returned. It'll also tell you which step took the most time, right? Because, remember I said those costs and times cascade upwards? Well, it can take you a fair amount of math to figure out how long a given step actually took independent of the other steps, and this tool does that math for you. As you can see: exclusive versus inclusive times. So in this particular query, hey, this bitmap index scan is where I'm spending most of my time, and explain.depesz.com has highlighted that for me. A tremendously useful tool. Do keep in mind that it is somebody's public web thing, so if you have data restrictions, say you work at a HIPAA-compliant or PCI-compliant company, et cetera, you might want to pull down the source code and run it internally instead; or he does have a button there to obfuscate all your column names. I don't really do that with my own SQL, but maybe you want to. There are some other tools you can use. The pgAdmin GUI desktop client will do a visual diagram of the query execution; I tend to find that only works for very simple queries, because the diagram gets unwieldy very quickly. There's a more on-steroids version of this, originally developed by Red Hat for Red Hat Database and now maintained as part of the EnterpriseDB developer studio. I haven't actually seen it, because I've never used the EDB developer studio, but it's there.
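The math explain.depesz.com does for you can be sketched like this. EXPLAIN ANALYZE reports each node's actual time *inclusive* of its children, and *per loop*, so a node's own ("exclusive") time is its inclusive total minus its children's totals, with every figure multiplied by its loop count. The plan numbers below are invented for illustration.

```python
def total_ms(node):
    """EXPLAIN ANALYZE's 'actual time' is per loop, so a node's real
    contribution is time * loops -- the multiplication people miss."""
    return node["time_ms"] * node["loops"]

def exclusive_ms(node):
    """Inclusive total minus the children's totals = time spent in this
    node itself."""
    return total_ms(node) - sum(total_ms(c) for c in node["children"])

# A toy plan: a join whose inner index scan runs 10,800 times.
inner = {"time_ms": 0.25, "loops": 10800, "children": []}  # looks cheap per loop...
outer = {"time_ms": 2.0, "loops": 1, "children": []}
top = {"time_ms": 2750.0, "loops": 1, "children": [outer, inner]}

print(total_ms(inner))     # 2700.0 -- ...but it dominates the whole query
print(exclusive_ms(top))   # 48.0 ms spent in the join node itself
```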
However, for a lot of people, their primary tool for helping interpret explain plans is the #postgresql channel on irc.freenode.net, where people come in and say, why is this query slow, and post their explain plan, and then they get, please do an EXPLAIN ANALYZE, and they post their EXPLAIN ANALYZE, and people are like, ah. So particularly when you're learning how to interpret explain plans, the IRC channel is really invaluable. So, here are some examples of particular query problems you might encounter and how you'd actually spot them in the explain plan. Here's one where we simply need an index that we don't have. You can see here we're doing a full table scan, and we're doing it on a pretty large table, 200,000 rows; a full table scan is probably not the best strategy for this, and it's certainly not happening quickly. And we've got this filter, category equals 7. Now, there's actually a little bit more to this query; I've trimmed it down. There were other columns being selected, but those columns were what you'd call non-selective: columns where each value had thousands of rows. And Postgres will only use an index if it can pull a very small portion of the table, usually no more than a couple percent; beyond that, it's actually cheaper to do the sequential scan. So the answer for this particular query was to modify one of those indexes to include this category column, and all of a sudden the query got orders of magnitude faster. Another example is a bad row estimate. Now, I've fiddled with the alignment here so you can actually see this. So: index scan and so on, and here's the cost, here's the estimated number of rows returned, here's the actual time, and here's the actual number of rows returned. Well, we're a couple of orders of magnitude off. Now, these are estimates. I generally say that anywhere within a factor of five, that is, five times as many to one-fifth as many, doesn't
generally change the query plan. Sometimes it does, when you're really close between two different cost estimates, but generally it doesn't. Being two orders of magnitude off, though, will change the query plan. In this particular case, if you're actually pulling ten percent of the table, then an index scan is not the way to go; it's going to be slower than a sequential scan. And you can see where this estimate gets off. I'm trying to remember if I talk about statistics later; anyway, we'll find out. Now, another one you'll see here: I mentioned that loops field, and this is where a lot of people miss the calculation. We're doing a nested loop join up here, matching around ten thousand rows, which means the index scan underneath is actually being looped over almost eleven thousand times, and that's what's consuming a lot of our time. That's why the end time for this is what it is: most of the time is being spent in that loop. So look for these things with very high numbers in the loops value; usually there's a better way to do that. Now, this next one is complicated, and it's really annoying, because there isn't a good, easy way to fix it. One of the things we implement in Postgres is what's called an abort-early plan. The way an abort-early plan works is: hey, you've asked for a hundred rows from this table, and I don't have a good index to use, or I'm joining against something else, or whatever, but the criteria you've specified are very broad, and therefore I think I can scan one percent of the table, get your hundred rows, and then quit. The problem is, if Postgres is wrong about how common your criteria are, or how well distributed they are, say your criteria are common but all those rows are grouped somewhere down towards the end of the table, then it can end up scanning the whole table. So where you can actually see this is here:
you notice this really weird circumstance: look, the cost of this node is 9 million, and the cost of the node above it is only 2,600. Wait, I thought you said costs accumulate upwards? If costs accumulate upwards, why is this cost so low? Well, the answer is that Postgres is expecting to abort this join long before it finishes executing it. But if it's wrong, that can be a really slow query. And the reason why it's doing that here is, if you look down a node, you see, hey, we've got this index scan backwards. What happens is, you have an ascending index and you ask for everything greater than X, so we do an index scan backwards, and you see we've got this right here: 4.2 million rows on the index scan backwards. So Postgres is figuring that those index criteria apply to a lot of the table. But it's wrong; they apply to a much smaller slice of the table, which means you're going to scan a lot more of the table to find the 100 rows you were looking for. I did say I would mention statistics. The way the query planner figures all this stuff out is that we keep tables of statistics in the Postgres system catalogs about what's in all of your tables, and these statistics are asynchronously updated. However, number one, they are samples; they're based on sampling the table, because we don't want to full-table-scan every table every time we update the statistics; that would be prohibitively expensive. And number two, we then apply algorithmic rules to those samples. So we have a list of the most common values, we have histograms of ranges, we have some information on how unique the values in a column are, that sort of thing. Because all of those things are estimates, they can all be off, right?
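Here's that sampling problem in miniature: a Python sketch (not Postgres's actual estimator) of why a small, fixed-size sample can badly misjudge a skewed column's number of distinct values. The column data is invented.

```python
import random

random.seed(7)  # deterministic for the example

# A skewed column: 990,000 rows share one common value, while the last
# 10,000 rows each carry a distinct rare value.
column = [0] * 990_000 + list(range(1, 10_001))

true_ndistinct = len(set(column))
print(true_ndistinct)            # 10001

# A small fixed-size sample, like the planner's statistics sampling:
sample = random.sample(column, 100)
sample_ndistinct = len(set(sample))
print(sample_ndistinct)
# Almost certainly a single-digit number: the rare values are nearly
# invisible at this sample size, so any estimator working from this
# sample will be off by orders of magnitude.
```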
So if you apply an estimate to an estimate, you get an estimate with an even higher range of error than you had originally. Now, one of the things you can actually do is mess with some of the execution settings. The brute-force way is to force certain plans in Postgres by turning off certain types of nodes. If you set enable_nestloop = off, you're preventing Postgres from executing a nested loop. Now, that is fine, that is great, as a way to test different query plans. Like, hey, if this were a hash join, would it have been faster? Well, let's force that by disabling nested loops, right? And then you can see, hey, no, a hash join was actually much, much slower; the query planner was right. So it's useful for testing. The mistake people make is to change these settings in postgresql.conf. Because here's the thing: if you do that in postgresql.conf, you're not disabling nested loops for that one slow query, you're disabling nested loops for every query in the database, which means a bunch of queries that used to be fast will all of a sudden become slow. Don't do it; that's for troubleshooting queries only. Now, one of the things you can do that's more useful: I said the statistics were based on sampling. You can increase both the default sample size, and you can increase the sample size on specific columns. The default sample size is like 100, which is very small if you're looking at a table that's got 50 million rows, right?
You can increase it up to 10,000. Anywhere over about 500 to 1,000, I really recommend doing it on a per-column basis, because if you're sampling, say, a large text column, the stored statistics get really large, and if you start bloating out the statistics table, the statistics table itself gets slower, at which point all of your query planning gets slower. You don't really want that; do it on the columns that you frequently join on, for example. The other thing you can do is actually clobber the statistics in certain ways. n_distinct is the estimate of how unique a column is, and because there's no such thing as a perfect algorithm for estimating n_distinct from a small sample, sometimes Postgres is way off, depending on how it took the sample. And the thing is, it's not that hard for you to get a better estimate of n_distinct yourself, simply by running a GROUP BY query one time; the n_distinct of tables that have been in use for a while doesn't tend to change much over time. So you can override it, and then you run the explain again to see if it changed the query plan. In this case, for our abort-early plan, it did. So we changed the query plan by changing some of the statistics, saying, hey, this abort-early plan is bad, because there are a lot more rows than you think. By increasing the sample size, we changed the histogram, and Postgres suddenly realized that fewer rows would correspond to those criteria, and therefore the abort-early plan was a bad idea. So that's your sort of cycle, and those are your basics of explain. We have about five minutes for questions. Yes? Yes, you do: if you're doing any of these statistics changes, you have to run an ANALYZE statement. It doesn't take effect until the next analyze, so either you run an ANALYZE statement or you wait for autoanalyze to get around to it. Well, no, the thing is, say
you've got a B-tree index, right, a normal index, on a date-added column, and you want to see everybody whose date added is after a certain date. Then it's going to do a reverse index scan, because that's a descending search but the index is in ascending order. I'm not entirely clear on why reverse index scans are slower than forward index scans; it has something to do with the data structures we use, but they are a little bit slower. So, for example, if you are searching on greater-than on that date column all the time, you probably want to remove that ascending index and replace it with a descending index, because it will be just a little bit faster. More questions? The question was whether or not you can force it to use a particular index, and the answer is no, we haven't implemented that in Postgres, for a variety of design reasons. You can manipulate a bunch of other things that will end up causing it to use that index, and there are certain things you can do with indexes; Postgres supports the concept of partial indexes, for example, where you build an index with a WHERE clause. Yeah, it's generally better in that case to look at why it is not using that index in the first place. Often, if the index really would be the fastest way to do it and it's not being used, there's a good reason why; like, actually, your criteria don't match the index as well as you think. For example, one of the problems you'll have with indexes on text columns is mismatched text collations: you're using a different collation in your client than the one the column was built with, and therefore we can't use the index on the text column. Or you're using, say, a LIKE or an unanchored text search, and the index on the text column wasn't built with this operator class called text_pattern_ops that you can apply to an index on a text column to make it support LIKE; otherwise it doesn't, unless you're in the C locale. There's a
bunch of little things like that that will prevent Postgres from using a particular index, even when you think the index is applicable. And sometimes the index really is slower. In that case it would be nice to be able to force it to use the index just as an experiment, but we don't currently have a way to do that. So, I wrote a tool a while ago, I'm trying to remember where it is on GitHub, that looks through all of your tables, and then, based on an estimate of how many rows each table has, it takes all the columns that you've bothered to index, deciding that those are your important columns, because why would you index them otherwise, and increases their statistics targets to a value based on a coefficient of that. I did that quite a while ago, though, and I don't remember where it is; ping me later on and I'll find it. Keep in mind that increasing the sample size in Postgres is not always beneficial, and it's not always size-based. That is, in a lot of cases, you want bigger samples not because the table is bigger but because it's more asymmetrically skewed. If your data is completely randomly distributed, then you can go with a sample size of 10; it's when it's not randomly distributed that you need larger sample sizes. So, highly skewed tables: for example, if you have a column that represents activity status, and you're correlating that against date, well, obviously the active rows are going to be mostly the ones with the latest dates, but Postgres won't know that instinctively, and so that's where higher sample sizes help. There have been various proposals to make the automated sampling correspond to a percentage of the table, but none of those have been incorporated into mainstream Postgres, though that would actually make more sense from my perspective. Yeah. Yeah, one of the things to understand is that until Postgres 9.6 we will not have any correlation stats between columns, so Postgres assumes that values in two different columns are entirely randomly
matched, which is a real problem when they are entirely non-randomly matched, like, say, month and year, or zip code and date; Postgres will estimate way low on the number of rows returned from that. We are getting column correlation stats in 9.6, for some value of that, and the situation will improve a lot. But for right now, if you're saying, hey, I put criteria on two independent columns and Postgres estimated really low, orders of magnitude low, it's usually because those two columns are actually highly correlated, but Postgres has no way to estimate that. Sometimes the answer can be making a multi-column index, which Postgres will evaluate according to different stats; and sometimes there isn't really a good answer for that. So, I'll take one more question, then we have to quit. One more question? Okay. Awesome, thank you very much.