My name is Peter Geoghegan. This talk is called "Sort versus Hash: the Duality". I work for VMware, and I work on Postgres. I'm probably known mostly as the primary author of UPSERT, but I'm also known as someone who works on sorting, which somewhat inspired this talk. I should also mention at this point that there's a URL down on the slides, which you unfortunately can't really see here because it's been cut off. The intention is that you go to that URL, if you are so inclined, because there are citations and things like that which you may want to look at in your own time. It doesn't seem at all visible, and it won't become more visible at any point, so I guess that didn't really work out — but you can ask me for it.

Okay. First of all, I'm going to talk about nested loop join, which is, you could say, the most elementary of the join algorithms, and discuss how and why it's important. Here we see an image of a B-tree index. It's actually a fairly small index, but it appears here twice: this is the full extent of the B-tree, whereas this is zooming in on the root. There's only one root page and, I believe, approximately 20 leaf pages, so it's a fairly small index, and yet there's an enormous fan-out. What we're focusing in on at the bottom is to scale; if we were to display the entire B-tree at that scale we would need, I believe, an additional 10 projectors, so intuitively it would certainly be wider than this entire room. Extremely wide, a lot of fan-out. I tend to emphasize that because I think it's often lost: a B-tree is certainly not like a binary search tree. It's characterized by this enormous fan-out, and that's important for reasons we'll discuss in the context of this join algorithm in particular. I think that index is actually 240 kilobytes in size, or 30 blocks, including one meta page.

So, nested loop join itself. I called it the kind of universal join algorithm, usable in principle to implement any join that can be expressed in SQL. It is literally a nested loop — a for loop within a for loop — with the inner side typically an index scan, such as a B-tree index scan. The value from the outer side is plugged into the inner side's index scan, row by row. These are your OLTP queries: most of your queries are probably going to be pretty nested-loop-heavy. OLTP tends to be an awful lot of this sort of query and less of the other two, sort merge join and hash join, which are the primary topic under discussion here today.

Now I'm going to illustrate how a nested loop join works under the hood in practice. Here we're joining sales against coupons, and here's our join output. On the left-hand side, the sales table in this example is the outer side — the first for loop, if you will — whereas this one is the second, whose first value is, at this point in time, still indeterminate. We go to the B-tree, scan a value, obtain what that value is, and that's fed in — plugged in, effectively. Then we do the next one; it's repeated. Again we start from the root, in a totally different part of the B-tree in this case, which is not at all atypical. Again we obtain a value, that value is once again plugged in, and so on.
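To make that "for loop within a for loop" shape concrete, here is a minimal sketch in C. This is not PostgreSQL's executor code: the sales and coupons contents and column names are invented for illustration, and a linear search over an array stands in for what would really be a fresh B-tree index descent for each outer row.

```c
/*
 * Minimal sketch of the shape of a nested loop join: a for loop within a
 * for loop.  Not PostgreSQL's implementation; in the real executor the
 * inner side is typically a B-tree index scan that descends from the root
 * once per outer row, whereas here a linear search over an array stands in
 * for that lookup.  Table and column names are invented for illustration.
 */
#include <stdio.h>

struct sale   { int sale_id;   int coupon_id; };
struct coupon { int coupon_id; int discount;  };

int main(void)
{
    struct sale   sales[]   = { {1, 30}, {2, 10}, {3, 20} };
    struct coupon coupons[] = { {10, 5}, {20, 15}, {30, 25} };
    int nsales   = sizeof(sales) / sizeof(sales[0]);
    int ncoupons = sizeof(coupons) / sizeof(coupons[0]);

    for (int i = 0; i < nsales; i++)            /* outer side: sales */
    {
        for (int j = 0; j < ncoupons; j++)      /* inner side: index lookup stand-in */
        {
            if (sales[i].coupon_id == coupons[j].coupon_id)
                printf("sale %d matches coupon %d (discount %d)\n",
                       sales[i].sale_id, coupons[j].coupon_id,
                       coupons[j].discount);
        }
    }
    return 0;
}
```

The important property is visible even in the toy version: the work on the inner side is repeated once per outer row, which is fine when the outer side is small and each inner lookup is cheap.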
So if we're doing this at all, it's very effective when we only have a fairly small number of things we're ultimately feeding in, because those inner index accesses are possibly completely random in nature. It's very effective when we've got a relatively small amount of data to process all told.

Here's an actual EXPLAIN output. In fact, this example is lifted directly from the documentation, so you could potentially do this at home if you were so inclined. We're joining this one table against this other table, and we've got a predicate here. We also see various costs displayed in terms of the generic cost units things are costed in. We can do a little bit of math here and determine how those costs percolate up, and how, in general, a nested loop join's cost is quite directly a function of how many rows you're processing, with a tiny bit of additional overhead, which is why the total isn't quite the simple summation. Essentially it's the cost of executing the outer scan and then, for each input row, plugging it into the inner scan. The costs are multiplicative in that sense.
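As a rough formula — this is just a paraphrase of the shape of the costing described above, not the planner's actual cost model, which has more terms:

$$\text{cost}_{\text{nested loop}} \;\approx\; \text{cost}_{\text{outer scan}} \;+\; N_{\text{outer rows}} \times \text{cost}_{\text{one inner index probe}} \;+\; \text{small per-tuple overhead}$$

The multiplicative middle term is why the plan is so cheap when $N_{\text{outer rows}}$ is small, and so unattractive when it is large.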
So, I already said: low-latency OLTP queries. This is pretty much the major reason why it's possible in general to have really well-performing OLTP queries — you've got, like, seven joins and it takes less than 10 milliseconds; this is the main reason why. Nothing will scale as well as this: it's a logarithmic lookup for the index scans in terms of the number of page accesses, and nothing will ever scale as well as that will. This is in some sense special, because it will continue to scale extremely well as you get more and more data and as less and less of it can be stored in shared buffers. So sort merge join and hash join aren't really in competition with this particular join algorithm; they're more in competition with each other. It makes sense to compare them to each other, but this is the stepping-off point for even discussing that.

This is a quote from Goetz Graefe, a database researcher I'm known to follow quite closely, and he essentially says formally what I just said. It's kind of a bold claim he makes here: effectively independent of table size. That's not quite true, of course — there's a logarithmic growth factor in the number of levels you have to descend — but it is practically the case that once the B-tree passes a fairly small size it will effectively never add another level. Theoretically of course it will, but the point at which another level would be added is so far off as not to be worth considering, so far off that you could never possibly have a table that big in Postgres; you'd run into fundamental limits the system has in that general area. This is only for point lookups, though, as I said: when you're accessing relatively few things. Anyway, he continues that the real trick with OLTP is to have things characterised by looking stuff up, as opposed to characterised by computing. You don't want to be searching through and computing against a whole bunch of data. I think that's something that many of you probably have an intuition for already.

In general the main cost tends to be just accessing the data itself, and by using joins like this and by having good query plans from a good optimiser, it's often possible to arrange things such that we're mostly just doing lookups. That's very, very fast, and it will tend to continue to be very, very fast even as the data grows by quite a large amount. Okay, that's all I have to say on nested loop join today. I'm going to be taking questions throughout — I feel that's a better format for this kind of presentation — so any questions on nested loop join at this point, if anyone cares to ask? Okay. No. All right, I'll continue to the main topic under discussion, that is, the duality itself.

What is a duality? In mathematics it's a general term that refers to a relationship between two things that is somehow symmetric. This is perhaps best illustrated by the image you see here, where on the left we have a cube frame with its octahedron dual inside it, whereas here we have an octahedron frame with a cube inside. You get from the cube to the octahedron by drawing a dot in the middle of every face and joining those dots up — and in fact you do exactly the same thing over here to get back to where you started. As with this kind of illustrative example, the high-level message I'm trying to convey is that you can almost understand hash join and merge join in terms of each other. In fact, you could say that hash join was a response to merge join, which was in general available earlier — I'm not really talking about the history of Postgres so much as the history of relational databases in general right now, and I'll tend to jump from one to the other without acknowledging it, so I'm taking a small amount of artistic license in this area. Anyway, that's the big picture. Perhaps it seems a little abstract at this point; hopefully it'll make more sense towards the end.

Okay, let's start with an explanation of hash join. With hash join, first we build a hash table on what is almost certainly the smaller of the two input relations. The hash table consists of the join attribute and its hash for each row, alongside the row itself. Having built the hash table, we then scan the larger table and perform probes — lookups — of this ad hoc hash table, which was created just for the purposes of this one query. It might be — in fact it often will be — that nothing is found in the hash table, because there just wasn't anything on the other side with that particular value of the join attribute, in which case you continue; and sometimes, of course, you will find things to join, and then you output a joined row. We'll see that again in a minute.
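Here is a minimal sketch in C of that build-then-probe shape, assuming the simplest case where everything fits in memory (no batching or spilling to disk). It is not PostgreSQL's implementation, and the table contents and names are invented purely for illustration.

```c
/*
 * Minimal sketch of a hash join, assuming the simplest in-memory case (no
 * spilling, no batching).  Not PostgreSQL's implementation; contents and
 * names are invented for illustration.  Build phase: hash every row of the
 * smaller input into an ad hoc hash table keyed on the join attribute.
 * Probe phase: scan the larger input and look each row up in that table,
 * emitting a joined row on a match.
 */
#include <stdio.h>

#define NBUCKETS 8

struct row   { int key; int payload; };
struct entry { struct row row; struct entry *next; };

static struct entry *buckets[NBUCKETS];
static struct entry  pool[16];          /* static storage for build-side entries */
static int           npool;

static unsigned hash_key(int key) { return (unsigned) key % NBUCKETS; }

static void build(const struct row *inner, int n)
{
    for (int i = 0; i < n; i++)
    {
        struct entry *e = &pool[npool++];
        unsigned h = hash_key(inner[i].key);

        e->row = inner[i];
        e->next = buckets[h];           /* chain within the bucket */
        buckets[h] = e;
    }
}

static void probe(const struct row *outer, int n)
{
    for (int i = 0; i < n; i++)
    {
        /* often no match is found at all; then we just move on */
        for (struct entry *e = buckets[hash_key(outer[i].key)]; e; e = e->next)
            if (e->row.key == outer[i].key)
                printf("outer payload %d joins inner payload %d on key %d\n",
                       outer[i].payload, e->row.payload, outer[i].key);
    }
}

int main(void)
{
    struct row small[] = { {10, 100}, {20, 200}, {30, 300} };   /* build side */
    struct row big[]   = { {5, 1}, {10, 2}, {10, 3}, {40, 4}, {30, 5} };

    build(small, 3);
    probe(big, 5);
    return 0;
}
```

The build side is the one you want to be small, since it's the side held in memory; the probe side can be arbitrarily large, and a probe that finds nothing simply moves on.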
It only supports equi-joins, for the simple reason that that's how hashing works: you can only use it for equality. It works particularly well when one input is smaller, so that it acts as a lookup table. Now, I've already said that it's not necessarily the case that you're joining two whole tables. Sometimes I'll say you're joining together two relations, two tables, but with hash join it's probably more realistic to imagine joining against table A, whose tuples are fed in in the first place using, say, a bitmap index scan or bitmap heap scan — which is sort of halfway between an index scan and a sequential scan, you might say. So it's actually a little bit of A and a little bit of B. The output from A might be a lot larger than B, and it could be that there's a very selective join condition, and thus most of the hash probes would determine that you shouldn't join anything for that tuple. This is actually one case where hash join tends to do very well, because there's no reason why the subset you initially take from table A should be at all correlated with table B — in general there's no reason to think that, and often it isn't, depending on the exact details of the query.

There's also this capability it has around spilling, in the event of not having sufficient work_mem to continue: it can spill to disk, using temp files, because of a lack of available working memory. Obviously you'd prefer not to do that, because it will be slower, but it can happen, and you'll see it in EXPLAIN ANALYZE output.

I'm going to go on to an example of this. Again, this is from the same introductory EXPLAIN chapter of the Postgres docs, so once again you could easily do this yourself if you're so inclined. It's actually very similar to the nested loop join example, the difference only being that before we saw unique1 < 10 and now it's unique1 < 100, so it happens to be less selective. We're doing the expensive thing up front — building a hash table on the smaller side — and then reusing it, so we cannot start returning stuff until we're done with that. Sorry, excuse me, I misspoke. Okay, there's a little bit of math possible here as well, and it illustrates how hash join is in some ways similar to nested loop join, except that the "index", if you will, is one you've created on an ad hoc basis by building your own hash table there and then, as opposed to a pre-existing B-tree index you explicitly created. It's like creating your own index for that one query, that one time — which is actually quite a lot different, but there is that similarity. We can also see some additional simple extrapolation of what the cost would have been had the optimizer instead determined that we ought to use a nested loop join: it would have been more expensive — actually that's only an approximate value, probably higher than that. So we can see how the optimizer converged on this plan, and we can imagine what it might have been like had we forced it to use a nested loop join, perhaps by way of SET enable_nestloop = off, which would have the effect of forcing a different plan — or sorry, I meant to say enable_hashjoin = off. We could then see the would-be nested loop plan, which would have been similar to the first one except more expensive than this, which is of course why the optimizer determined that it shouldn't do that.
Okay, I'm going to talk about merge joins now. Merge joins expect sorted input on both sides, so an actual Sort node may appear on at least one side of the join, sometimes both. Again, if you've got a sort, it's possible for it to spill to disk, so it's a little like the situation with hash join — except it's not a given that you'd use an explicit sort, although it is somewhat common.

I'm going to give you a similar example now for merge join. This is obviously a contrived example — if things were really this small, the startup cost alone would be significant and this wouldn't actually happen — so it's a toy example for the purposes of illustration, although the query plans you've seen are not toy examples; those are real. So we start with some input; there should be a sort step for that, and we're joining against cities, so they're both sorted. Having done that — which is of course the major cost, assuming an actual sort rather than an index scan that's already ordered using a total ordering of the underlying attributes — the sorts would be the major cost, if we imagine an explicit sort step on both sides of the join. Again we've got a join output shown here, and we go through this merge reconciliation process: we've got two pointers, one into each sorted input, and the general rule is that we always advance the pointer that's pointing to the strictly smaller value. So we continue: the strictly smaller value's pointer moves down; that's not equal, so continue; now 3 is equal to 3, of course, so we emit a single output row and we proceed; again advance the smaller value, and so on. We actually skipped some things, because the join was not indicated for a couple of the tuples we went through — see the sketch below for the same idea in code.

So, merge join is more flexible than hash join in some ways: it's not limited to an equi-join, so you can have things like ranges — the semantics you could conceivably get a merge join for are not as limited. Okay, another EXPLAIN from the docs. This is a different table, onek rather than tenk2, so there's a similar number of tuples on each side of this join. That's why we see a merge join: the inputs on both sides are similar in the total amount of data being processed, unlike the hash join we saw before, where there was a clear disparity. It's a fairly similar number, and so merge join is indicated as the cheapest plan in the optimizer's universe of possible plans. We're doing the expensive things up front again — in this plan there's actually one sort and one index scan — and the sort itself is done up front, so we're doing an expensive computation up front which is bounded in size by work_mem. So there's a similarity there.

Okay, anyone have any questions on hash join? Yes — I'm getting to that, I'm getting to that; that's about two thirds of the way through. You're not necessarily expected to grok it yet; hopefully it becomes clearer.
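Here's a minimal sketch in C of the merge step just described, assuming both inputs are already sorted and — to keep it short — that neither side has duplicate join keys (a real merge join has to be able to step back over groups of duplicates, which PostgreSQL handles with mark and restore). The values are invented for illustration; this is not PostgreSQL's code.

```c
/*
 * Minimal sketch of the merge step of a merge join, assuming both inputs
 * are already sorted on the join attribute and contain no duplicate keys.
 * Not PostgreSQL's implementation; values invented for illustration.
 */
#include <stdio.h>

int main(void)
{
    int left[]  = { 1, 3, 4, 7, 9 };        /* e.g. sorted output of a Sort node */
    int right[] = { 2, 3, 6, 7, 8 };
    int nleft = 5, nright = 5;
    int i = 0, j = 0;

    while (i < nleft && j < nright)
    {
        if (left[i] < right[j])
            i++;                             /* advance the strictly smaller side */
        else if (left[i] > right[j])
            j++;
        else
        {
            printf("join output: %d\n", left[i]);   /* keys equal: emit a row */
            i++;
            j++;
        }
    }
    return 0;
}
```

The cost profile follows from this shape: once the inputs are sorted, the merge itself is a single cheap pass over both sides.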
Okay, so now we're talking about the introduction and popularization of hash join — a bit of a history lesson. This image here is of a DEC Alpha from the early 90s, sort of the first superscalar processor — I think it was also the first 64-bit processor. The image is just intended to illustrate the times and how they were changing around the early 90s, which is approximately when hash join became popular. Hashing, of course, is a technique for implementing a dictionary abstract data type, for example; it's been around basically forever. It wasn't until much later that it became practical to use it to implement joins in relational databases, and in general this is why.

But as a quick jumping-off point, let's go back another 20 years, to the 70s. Memory size was then a major consideration. The image you see here is from Donald Knuth's The Art of Computer Programming — it's a pull-out section from volume 3, Sorting and Searching — and it shows the scheduling of something called polyphase merge (actually only one of these is polyphase merge; I think it's the one in the middle). It's showing how, when we don't have enough memory to sort — and in the 70s you certainly wouldn't have, most of the time, if you were doing a merge join — you sort a small amount of stuff, spill that out, then continue and sort the next batch, and so on and so forth, until at the end you merge them all together. Now, because you had so little memory at that point in time, it was probably infeasible to merge them all at once: at a minimum you need enough memory to hold one tuple from every run — those little piles of sorted data — since you clearly need at least one from each run at a time in order to merge them using a heap or something like that, and at that time that would have been a limiting factor much of the time. So this illustrates, for various different algorithms, each represented by a different row, how the scheduling would occur: I'm going to merge this set of runs, then that set, then merge those results together. It's just showing how that works out in practice for illustrative examples. So this stuff is as old as the hills, and memory size was then a major consideration. This is very old school — Knuth himself sort of admits it, insofar as he says "let's pretend we're in the 70s". I'm not telling you this because I'm romanticizing it or anything like that; I'm getting to a point. I'm not one of those people that goes to a computer history museum — nothing against them — so that's not where I'm going with this, in case you were worried. It's more like: isn't it interesting, and illustrative, and even relevant, that history is perhaps repeating itself?

So at the time we had a limited number of tape drives; we couldn't have as many tape drives as we had runs. We also couldn't have as much memory — we couldn't in general have enough memory to be assured that we could do everything in one pass — and then scheduling becomes important: we need to use what little memory we have optimally across the various stages of the sort, such that things are well balanced throughout.

These days, though, it's certainly still very possible to have to do an external sort. You're never going to have enough memory — or you're never going to care to budget enough memory — such that in the very worst case you'll still get an internal sort, and it's actually not a great idea to anyway, especially these days, given some of the work I've been doing on external sorting; that memory is probably better spent on shared buffers or whatever. The reason we really don't have to care about the scheduling stuff at all these days is that we may not have enough memory to sort internally, but we certainly have enough memory to be assured that at the end we won't have so many runs that we can't merge everything in one pass. We'll certainly be able to easily exceed the amount of memory budgeted for our sort — that's practically assured. It's not actually assured — it could still happen — but it's not really worth considering cases where you don't have even that much memory; it's just too marginal these days, with memory capacity being what it is. In fact, that's been the case for some time — the last 20 years really, or certainly 10. And this is because every time memory capacity doubles, the size of runs obviously doubles with it, but what also doubles is the amount of memory you've got for your merge, so effectively your capacity for doing things in one pass quadruples as the capacity doubles. And it's doubled a lot of times since the 70s, and here we are.
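To put that quadrupling claim into a back-of-the-envelope formula — a rough bound, not the exact behaviour of PostgreSQL's external sort: with a memory budget $M$ and block size $B$, run generation produces runs of size roughly $M$, and a single merge pass can combine roughly $M/B$ runs, since it needs about one block of buffer per input run. So the largest input sortable with a single merge pass is roughly

$$\underbrace{M}_{\text{run size}} \times \underbrace{\frac{M}{B}}_{\text{runs mergeable at once}} \;=\; \frac{M^{2}}{B},$$

which is why doubling $M$ quadruples the one-pass capacity.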
I should also mention that Grace hash join and hybrid hash join didn't have any equivalent to this; their cost function looked quite a lot different. So there was no plausible alternative when you had very little memory and wanted to hash: you just couldn't really do anything good enough, because you had so little memory relative to your input that it was infeasible. It only makes sense to do some amount of spilling to disk with hash join when it's not that far off from being feasible to do everything in memory — certainly not orders of magnitude off.

Okay, so as I said, hash join was introduced because of trends — in particular the trend around main memory scaling, the increased main memory capacities. Hybrid hash join then has a plausible, good-enough spilling mechanism, so the performance penalty when you don't have quite enough memory is not ruinous — you're not going to have tens or even hundreds of runs — and thus hashing is on a much more competitive footing with merge join; it becomes feasible, it becomes possible.

I want to mention here specific optimizations to hash join in Postgres — one of the authors is in the audience, actually, that's Thomas. This was basically around clever management of memory to avoid cache misses, the use of memory chunks. There's also an optimization called Robin Hood hashing in Postgres 10, and then there's the parallel hash join patch — it might get into Postgres 10 in the next number of days, but I wouldn't count on it; we just ran out of time for that one.

Okay, so sorting did actually benefit as well. Of course, it was still the case that sorting was necessary: we would still have some inputs that were more or less equi-sized, very large, and not at all selective, so we'd ultimately have to join most of everything against most of everything else on the other side. So it's not like merge join is obsolete — it makes a comeback. This is an illustration from a paper that greatly informed my work on sorting, called AlphaSort, tying in to the work I did in 9.6 on sorting: I made it so that the quicksort algorithm was used even for external sorts, not just internal sorts. Before that, a replacement selection algorithm was used, which uses a heap even for the generation of runs, with a view to maximizing the size of runs — on average it manages to make them twice as large as available memory by sort of juggling them. But that juggling incurs quite a lot of cache misses, and that has a very significant, and at this stage in history growing, penalty.
So here the AlphaSort authors argue: well, this is really cache-inefficient, particularly because you're jumping around all the time, and there's no way it could possibly pay for itself given increasing cache miss penalties. Memory latency becomes — starts to become — a major consideration around this time. Sort performance has to weigh cache performance to a great degree to address this bottleneck, and it does, and it has. One of those methods is, as I said, quicksort; there are others, and they're all in that general vein of: let's address the memory latency bottleneck, let's look at where the cache misses are and make them not happen, using various techniques. Memory capacity is not the first concern — I already said nobody really cares about the multiple-pass, polyphase merge stuff anymore — and elegance is not the first consideration either. It's kind of brutal where we're focusing: really simple code with really tight loops. We're not worried about complicated, elegant algorithms; that isn't where the gains are to be had.

This is another image from that same paper — this is from 1994 — and even here we see a kind of cute picture; I dare say some of you have seen similar things on the web. It's an early example of using an analogy to illustrate the relative costs of main memory access, cache access, and so on. I'm just putting it there for illustrative purposes; the exact details aren't especially pertinent to getting the idea.

So, some stuff I did on sorting in 9.5 and 9.6: abbreviated keys. This is the idea that you take the first little bit of a string. When you're sorting in Postgres, you're sorting something like an index tuple or a heap tuple, and you need to follow a pointer — you're sorting what are called sort tuples, a fixed-size structure that has a pointer to, potentially, a heap tuple or something like that. What we're doing here is: instead of necessarily following that pointer every time, if it's a pass-by-reference type such as text, then ahead of the sort, as we're bringing the data into temporary memory, we generate at that same time an abbreviated value — say the first few bytes — and that is stored inline alongside the pointer. This would have happened anyway if we were sorting integers or something, because those fit there, but text can be almost arbitrarily wide, so it generally pays to take the first few bytes, store those there, and then attempt to get away with having most comparisons resolved at that level. If and only if we can't manage that for a comparison do we do the pointer chasing, and only then. So that was again another example of memory latency and memory bandwidth being what we really need to focus on. It's not a very elegant technique, certainly, but it's just what we need to do.
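Here's a minimal sketch in C of the abbreviated keys idea, assuming a simplified stand-in for the sort tuple: an inline integer abbreviation of a string's first few bytes stored next to a pointer to the full value. This is not PostgreSQL's SortTuple or its actual abbreviation logic (which, among other things, is collation-aware); the strings are invented for illustration.

```c
/*
 * Minimal sketch of abbreviated keys: a simplified "sort tuple" carries an
 * inline abbreviation of a string alongside a pointer to the full value.
 * Not PostgreSQL's SortTuple or abbreviation code; values are invented.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

struct sort_tuple
{
    uint32_t    abbrev;     /* first few bytes, packed so integer order matches prefix order */
    const char *full;       /* pointer to the full value; chasing this risks a cache miss */
};

static uint32_t abbreviate(const char *s)
{
    unsigned char buf[4] = { 0, 0, 0, 0 };

    memcpy(buf, s, strlen(s) < 4 ? strlen(s) : 4);
    /* big-endian packing preserves the lexicographic order of the prefix */
    return ((uint32_t) buf[0] << 24) | ((uint32_t) buf[1] << 16) |
           ((uint32_t) buf[2] << 8)  |  (uint32_t) buf[3];
}

static int cmp_tuple(const void *a, const void *b)
{
    const struct sort_tuple *ta = a, *tb = b;

    /* cheap comparison on the inline abbreviation first ... */
    if (ta->abbrev < tb->abbrev) return -1;
    if (ta->abbrev > tb->abbrev) return 1;
    /* ... and only on a tie do we chase the pointer to the full string */
    return strcmp(ta->full, tb->full);
}

int main(void)
{
    const char *values[] = { "pgconf", "postgres", "sort", "hash", "pgcon" };
    struct sort_tuple tuples[5];

    for (int i = 0; i < 5; i++)
    {
        tuples[i].abbrev = abbreviate(values[i]);
        tuples[i].full = values[i];
    }
    qsort(tuples, 5, sizeof(struct sort_tuple), cmp_tuple);
    for (int i = 0; i < 5; i++)
        printf("%s\n", tuples[i].full);
    return 0;
}
```

Most comparisons are resolved on the inline abbreviation, which is already in cache; the pointer — and the potential cache miss behind it — is only chased on a tie.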
Okay, I also worked on parallel CREATE INDEX for Postgres 10 — that's not going to get in; there might be some more discussion on that — but I think it's in the same vein, in that it turns out that parallelism and optimizing for the memory hierarchy are very much interrelated problems; we cannot address one without first addressing the other. There's also something called unique joins — I don't want to get into that too much. It's something that's going into the optimizer, probably going to be committed for Postgres 10. It concerns realizing that one side of a merge join can have a unique path, and thus we can know that we don't have to go back up, which is currently necessary; that realization can result in more efficient plans than were heretofore possible. That's a little bit abstract, but I'm mostly putting it there for reference purposes. By the way, you should ask me at the end if you care what that URL actually is, since you can't make it out; I'll read it out for people.

Okay, so now I'm entering a more speculative phase of the talk — my, I guess you could say, prognostications based on readings of papers and things like that. Where am I going with all this? This image you see here is from the STREAM website. STREAM is a benchmark — nothing to do with Postgres in particular — for establishing the memory bandwidth of computers, and this little image, which is on a log scale, illustrates how the memory bandwidth bottleneck has become extremely acute over the years: CPU speed has increased enormously, and DRAM speed has not kept pace. This basically justifies my total emphasis on cache misses: the difference is so enormous that it couldn't possibly not be the right place to look, almost all of the time — certainly as far as sorting goes, and possibly also hashing.

The multi-core era has its own challenges. Parallelism, of course, is very important these days — we all know this — but, to some degree as a consequence of that and to some degree the other way around, the CPU architecture community's pursuit of parallelism is as much as anything else an attempt to address that bottleneck, which is not intuitive, and not something people tend to talk about as much as the simple idea of getting more things to happen at once.

I've already mentioned, in the context of sorting, that I think main memory scaling — the increase in the size of main memory — has been, as much as anything else and probably more than anything else, the transformative technology of the last 30 years. It isn't SSDs or whatever; people tend to talk about that stuff, and it's fine, but I don't think it has the same significance as the simple and boring fact that main memory capacities have increased as much as they have. However, this trend seems to have stalled because of power management issues. It seems now that we're not going to reliably continue to get larger and larger memory capacities every year — there are problems with the refresh rate of memory — and this seems to be a situation that cannot be sustained, barring some kind of major innovation, which is certainly possible but certainly cannot be relied on. So core counts are going up, but we don't necessarily see — it's a matter of dispute to what extent this is true — a concomitant increase in memory bandwidth per core. So if you consider things in terms of serial execution performance, it's possible that you upgrade your server and get a slower execution of your query, which is of course precisely the opposite outcome to the one you spent money trying to get. So, as much as anything else, parallelism is justified by the memory bandwidth bottleneck: the only way the CPU people were able to increase the amount of aggregate memory bandwidth was through parallelism, and that's really, probably more than anything else, what's driving it. The implication for me, as a performance-orientated Postgres hacker, is that this needs to be almost the first thing I'm considering, even with parallelism — they're very interrelated; it's tricky. I also didn't read out the analogy that has been made before:
it's like driving a Ferrari in New York City — that's apt. So, parallel query: it's a big thing now. There's a talk, I think tomorrow, from Robert Haas about that and where it's going; you might want to go to that as well. Parallel join becomes more important than ever, simply to avoid regressing performance — or at least we can imagine that being possible, if it isn't already. It's a plausible thing that could happen even if it hasn't been directly observed as yet; I'm not aware that it has, but I'm also not aware that it has not. I mentioned already that the CPU vendors have some ideas, but it's not clear that they'll ever be able to overcome the challenges — I guess we'll find out.

So anyway — you asked me about the duality, and I'm edging back towards that topic. Sort versus hash: could there be a merge join resurgence? This is a speculation, obviously. The meta-observation — the high-level observations I made about the relative merits of sort merge join and hash join — may need to be revisited in light of this new bottleneck that's become more and more of a problem. If it was the case before that we had a relatively small amount of memory and a lot of disk, which was slow, it might now be that — if cache is the new memory, then the implication perhaps is that memory is the new disk — and thus it might further be the case that the relative importance of sort merge join might once again grow, that it becomes more and more important to query performance in general, and that it's something that perhaps requires re-examination in light of these various trends. Obviously this is not to say that there will ever be one winner — I'm not for a minute supposing we'll ever reach the stage where one of them wins outright; that would be absurd, because of their fundamental properties, and neither is ever going to be something that can be disregarded — but a relative shift could actually occur. It's in the realm of possibility; to what degree, I'm unsure. What I do know is that, in principle, if you extrapolate into the future — not that you necessarily ought to, but if you do, and take it to an extreme enough degree — it follows, if this isn't too abstract, that you finally end up at a place where sort merge join becomes much, much more important. Will that happen? Again, I don't know, but it's a reasonable speculation, and many have speculated just that — there's a link here to a paper discussing it, and there are a number of papers; this whole question is one the database research community has at various times visited and then revisited. There are also other avenues for improving sort performance that may at some point be evaluated for Postgres: SIMD is something that could plausibly be applied to sorting, but not really to hashing, owing to the fundamental nature of how both of these things work. How am I for time?
Okay, but we'll have to be quick. Sort — I mean, you can plausibly structure things such that there's an accumulation of work with sorting; you cannot really do that with hashing. And thus, if we extrapolate far into the future — which is perhaps not something I'm necessarily asserting will happen — we can logically imagine a time where that becomes the overwhelming, absolute consideration, to the extent that what matters is that there are inherently more cache misses required for hash join.

Trying to wrap it up now, and getting back to the duality aspect. Here we see an image similar to the one earlier in the talk — we just have more complicated polyhedra than we saw the first time around, but the principle is the same. When I say there's a duality, I think more than anything else I would sum it up by saying that for sort there is a physical division and a logical combination, whereas for hash there is a logical division but a physical combination. That's the fundamental truth that I keep returning to, without it necessarily being obvious; that's the intuition driving all this stuff, if that makes sense. So that's the big picture. It's less clear, of course, what the practical implications will be — we'll see, I suppose — but that's where I'm going with all of this, and that's the overarching theme, if you will, of my talk.

In summary: locality, locality, locality. Memory locality, cache misses — it's really important stuff. External sorts degrade slowly and at predictable rates, whereas hash joins will tend to degrade, as memory becomes less and less available — either because the input increases or the amount of memory decreases — at a much faster rate. That goes back to what I was saying about the history of those two things. It turns out that if you have a one-pass sort in PostgreSQL — there are a lot of variables, but there are certainly cases where it will only be a little bit slower than doing it all in memory, even though you might have only one tenth of the memory. In fact I've even seen cases where, for various obscure reasons, it'll be faster; that's actually not that uncommon, and in the latest version of PostgreSQL you'll probably see more of it — by the latest I actually mean PostgreSQL 10, which is not out yet.

Merge join works well with low-selectivity join qualifications, where it's inherently necessary for us to join most of A against most of B and there's no getting around it. The overhead per tuple joined is much lower, but there's a huge startup cost — a huge cost in general, because the join is just inherently expensive and there's no way around it. Obviously we'd prefer to avoid it, for example by reordering the joins within the optimizer, but when that doesn't work out — when we really do have to do a really expensive join with a lot of roughly equal input on both sides — that's when we use merge join.

Okay, also, with the final on-the-fly merge step of a sort — rather, a sort that's having to spill to disk — in PostgreSQL we can produce our first output before we've actually finished the sort, because the merge itself is of course doing the last of the sorting work at the end. So sometimes that makes the startup cost smaller than even an internal sort's, which could, as I've suggested already, lead to faster overall performance, even though intuitively that shouldn't be the case.
It just so happens, what with all these various considerations, that there are in fact cases these days where the external sort is faster — possibly even by more than 10%. I've seen such cases; it's not that infeasible.

Okay, hash join — okay, got to wrap it up quickly. Hash join works best with a low-cardinality join attribute from the smaller table, so you could have a lot of the same value again and again, and, as I said earlier, you could have a highly selective join qualification; that's where it works best. So you're mostly joining against some random subset, having already excluded the other values of table A — it could be a bitmap heap scan that you're actually getting both sides of the join from — and it could turn out that most of the hash table probes don't indicate that any value should be joined, because the value isn't found, again and again. Now, you might imagine that would be rare, that it couldn't really happen, but of course you've already excluded, before you even built the hash table, perhaps much of the rest of the table; and because it's perhaps a random subset on each side, it could be that the qualification wasn't at all selective by the time you went to do your probes of the hash table. That's where hash join really does well — and also when one side of the join input is just smaller overall. The memory locations you're accessing are very random, and that is an inherent property of how hashing works, for reasons I think are relatively obvious. And that's all I've got. Anyone got any questions? Oh, okay — well, that worked out.

Yes — I personally would not be inclined to, because I'm busy, but someone has; there are theoretical models that support that contention. If you look at the slides — make sure to download the PDF and don't use the web viewer, because the clickable links only work in the PDF — I'll read the URL out for anyone that's interested: it's speakerdeck.com/peterg/sort-hash-pgconf-us-2017. So you can review their thoughts on it. I think it's obviously speculative, but I would say it's at least an interesting speculation. I'm obviously a practitioner that's thinking ahead, but I can only think ahead so far, given my constraints on time and things like that. That's where I see myself going with it, if anywhere at all — that's the only kind of frontier that remains. Okay, thank you. Thanks.