Hello everyone. I'll be talking about how to improve the performance of ORDER BY ... LIMIT queries in MariaDB, and what the history of ORDER BY ... LIMIT queries has been in both MariaDB and MySQL. I work as an optimizer developer at MariaDB Corporation and have been with the company for almost three years. So let's start. This first part covers the history, and then I'll talk about what is new in MariaDB 10.5.

So, what are the ways to handle ORDER BY ... LIMIT queries? One is to use an ordered index. An ordered index can be used in three ways. You could have ref access, where some key parts are equal to constants and the rows come out ordered by the remaining key parts. Something similar happens with a range scan. With an index scan you cannot have an equality, so it is just a plain scan along the index. The second way is filesort, and filesort can be applied in two ways: either you sort the first table and then do the join with the remaining tables, or you put the entire join into a temporary table and then sort the temporary table.

This is the first case, where we use an ordered index. The records from this table come out ordered by the ORDER BY clause, because the index can resolve the ORDER BY clause, and then you perform the join with the remaining tables. The important thing is that while doing the join you cannot use join buffering, because join buffering breaks the ordering, so you have to make sure join buffering is disabled. Second, if you have DESC in the ORDER BY clause, then it has to apply to all the key parts of the index that are used. With this approach you can also shortcut the join execution: you do not need to compute the entire join, you can stop as soon as you have produced the number of records mentioned in the LIMIT.

Then there is the second case, where we apply filesort on the first non-constant table. Here you read from table T1, pass the rows to filesort, then read the output of filesort and perform the remaining join. It is very similar to the ordered-index case in that, again, you cannot use join buffering. Another important point is that any condition that depends only on the first table is evaluated before sending the rows to filesort. If this approach is used, the first row of EXPLAIN will show "Using filesort", and this is again a plan where you can shortcut the execution with the LIMIT, because the rows are already ordered.

Now comes the third case, where we compute the entire join first, put the records into a temporary table, pass the temporary table to filesort and get the ordered output. To apply the LIMIT here you cannot shortcut the join execution: you have to compute the entire join, and only in the filesort can you shortcut, telling filesort to stop as soon as it has produced the LIMIT records. With this case there is no restriction on join buffering and any join order is allowed, but the main drawback for ORDER BY ... LIMIT queries is that the LIMIT cannot be applied earlier; you have to compute the entire join before the LIMIT does anything.
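As a rough sketch of this third case, consider a query like the following; the tables, columns and data are hypothetical, purely to show the shape:

    -- hypothetical tables: there is no index on t2.b, so no index can
    -- deliver the rows in ORDER BY order
    CREATE TABLE t1 (a INT, c INT, KEY (a));
    CREATE TABLE t2 (a INT, b INT, KEY (a));

    -- if the optimizer puts t1 first, the ordering cannot be produced early:
    -- the whole join result goes into a temporary table, which is then
    -- filesorted, and only after that is LIMIT 10 applied; EXPLAIN would show
    -- "Using temporary; Using filesort" on the first row.
    SELECT t1.c, t2.b
    FROM t1 JOIN t2 ON t2.a = t1.a
    ORDER BY t2.b
    LIMIT 10;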
So this approach can be very inefficient for small limits, and in EXPLAIN the first row shows "Using temporary; Using filesort", which means you are using a temporary table and then doing filesort.

Now, what do we currently not have in the ORDER BY ... LIMIT optimization? We currently do not take the cost of sorting into account, and the LIMIT is not taken into account either. The only way we are able to shortcut the join execution is that after a join order has been picked, we check whether an index can be used to shortcut the join execution, or whether something like equality propagation would let us filesort by the first table of the chosen join order. But the join planner itself does not take the LIMIT into consideration; all of this happens after the join planner has picked the join plan.

Here is an example. We have an ORDER BY on a column on which we have an index. The join optimizer picks a strategy where it computes the entire join and then does filesort, which has an execution time of 25.2 seconds. Now, if I force a particular join order using STRAIGHT_JOIN, you can see that the t_fact table uses the index on column one to do the join execution, and the execution time is around 30 milliseconds. So if the join optimizer had any idea that with this index we could shortcut the join execution, it could have come up with a better execution plan.

Here is another example, again with ORDER BY t0.d, but this time there is no index on that column. The join optimizer again picked temporary table plus filesort. If I force the join order, it now uses filesort on the first table, and because the filesort is on the first table the output is already ordered, so it can shortcut the execution as soon as it gets the LIMIT records; there is no need to compute the entire join.

Now, what is new in MariaDB 10.5 is a cost-based optimization where the join planner takes into consideration both the cost of sorting and the LIMIT. The motivation is both pushing the LIMIT and costing the sort, and again the main thing I have been emphasizing: we want to shortcut the join execution for the LIMIT. To consider the LIMIT we try to push it, and the way the LIMIT is pushed is that we try to find a join prefix that can resolve the ORDER BY clause; that is the prime condition, that the prefix can be sorted by the ORDER BY clause, and then we can push the LIMIT. So inside the join planner code we apply the sort operation at that point and then we push the LIMIT.

So how is the pushed LIMIT calculated? What "pushing the LIMIT" means is that you find a prefix that can resolve the ordering, and when the LIMIT is pushed you will only read a fraction of the records of that prefix; this fraction is how the LIMIT is taken into account. Given the cardinality of the prefix that can resolve the ORDER BY clause, the total join cardinality, and the LIMIT, the formula gives us the number of records we would read from that prefix. To push the LIMIT you first need an estimate of the join cardinality, so currently we run the join planner once to get that estimate.
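To make that fraction concrete, here is a small worked calculation; the numbers are made up, not from the talk:

    rows_read_from_prefix  ≈  prefix_cardinality * (LIMIT / join_cardinality)

    e.g.  prefix_cardinality = 200,000 rows,  join_cardinality = 1,000,000 rows,  LIMIT 10
          200,000 * (10 / 1,000,000) = 2 rows expected to be read from the sorted prefix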
Also, there are access methods, like the indexes I was talking about, that already return all the records in order; these are also considered during the join planner stage.

Now this gets a bit technical. For each join prefix there are two options: if the prefix can resolve the ORDER BY clause, you can either push the LIMIT at that very point, or you can extend the prefix by another table and then push the LIMIT. Why do we do this? Because we want to consider all possible plans where the LIMIT can be pushed. Equality propagation also changes: earlier, equality propagation was always done after the join planner had picked the plan, whereas now the equalities are propagated before. So if you have a join prefix like T2, T3, the ORDER BY clause is on T1.a, and there is an equality, then we would consider pushing the LIMIT after both of those prefixes. So the equalities of the WHERE clause are propagated and considered by the join planner.

Now, how does the join execution work? As soon as we have a prefix that can resolve the ORDER BY clause, we materialize the prefix and put the records into a temporary table. Then we sort the materialized table by the ORDER BY clause, read the filesort output row by row, and join with the remaining tables; the execution again stops as soon as the LIMIT records are found. So again we are not computing the entire join.

This is how the execution path looks. You have the tables in the prefix, you materialize their join result and put it inside the sort nest. Join buffering is allowed for all of these tables, because they come before the ordering is produced. Also, the conditions that depend only on these tables are checked beforehand; the rest are attached to the later tables. Then the sort nest records are sent to filesort, and you start reading the filesort output and joining with the remaining tables. This part is just like the earlier example where we had filesort on the first table and then did the join: no join buffering is allowed here, and the execution stops as soon as the LIMIT records are reached.

Here is an example where we use a sort nest (a sketch of the query shape is shown a bit further down). You have an ORDER BY on columns from two tables. If you look at the EXPLAIN, it shows a sort nest, which is the materialized result of the nation and customer tables, and you can see that we apply filesort on the sort nest. Another interesting part: if you look at the condition attached to the orders table, it knows nothing about the customer table's key any more. It has to refer to the fields of the sort nest, not of the base tables, because at that point in the plan the base tables are no longer there; everything refers to the sort nest fields.

This is one of the examples I showed earlier when demonstrating the limitation. We again have the case with a key on column one, which earlier used temporary table plus filesort; after this optimization it is able to use the index on column one to do the join execution, and that gives a 1900x speed-up for this case. Here is another query, on the dbt3 data set that I created. A plan that earlier picked temporary table plus filesort now picks an index to do the join execution, shortcuts the join as soon as the LIMIT records are reached, and again gives a significant speed-up.
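To make the sort nest example more concrete, here is a reconstruction of the query shape in the dbt3 (TPC-H) schema; this is my sketch, not the exact query or plan from the slides:

    -- nation and customer can resolve the ORDER BY, so they would be
    -- materialized into the sort nest and sorted; orders is joined afterwards
    -- by reading the filesort output row by row, stopping after 10 rows.
    SELECT *
    FROM nation, customer, orders
    WHERE c_nationkey = n_nationkey
      AND o_custkey   = c_custkey
    ORDER BY n_name, c_name
    LIMIT 10;

In such a plan the condition o_custkey = c_custkey would compare against the c_custkey field of the sort nest rather than of the customer base table, which is the point made above about the condition on the orders table.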
The most important input for the optimizer to pick any plan is the selectivity of the conditions, and the approach MariaDB has taken for the ORDER BY ... LIMIT optimization depends very heavily on the selectivity of the conditions. So if someone wants to use it, I would suggest at least using histograms to provide selectivities to the optimizer, because otherwise the estimates can vary a lot, and when we push the LIMIT that can give us plans that are not that good. Another limitation is that there are a few predicates for which the selectivity is not known. Currently we assume a selectivity of 100% for those, but we have discussed internally in the team maybe adding some penalty for such cases, because if such a condition is actually highly selective, then using a sort nest, that is, this new optimization, can be harmful for us. Also, the estimates of join cardinality are very pessimistic, we over-predict everything, so when we push the LIMIT our estimate of the rows read from the prefix can end up very optimistic instead of pessimistic. Yeah, that's all. Thank you.

Questions? No, I didn't get the complete question. Can you repeat it? Yeah? No, but the WHERE clause conditions would be evaluated before the ORDER BY is handled if those tables are inside the sort nest. That is the case where the conditions in the sort nest match more rows, and then when you join with the remaining tables you do not find enough rows to match; that may be an overestimation case. But if you give the optimizer the right selectivity estimates, I think it can make a smart decision and maybe include that table inside the sort nest as well. Yeah, for such cases I don't think this optimization would work that well.
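As a practical footnote to the histogram suggestion above, here is one way to collect and use histograms in MariaDB; the table name is just a placeholder, and the defaults of these variables differ between versions, so treat this as a sketch rather than a recipe:

    -- collect engine-independent statistics (including histograms) for a table
    SET SESSION histogram_size = 254;               -- bytes used per histogram
    SET SESSION histogram_type = 'DOUBLE_PREC_HB';  -- double-precision height-balanced
    ANALYZE TABLE orders PERSISTENT FOR ALL;

    -- let the optimizer use the collected statistics and condition selectivities
    SET SESSION use_stat_tables = 'PREFERABLY';
    SET SESSION optimizer_use_condition_selectivity = 4;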