 Okay, so let's get started. Thank you everybody for coming. My name is Vicencio. I am a software engineer for the MariaDB foundation and Today I'm going to give you a brief overview of the new developments in optimizers We've already heard Oisten's talk about histograms. So that's something that MariaDB has in a different format in 10-1 and onwards And now we're going to compare a few other things that optimizers can do So first of all, what is the goal of the query optimizer? So we want to get given a certain sequel query the fastest time possible when executing it And we have different tools to do that. We can choose to either cache something some data before we actually compute the full result we can do some index lookups and Get your that data faster this way or we can try to rewrite your query in a more efficient manner So that we can query it faster that So now problem is that the number of possible plans grows exponentially with the number of Tables it's actually factorial number of tables, but let's say that it's exponential and In a perfect world the query optimizer should be able to give you the best result all the time Unfortunately, that's not the case for many queries. It does happen But there are certain instances where it works quite badly We're not going to be too concerned about those for now, but Any optimizations that we do we try to reduce the number of possible queries which behave poorly so in order to explain what kind of optimizations we're doing in Recent developments. I'm need a bit of background So first of all a derived table in my skill is any sort of table Which you can find as part of a sub query in the from clause When I whenever I say that if the right table, it's either this or a view as part of the from clause So a view is basically the same thing. Here's our query example We want to get all our Customers which had orders With which had large orders so greater than 1 million entries in their order There's only a few customers. So that's why they're VIP customers And the orders have to be from October start to end 2017 How would you execute this in a naive fashion? Well, we first compute the result in this table here So this is October orders. We just do a filter on the orders on the date And then we try to join this table with VIP customers Using the join condition the amount must be greater than 1 million so here's the set of October orders that have amount to greater than 1 million that part will enter the join with the VIP customers and You get your result however, we can do better than that and this is something that my skill and Marie DB and Obviously Percona since they're based on my skill also do we rewrite this query because we can and This data this condition here It's only based on orders And if we move this condition outside of the subquery and just replace the custom table with the orders table We're going to get this query So now the optimizer has a more general view of how we actually want to query the data We have all the conditions in one place And this leads to a better query plan If you do explain you will see that we only have two tables visible VIP customers and orders We don't see October orders anywhere in the explain plan and the execution goes as follows You get the orders Then we get the October orders and the amount There's an intersection happening. So there's not There are orders that are not in October which have greater than 1 million amount Only the intersection of all these conditions will enter the join operation and then this gets joined with VIP customers So instead of first computing everything here We just do one table scan of our orders and one over VIP customers There's there are some tricks to buffer results, but I'm not going to go into too many details details here So conclusion is that merging is good. It simplifies the query and As I said, it works in all stable versions of my SQL or Maria DB However, there's a problem when we when aggregation comes into play We can't use it when aggregation happens because if you rewrite a query which has an aggregate function in the sub query Results are not what you expected so here's an example we want the October totals so all the the number of orders the number of items ordered per customer in October if we were to join this query inside the Main one you would actually start aggregating stuff in This one and you don't want to do that. You actually want the totals For one customer in this case, which is customer ID equal one But we have group by customer ID so We can make use of this here All right, so we have lots of customers. We don't want to get all of all the totals just for one customer So we make use of this group by column here and we identified this condition customer ID and We push this condition all the way to the work loss here because we have this group by column here So now when aggregating we only aggregate one customer So we get our orders. We get only customer one orders and then we sum those up So instead of doing aggregation over the full data set we just aggregate one customer this optimization is available in Maria DB 10 10 2 and This tactic also works for window functions if you were here from the start you would have you've seen that we have window functions in Maria DB 10 10 2 and There is also window functions coming in mysql 8 0 So we can push the same condition if you have a window function With a partition by clause here. So partition by means that you compute This rank sequence for each customer you can see here customer ID equals one has orders one two three order ranks Same for customer two and so on and we don't want to have to compute this rank column for all of the table If we're only interested in customers and customer once top three orders so We only want this here and Because we have this equality condition here and there is this partition by clause here we know that we know that we can simplify the query and Push this part of the condition all the way up here in the work loss of this Query so here. So if you if you push this you only get to compute this part and this information is not not even accessed Okay, so if you compare how execution would happen currently in Maria DB 10 2 which is the stable release mysql 8 0 Which has window functions? Mysql 8 they would compute all three top top three orders for all customers and then you would get the customer The rows for customer ID equals one In Maria DB 10 free and also in posgres. We've also tested how posgres does it you would only compute The top three orders for customer ID equals one and in some cases this can be much faster Because we can make use of indexes to identify the rows for that particular customer Makes sense so far Okay, either I am very clear or nobody's understanding anything at all So final optimization I Think the name for this is not quite fortunate. We're probably going to change it So for now we call it split grouping for derived So what happens here where we have a similar query to the one before we have a group by statement But because the there is no equality We don't filter by one customer ID. We have a range of customers See we have customer name in John and Bob, but we don't have customer name inside orders We have customer ID Okay, so so we have the joint condition here customer ID equals October totals customer ID And we filter by customer name So we can't do condition push down immediately because there is no Column customer name here however, we can be smart about this and Actually, let's first look at how we are not smart so when we're not smart we do a sum of everything so for all customers we get the sum and Then we this means we get the October totals and then for each customer We filter only Bob and John for both and join them So we have to do aggregation over the full orders table again, not good. This gets big very fast As you see we don't need customer X When we're smart We actually look at customers first We see which customer IDs we are allowed to use and we only aggregate those Customers so we can compute a sum for Bob and we can compute a sum for John and This way we completely ignore this part of the table as well as this part of the table there is some requirements for this though in Or to actually do this we have to have a Joint condition here, so we have to have something that links orders to customers in this case customer ID is linked Is this is the column that links the two tables and we also must have a Index on the group by column to be able to quickly check if the customer that we are looking for is actually in the table So here we want to have an index to check if is Bob in orders But Bob is Bob Bob's ID in orders if you don't have this this optimization won't actually work right now It's still in the beginning phases So we haven't really optimized this for all cases We only apply it when we're definitely sure that it will bring a speed up Otherwise, we are very we don't use it because we're not sure if it's going to produce a better query plan and To look up to see how features stack up. Well, we have derived table view merge This is the first thing we talked about in all stable versions Condition pushdown is a Maria DB only specific feature window functions are coming into my skill 8-0 and I haven't personally checked, but I believe merge Actually works with CTs too in my skill And also condition pushdown for partition by and split table grouping is only available in Maria DB 10 free So we're hoping that my skill can also Implement this because if more more users get optimizations, it's better for everybody Obviously, this is not a comprehensive Comparison I'm not trying to show that my skill doesn't have any features or we that we have so a lot more This is just on the features. We I talked about during this talk. Okay. This is actually a a Overview of how things work. This is basically what I talked about if you want to read reread this Somewhere, it's it's available. I will make I will post this talk on On the fuzz them links so you can you can download it there Okay That's all I had Thank you very much. Yes If there's an I actually have more slides possibly but it's a bit difficult if people are interested I can talk more about it offline Yep, go ahead. So is it this this one? I'm not really sure I get your question. So let me try and rephrase it. So The question is why don't we have a Wear clause inside the view view here, right? Well, for example, if you try to make an application for users which Which works for multiple Say say you're in a company and you want this view to work for multiple departments Maybe you don't want to filter by a specific user for that department. So that's why Well, maybe the example is not the example is to showcase the optimization possible It's not to show that this is the way you should actually write all your queries Okay, yes, you can optimize it manually, but the idea from the start of the talk is this so In a perfect world any query should run fast Okay, so that's what we're trying for even if you shoot yourself in the foot, okay Thank you