I will be focusing today on the optimizer module and what is new in 5.7 and 8.0. We have two sessions in the evening about generated columns and JSON, so I will go over the 5.7 part quickly and then skip over to 8.0. This is the standard safe harbor statement. I will start off with invisible indexes. A lot of times DBAs face this problem: they have a bunch of indexes and a bunch of queries, and they are not sure whether all of the indexes are actually relevant or being used significantly. One thing you could always do is drop an index, run your queries, notice that some of them have slowed down, run EXPLAIN on them and see that, yes, that index would have helped, and then recreate it. But recreating an index is always a very costly operation. Instead, all you need to do is switch the index to invisible mode, run your queries, look at the EXPLAIN output, and if the index is not being used, that is good: you can just drop it. It is like a dry run of dropping the index, without any real cost associated with it. Next we have descending indexes. Even in 5.7 you could write syntax like this, creating a secondary index with a descending key part, although only in 8.0 is the index actually stored in descending order. Why is this useful? Suppose you have a query with ORDER BY ... DESC. There are two ways the server can deal with it. Either it does a backward scan of the index, reading from 10 down to 1, or it brings all the data to the server and the server does a filesort. A filesort is doubly expensive: you need some temporary-table-type structure, and you also need the computational time for the sort. So it is very expensive.
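A minimal sketch of the descending-index case just described; the table and column names here are made up for illustration:

```sql
-- Hypothetical table; in 8.0 the DESC key part is actually honored
CREATE TABLE events (
  id INT PRIMARY KEY,
  created_at DATETIME,
  KEY idx_created_desc (created_at DESC)
);

-- Can be satisfied by a plain forward read of idx_created_desc,
-- with no backward scan and no filesort
SELECT * FROM events ORDER BY created_at DESC LIMIT 10;
```

In 5.7 the same CREATE statement parses, but the index is still stored ascending, so the server falls back to a backward scan or a filesort.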
Instead, if you create a descending index, the values are stored in descending order, and the query becomes a plain read with no sorting at all. Next we have common table expressions. To understand common table expressions you first need to know about derived tables. A derived table is a subquery in the FROM clause that is treated like a table: when you issue such a query, the subquery is analyzed first, its output is treated as a table with a name like derived, and that table is used for the rest of the query. There are some drawbacks here, which we will discuss. With a common table expression you pull that part out of the query: you write it once at the top, and then you have your actual query. Why? First of all, readability is better. You no longer have a SELECT buried in the middle that you must analyze first and then mentally substitute into the outer query; that kind of confusion is gone. Then there are deeper problems. With derived tables you cannot do chaining. For example, you cannot write a self-join like SELECT * FROM t1, t1 using one derived table; you would have to write the subquery all over again, once aliased as derived and again as derived1. So even a simple self-join does not work without duplication, and the two copies of the same subquery are analyzed separately: two rounds of analysis for identical text. With CTEs that problem goes away: you just reference derived twice, and self-joins become very easy. Also, derived tables are not chainable: you can have one subquery aliased as derived, but a second derived table cannot refer to it.
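A sketch of the self-join difference (table and column names t1, a, b are hypothetical). With derived tables the subquery must be written twice; with a CTE it is written once and referenced twice:

```sql
-- Derived tables: the same subquery written and analyzed twice
SELECT *
FROM (SELECT a, SUM(b) AS s FROM t1 GROUP BY a) AS d1
JOIN (SELECT a, SUM(b) AS s FROM t1 GROUP BY a) AS d2
  ON d1.a = d2.a + 1;

-- CTE (8.0): written once, referenced twice
WITH d AS (SELECT a, SUM(b) AS s FROM t1 GROUP BY a)
SELECT *
FROM d AS d1
JOIN d AS d2 ON d1.a = d2.a + 1;
```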
You would have to put it in a CTE instead. So, why are recursive CTEs good? A recursive CTE has two parts: the seed part and the recursive part. The seed supplies the first value, and the recursive part supplies the rest of the values, continuing for as long as its condition holds. If you have used Oracle, the corresponding feature is CONNECT BY, and our syntax is mostly similar to the one SQL Server uses, so if you have queries there, you can mostly port them over. Here is a recursive CTE that prints the values from 1 to 10: SELECT * FROM qn, where the first branch of qn is the seed supplying the first value, and the recursive branch keeps running as long as the condition, value less than 10, is satisfied. Similarly you can produce the Fibonacci values; it is much like printing 1 to 10. What is the real-world example? Say you have a very simple table where you have an id, a name, and a manager id. How do I build a reporting structure, in the sense that I report to my manager, my manager reports to a director, then a VP, then the CEO? With ordinary SQL queries that is not possible; you have to use CTEs, and recursive ones. So let us go back and look at the data. The only person with a NULL manager id here is the CEO. This John here reports to Yasmina; similarly, Pedro reports to John, and John again to Yasmina. That is the kind of structure we want. Now you can write a simple recursive CTE with this shape: the seed query selects the employees whose manager id is NULL, so it returns only Yasmina, the CEO. The rest of the rows come from the recursive query, which is a join of the CTE with the employees table.
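A hedged sketch of that reporting-chain query; the table and column names (employees, id, name, manager_id) are assumed from the slide, and the CTE name follows the one used in the talk:

```sql
WITH RECURSIVE employees_extended AS (
  -- Seed: the CEO, the only row with a NULL manager_id
  SELECT id, name, CAST(id AS CHAR(200)) AS path
  FROM employees
  WHERE manager_id IS NULL
  UNION ALL
  -- Recursive part: everyone who reports to someone already found,
  -- concatenating the chain of ids as we go
  SELECT e.id, e.name, CONCAT(ee.path, ',', e.id)
  FROM employees_extended ee
  JOIN employees e ON e.manager_id = ee.id
)
SELECT * FROM employees_extended;
```

The CAST in the seed makes the path column wide enough to hold the concatenated chain built up by the recursive branch.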
In the EXPLAIN you can see that employees_extended is the CTE name and employees is the table, and there is a join between them; that is how the rows are produced. You can see that John's row carries the ids 333 and 190 concatenated, which is his reporting structure. If you look at someone like Sarah: she reports to Pedro, Pedro to John, John to Yasmina. So just with simple concatenation and a recursive CTE, you are able to accomplish this. Next: we have improved the performance of scans. What happens when you say SELECT * FROM t from the application? The server goes to the storage engine and says: I want all the rows in this table. What InnoDB does is go to its own B-tree and decide whether it needs all the rows or only some. Here it knows it needs all the rows, so it brings them in batches of four. What we are doing now is asking: why batches of four? Why descend the B-tree again for every four rows? Why not bring a much bigger batch? So we supply the storage engine with a buffer, and the storage engine fills it with that many rows. How many is that? It depends on some configurable system variables and on the row size. If you have a very wide row, maybe you bring only 200 records at a time; if you have a very small row, maybe you bring 2,000 or more in a single call. So it saves you a lot of fetching time. Which queries improve with this? Notably table scans, which are simply SELECT * FROM t. Also range scans, say all the values from 1,000 to 10,000: earlier this would take about the number of rows divided by 4 round trips to the engine, and now it takes the number of rows divided by 100 or 200. Also joins, but only for the first table; the second table is anyway a ref access, so it will not really be helped by this.
On DBT-3 we noticed that with this feature we get better performance for almost 18 queries, out of which two queries showed around 15% improvement. This is on the standard DBT-3 benchmark. Hints. Yesterday we talked about FORCE INDEX. FORCE INDEX is also a hint, essentially telling the optimizer: I want you to use this particular index. There are other ways of accomplishing similar things. If you as a DBA feel the optimizer is not doing a good job of generating plans, and that turning a particular optimization off would give a better plan, you can do a SET of optimizer_switch with that optimization off and then run the query, and your job is done. But the problem with optimizer_switch is that it affects all the queries that follow. So you have to do a SET, run your query, and then SET it back: that is essentially two extra round trips from your client to the server. Hints give you much more flexibility in the sense that they are part of the query, and a hint does not even have to apply to the whole query: you can target a query block or a particular table. We have new hints in 8.0. One is JOIN_FIXED_ORDER. Say you have 10 tables. What the optimizer normally does with 10 tables is say: I have to search for the best join order. But it so happens that sometimes DBAs already know the best order and do not want the optimizer to spend a lot of time on that search; they can just say JOIN_FIXED_ORDER. Or, if you know the order only for a subset of the tables, you say JOIN_ORDER. Similarly there are JOIN_PREFIX and JOIN_SUFFIX. How does it work? You have this query with two tables, customer and orders. Left to its own devices, the optimizer will do orders first and then customer.
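The round-trip difference described above, sketched; block_nested_loop and NO_BNL are real 5.7/8.0 switch and hint names, but the query and table names are made up:

```sql
-- With optimizer_switch: the setting affects every following query,
-- so you have to set it back, costing two extra client round trips
SET optimizer_switch = 'block_nested_loop=off';
SELECT * FROM t1 JOIN t2 ON t1.a = t2.a;
SET optimizer_switch = 'block_nested_loop=on';

-- With a hint: scoped to this one statement, and even to one table
SELECT /*+ NO_BNL(t1) */ * FROM t1 JOIN t2 ON t1.a = t2.a;
```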
But the DBA may feel that is not a good choice: maybe the data is continuously changing, or the optimizer is simply not doing a good job in this case. Fine. Just say JOIN_ORDER, putting the customer table first and the orders table next: customer, orders. That is it. In this example we also put a FORCE INDEX there. Question: this particular syntax is new? Yes, it is 8.0; the labs release with it is already out. Also, prior to this there was no optimizer hint for join order. Well, there was something called STRAIGHT_JOIN, but STRAIGHT_JOIN did not give you much flexibility: if you had five tables, the order of all five was fixed exactly as written. With JOIN_ORDER, say you had a third table, t3. You can say that for customer and orders I want this order, but for the third one I do not care; the optimizer can place it anywhere. So it is a little more granular. Also, a lot of times when we have a derived table like this, the optimizer will try to merge it into the rest of the query, in the sense that it breaks up the subquery, pulls the table out, and rewrites the whole thing as joins with t1, t2, and so on. Maybe in some cases that is not a good thing, and the DBA can just say NO_MERGE, and we will not do the merging; the subquery is preserved as written. Better UUID and IPv6 support. A UUID is a standard generated value that is unique across platforms, machines, and time zones. The problem is that it is very big. The first part corresponds to the hours, minutes, and seconds of the timestamp; the next part to the days; and then come the months and years. So when you want to store it, you either have to store it in a BLOB or in a VARCHAR(36), which is not very ideal.
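As we will see, 8.0 builds this conversion in with UUID_TO_BIN() and BIN_TO_UUID(); a sketch, with a made-up table name:

```sql
CREATE TABLE t (uuid_col BINARY(16) PRIMARY KEY);

-- Store the 36-character text form in 16 bytes
INSERT INTO t VALUES (UUID_TO_BIN(UUID()));

-- The optional second argument swaps the time parts so the
-- slow-moving high-order bits come first, keeping time-ordered
-- values contiguous in the index
INSERT INTO t VALUES (UUID_TO_BIN(UUID(), 1));

-- And back again, with the same flag to undo the swap
SELECT BIN_TO_UUID(uuid_col, 1) FROM t;
```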
So what a lot of people do is break up the UUID, remove the dashes, and store it in a BINARY(16). But, as you know, you then have to write a fairly big stored procedure for this, maintain that stored procedure, things like that. Now you do not need to do that. You can just say UUID_TO_BIN() and BIN_TO_UUID(): you give the value, you get the value. And you can also check validity with IS_UUID(). There is another advantage here. Notice that on my previous slide I said the seconds come first. Now, if you want to create an index on this, having the seconds first does not really make sense, because then your data is not meaningfully contiguous. If you want it to be meaningfully contiguous, you want the slow-moving parts, the years, to come first. All you have to do is call UUID_TO_BIN() with an extra argument, and that is done. Once that is done, you store it in a BINARY(16), and your data will actually be contiguous. Not that any data is otherwise non-contiguous, but here it is meaningfully contiguous, because it follows the timestamp. And similarly, BIN_TO_UUID() takes the same extra argument. Why will these be important? Because a lot of times customers and users come back saying that they have these huge stored procedures written just to perform this one function. Now we give a reliable, built-in way of doing it. IPv4 and IPv6 support. IPv4 addresses are generally stored in a BIGINT, and when you apply a netmask on one with a bitwise AND, it works fine. But IPv6 addresses are generally stored in a BINARY or a BLOB, and BINARY and BLOB were not supported by the bitwise operators, so you would get incorrect results.
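In 8.0 the bitwise operators accept binary-string arguments, so the netmask pattern works for IPv6 as well; a sketch, where the addresses are just examples:

```sql
-- INET6_ATON() returns a 16-byte binary value for an IPv6 address.
-- In 5.7, & would coerce these operands to BIGINT and give wrong
-- results; in 8.0 the operation is done on the full binary strings.
SELECT HEX(
  INET6_ATON('2001:db8::ff00:42:8329') &
  INET6_ATON('ffff:ffff:ffff:ffff::')
) AS network_part;
```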
A lot of times customers would write multiple stored procedures just to break up the value, do the AND piecewise, and put it back together. Now you do not need to do that: the bitwise operators are supported for both BINARY and BLOB, so AND, OR, and the rest will all work. What is the future roadmap? This afternoon we have some sessions about JSON, so you can ask questions there about the advanced JSON functions; that is one main item. Then there are cost model improvements: histograms are coming, and the cost model will also start distinguishing between data in memory versus data on disk, so it will take into account whether a table is queried a lot and therefore hot. Then there are window functions, which are also a popular feature request. Improved support for prepared statements: you say PREPARE and the optimization part is done, and when you say EXECUTE, only the execution part runs from there. You will not be spending time deciding the join order, whether to use range access or ref access, this index or that index; all of that will be gone, and EXECUTE will go directly to execution. Prepared statements are already there, but today a lot of the optimization still happens in the execute part; we will move it completely into the prepare part. There are also optimizations for GIS operations: a lot of support for R-trees, which are used for GIS, and a lot of new GIS functions that are part of the SQL standard. So those are the features. Any questions? Question: in some databases you can store the optimized plan of a query, so the next time the query runs it uses that optimized plan and does not calculate it again. Is something like that available in MySQL? It is available to a certain extent already. You can say PREPARE and then run your query, and a lot of the parsing work is done at prepare time, even in 5.7, even in 5.6.
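The prepare/execute split being described, sketched; the query and table are made up:

```sql
-- Parsing and validation (and, in 8.0, more of the optimization)
-- happen here, once
PREPARE stmt FROM 'SELECT * FROM orders WHERE customer_id = ?';

SET @id = 42;

-- Repeated executions reuse the prepared work
EXECUTE stmt USING @id;
EXECUTE stmt USING @id;

DEALLOCATE PREPARE stmt;
```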
With PREPARE, the parsing and a lot of the semantic analysis is done up front. So if you just want to know whether a query is even valid, you can simply PREPARE it. And when the EXECUTE comes, that parsing work does not happen again. Some optimization still happens at execute time, such as the join analysis, but from 8.0 the wall will move up, so that only the execution happens there. Even 5.7 is useful in this respect: if you have a very big query with 100 values in it, you do not need to check whether each value is valid every time you execute; that happens once, at prepare time. Question: on the recursive CTEs, you gave the example of the organization with the CEO, managers, and employees. How expensive is it in terms of performance or memory if you have, say, a very large organization with many levels? And a second question: when you query, can you query just certain parts of the tree? On the first question: say we have a company like Oracle, with hundreds of thousands of employees. Yes, it will be expensive. A recursive CTE is always based on UNION ALL, and UNION ALL is expensive in the sense that the server may internally use temporary tables. But keep in mind that earlier you were not able to do something like this at all: you would have had to know the depth of the organization structure in advance and write one very big query. So at least in those terms it is useful. And the temp tables are pretty well optimized, in the sense that no more than one temp table is created per level. For example, John and Tarek are both at the same level, so one temp table is used to generate their records; going to the next level, another temp table is used.
And there is some repurposing of them, so it will still not be very costly. Question: can the temp table use the MEMORY engine instead of going to disk? If it is small, it is generally on the heap, in memory. But if a very big temp table is created, it might be flushed to InnoDB. Question: is there such a thing as a SQL profile or a SQL baseline in MySQL? Sorry, I do not know what that is. I am an Oracle DBA; I am thinking of performance tuning, where you run something with the query and... Yes, that is performance monitoring; we have a session on that as well. You can monitor what is happening and which particular phase the query spent a lot of time in. Sometimes it so happens that the execution itself is very short, but choosing the join order takes very long because you had some 10 tables. We have optimized that in 8.0 as well, just the choosing of the join order. So you can get that kind of performance tuning. Any other questions? Question: you said that sometimes we use temporary tables. Is that there in 5.7? It is there in 5.7. And temporary tables are something you can actually tinker with, in the sense that by default the server tries to create the temp table on the heap, and if its size grows beyond a threshold, it flushes the temp table to disk. But in 5.7 you can set a system variable so that it never uses the heap and goes directly to disk; that way we do not let the heap expand beyond the limit. Question: and 8.0 does this more intelligently for CTEs? For CTEs, yes. For other queries there are minor improvements, which are bug fixes. Any other questions?
So, actually, this one was not given to us. I have a question directed at Sanjay, which I had held over from the last session: why does the version number jump from 5 to 8? If I remember, isn't it supposed to follow the development line? Answer: we had been at 5 for such a long time: 5.5, 5.6, 5.7. There was a 6.0 that we had planned, and with 6.0 there were just too many things we had put in there, and we kind of lost our way, so we decided to take the best features and put them into 5.6 and 5.7. So 6 was not possible, because people would get confused. 7.0 is used for MySQL Cluster, our in-memory database cluster, so that was gone too. And then we said: what is left is really only 5.8, so let us call it 8.0, and then we have more room to make it 8.1, 8.2, and then 9 and 10. Question: why don't you want to have major numbers? Answer: that is what I am saying, we should have bigger numbers; people feel there is more innovation when the numbers are bigger. They might jump to 108 one of these days, for all I know. Question: Arsha, I have a question on the temp tables. In MySQL a temp table can be generated for each transaction, if the transaction needs one. Is there any way to limit the number of temp tables? Answer: limiting the number of temp tables, I am not sure, but you can limit the size of each temp table using tmp_table_size; you can set it down to a minimum of 1024 bytes, or higher. Question: so basically we would have to size the RAM for this server based on the heap that will be needed, based on the number of transactions. Answer: the other problem with a very small temp table size is that you keep creating more of them, and each temp table has some small metadata and a small overhead for maintaining it; so if you set it very small, even that can explode. Question: is it the size of
all the temp tables together, like in the Oracle database, where it could be the size of all the temp tables? Answer: that is resizable, but not on the fly; once you change it, it applies from the next temp table onward. One thing to mention on that last point, that there is an overhead to create the metadata: that goes away with the introduction of the new data dictionary, since we no longer persist that metadata on disk. Answer: the thing is that support still has the advisory not to set it very small, because it is maybe not just the metadata but other internal structures that are maintained, which also count in that case. Thanks.