Hi, welcome, and now we'll have Chaitra speaking about what's new in the MySQL Optimizer in 8.0. Here you go.

Good afternoon, everyone. As she said, I'm Chaitra. I work with the MySQL Optimizer team, and I've been with Oracle for about six years now. Today I'll be talking about the most important features that we have delivered as part of 8.0. That's the safe harbor statement, just a disclaimer for you. It's not visible, is it? Okay, perfect.

So that's the agenda for today. I'll be covering some new SQL syntax that we have added as part of 8.0, and then some improvements we have made in 8.0 compared to 5.7.

The first and most important one we have added is the common table expression, also known as the CTE. So what's a CTE? It's an alternative to a derived table. A derived table is a subquery that you use in your FROM clause: you do your SELECT, and in your FROM clause you specify a subquery, which is treated as a derived table. So how does a CTE differ from a derived table, and how do we specify one? You start with the WITH keyword, then you specify the derived table's name, and then you give the subquery definition that you want to use. That is followed by the actual SELECT query which uses this derived table. A CTE may precede any of the DML statements. I'll show you an example later of how you can actually use a CTE.

So when you already have derived tables, why do you actually need a CTE? I have listed a couple of reasons why a CTE is better to use than a derived table. The first and foremost is better readability. Whenever you're reading a query that uses a derived table, you have a SELECT inside your actual SELECT: SELECT * FROM, and then another SELECT for the derived table. Whereas with a CTE there is a clear separation: your derived tables come ahead of the actual SELECT query that uses them. And better readability, as you know, leads to better maintainability. The second reason is that a CTE can be referenced multiple times: you have one single definition of your derived table, and then you can reuse it.

The third is that a CTE can be recursive. In a recursive CTE, the first SELECT is the seed: it becomes your initial data set, and you UNION it with a second SELECT, which is the recursive SELECT. The recursive SELECT works on the seed and then recursively produces the rest of the final result. Usually the recursive SELECT has a break-out condition: you specify a WHERE clause that says until when you want to keep producing rows.

I have a simple example here, printing 1 to 10. How do you do that? WITH RECURSIVE, then your recursive CTE's name, then SELECT 1: the seed SELECT produces 1. Then UNION ALL, and then SELECT 1 + a FROM this CTE, with your WHERE condition. The first iteration produces one row; after the second iteration you have 1 and 2, and so on until it breaks out at 10. And then you have your final SELECT query using this CTE; a runnable sketch of this 1-to-10 example follows below. This is a simple example, but recursive CTEs help you in real cases: you can use them to traverse hierarchical data, for example how many subordinates are there under a particular VP, and so on and so forth. So that's something that is introduced in 8.0.
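Here is a minimal sketch of the recursive CTE example just described; the CTE name `cte` and the column name `a` are illustrative, not taken from the actual slide:

```sql
-- Recursive CTE that generates the numbers 1 through 10.
-- The first SELECT is the seed; the SELECT after UNION ALL is the
-- recursive part, with the WHERE clause as the break-out condition.
WITH RECURSIVE cte (a) AS (
  SELECT 1
  UNION ALL
  SELECT a + 1 FROM cte WHERE a < 10
)
SELECT a FROM cte;
```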
So the next one is the window functions. How many of you here have heard of or used window functions? It's something that is used in analytics, basically. I'll show you what a window function is and how it can be used.

Typically, a regular grouped aggregation works this way: you have multiple rows for a single group, all of these are aggregated, collapsed, and you get one row which has the final aggregate. For example, if you sum salaries, you group accordingly and you get the final salary total in one row. A window function does not do this. It does not collapse the rows: it keeps the identity of all the individual rows that contributed to that group, and along with that it also gives you the aggregate. That's what a window function does. As the definition says, it sees the rows in the vicinity of the current row, however you define that in your window clause.

I have an example that helps to understand this. First, let's see the syntax of a window function. As you see here, I am doing a SELECT of name, department ID, salary, and SUM(salary) as the department total FROM employees, with an ORDER BY. How does a window function differ from a regular grouped aggregate? You have to specify this OVER clause. The moment you specify an OVER clause, the SUM(salary) is treated as a window function. And what does PARTITION BY do? PARTITION BY is similar to your GROUP BY: it groups by the department ID and then gives the department total. As you see, the output looks like this, whereas for a typical grouped aggregate there would be one row for the first group, one row for the second group, and then the third group, the fourth group, and so on. Here, for the first department ID there is one person contributing to the salary, so that's the end of the first partition. For the second partition, which has six rows, you have the department total, and all six rows are part of the final result set.

This is a very simple use case for a window function; window functions can be leveraged to do many, many things more. I have another example. In the previous example you saw the entire partition being considered to give you the final total, whereas in this example you calculate the total only over the two rows preceding the current row. So you can also specify, inside the partition, how many rows you actually want to look at. Here you have SUM(salary) OVER (PARTITION BY ...), similar to the previous example, then ORDER BY name, then ROWS 2 PRECEDING. What happens in this case? The first partition has only one row, so you get the department total. For the second partition, the first row doesn't have any rows preceding it, so you get NULL, because the salary is NULL. For the second row, there is one row ahead of it, so you consider that as well. For the third row, you have two rows preceding. Now, when you move to the fourth row of the partition, you are not going to consider the first row's value: it aggregates only over these three rows, because you specified that you only want to look at the two rows preceding the current one. Both of these queries are sketched below.
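A minimal sketch of the two queries just described, assuming an illustrative `employees` table with `name`, `dept_id`, and `salary` columns (the names are stand-ins for whatever was on the slide):

```sql
-- Whole-partition window aggregate: every employee row is kept,
-- and each row also carries its department's total salary.
SELECT name, dept_id, salary,
       SUM(salary) OVER (PARTITION BY dept_id) AS dept_total
FROM employees
ORDER BY dept_id, name;

-- Frame-limited window aggregate: the sum covers only the current row
-- and the two rows preceding it within the same department.
SELECT name, dept_id, salary,
       SUM(salary) OVER (PARTITION BY dept_id
                         ORDER BY name
                         ROWS 2 PRECEDING) AS running_total
FROM employees
ORDER BY dept_id, name;
```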
So, as you can see, for every row the window function considers only the rows in its own partition and frame; when a new partition starts, you consider just those rows. That way you can leverage window functions to do whatever you want; they are mostly used in analytics. And it's not only the aggregates that can be used as window functions. There are a lot of other new window functions that have been introduced, like RANK, ROW_NUMBER, LEAD, and LAG: you can ask, for instance, which row is leading the current row and which row is lagging behind it. There is a lot of new SQL syntax as part of window functions, and you can leverage all of it; most of the SQL standard syntax for window functions is supported by MySQL in 8.0. So these are the two major pieces of SQL syntax introduced in 8.0.

The next important step taken for 8.0 is changing the default character set. In 5.7 we had utf8mb3, and in 8.0 it has been changed to utf8mb4. What does that mean? It means you can store many more characters, and that includes emojis. All of this can be stored, so all the things that a mobile app now requires are supported in 8.0. I can see some smiles, so somebody is happy. And it comes with performance improvements as well, so that is something you can look forward to.

The next feature we have added is a very simple piece of syntax, but one most commonly required by booking systems. The most common problem you see is hot row contention, when multiple threads are trying to look at the same rows. We have provided two solutions for this problem in 8.0: the application can have logic to skip the rows that are locked, or it can say that it does not want to wait for the lock. The first syntax we have provided is SKIP LOCKED, and on the next slide I'll show you NOWAIT. I'll give you an example of how it can be used in your business logic. Say I have a booking system, for a movie or something like that. The first thread comes in, looks at a couple of seats, picks them up, and is in the process of payment. Now when the next thread comes in, the business logic needs it to not look at these locked rows; it can skip the locked rows because the first thread is in the process of payment. So you can issue just this one command saying: whatever seats are not in somebody else's cart, give me those rows. This is new syntax that can be used in your business logic, and it is very simple. The next one is the NOWAIT logic. The NOWAIT syntax is similar to the previous one, except that you are not skipping the locked rows; you are just not waiting for the lock to be released, so if the rows are locked you get an error right away. So that is about it for this new syntax; both variants are sketched below.
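A minimal sketch of the seat-booking scenario; the `seats` table and its columns are hypothetical:

```sql
-- Session 1: put two seats in the cart and hold the row locks while paying.
START TRANSACTION;
SELECT * FROM seats
 WHERE seat_no IN (3, 4) AND booked = 0
   FOR UPDATE;

-- Session 2: see only the seats that are not in somebody else's cart.
SELECT * FROM seats
 WHERE booked = 0
   FOR UPDATE SKIP LOCKED;

-- Or: fail immediately with an error instead of waiting for the lock.
SELECT * FROM seats
 WHERE seat_no = 3 AND booked = 0
   FOR UPDATE NOWAIT;
```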
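And, going back to the window functions for a moment, a hedged sketch of the non-aggregate functions mentioned earlier (RANK, ROW_NUMBER, LEAD, LAG), again on the illustrative `employees` table:

```sql
-- Rank employees by salary within each department, and show the salary
-- of the previous and next employee in that ordering.
SELECT name, dept_id, salary,
       RANK()       OVER w AS salary_rank,
       ROW_NUMBER() OVER w AS row_num,
       LAG(salary)  OVER w AS prev_salary,
       LEAD(salary) OVER w AS next_salary
FROM employees
WINDOW w AS (PARTITION BY dept_id ORDER BY salary DESC);
```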
The next topic is the JSON functions. As you all know, MySQL 5.7 has JSON support, and it was the most well-received feature of 5.7. What we have done in 8.0 is make some more improvements to this JSON functionality. We have added some more JSON functions that can be leveraged in many different ways for you to take advantage of. I'll just go through the most important functions that were introduced. The first ones are the JSON aggregation functions, starting with JSON_ARRAYAGG. I am not going to cover how to create a JSON object and all those things; this is just the new functionality.

So what we have here is a table with a JSON column, and this is the kind of output you get when you use JSON_ARRAYAGG: it has aggregated all the JSON values into an array, and this can be used in your applications in many ways. The next is JSON_OBJECTAGG, which can be used to create a JSON object; the previous one was for creating an array, this one is for objects. As you see here, you can use it in your GROUP BY queries to get an object of all the key/value pairs that are present in your group. So that is how you can leverage it.

Next, something that can be used in many ways: the JSON_TABLE function. It is the most powerful syntax provided for JSON functionality in 8.0. What it does is convert your regular JSON documents into relational data. I have an example; I'll walk through it so you can understand better. I have a table here which has a JSON column holding people's data. It is unstructured, and there are lots of fields and everything pertaining to the people's data, but what the application is interested in is just the names and the addresses. Now, with the JSON_TABLE function, you can extract these two and interpret them as relational data. We have this syntax now: SELECT * FROM t1, JSON_TABLE(...). The first argument is your JSON data, that is, where the JSON data comes from; in this particular case I am looking at the column named data in table t1. Then comes the path, inside that JSON column, of the data I want to extract columns from, which here is the people data. From that, I am going to create two columns: the first one being the name, for which I have a path, and the second one being the address, for which I have a path again. You can also apply filtering on top. This creates relational data that can later be leveraged for aggregations and everything else. Sketches of the aggregation functions and of this basic JSON_TABLE follow below.
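A minimal sketch of the two JSON aggregation functions, using a hypothetical `orders` table with `customer_id`, `item`, and `price` columns:

```sql
-- Collect each customer's items into a JSON array.
SELECT customer_id,
       JSON_ARRAYAGG(item) AS items
FROM orders
GROUP BY customer_id;

-- Collect each customer's items into a JSON object keyed by item name.
SELECT customer_id,
       JSON_OBJECTAGG(item, price) AS item_prices
FROM orders
GROUP BY customer_id;
```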
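And a sketch of the basic JSON_TABLE example; the table `t1`, its JSON column `data`, and the document layout (`$.people[*]` with `name` and `address` fields) are assumptions about what the slide showed:

```sql
-- Turn the people entries inside the JSON column into relational rows
-- exposing just the two fields the application cares about.
SELECT jt.name, jt.address
FROM t1,
     JSON_TABLE(t1.data, '$.people[*]'
       COLUMNS (
         name    VARCHAR(100) PATH '$.name',
         address VARCHAR(200) PATH '$.address'
       )
     ) AS jt
WHERE jt.name IS NOT NULL;   -- filtering works like any other SQL predicate
```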
I have one more example. The previous one was a simple case where you did not have any nested paths, but in this one you do. That is, you have people's data with the father's name, the mother's name, and then the children, and the children in turn have a name, an age, and so on. So how do you extract data from nested paths? Here I have an example where all this data can be interpreted as a relational table like this. It has a row representing every child in the data along with their age; there is an ID that is auto-generated for the rows; there is a flag that reflects 1 if the marriage date is present; and then there is the father's name. How do you do this? You just write this syntax: JSON_TABLE, the column I am looking at is families, I am going to look at all the data inside this families column, and I am going to create these columns from it. The first one is the ID, FOR ORDINALITY; this is auto-generated by MySQL, so it gives you row numbers that let you identify the rows. The next column is the father column, whose path is the father field inside the document. Then married: if the marriage date exists, I want it reflected as 1, and if it does not exist, it is going to be 0. Then inside that I have a nested path for the children's data, and from the children's data I am going to create three columns: the child ID, again FOR ORDINALITY, then the child, that is, the name, and then the age. So you get this relational data, which is human readable and can be used to do aggregations and so on and so forth.

I'll just show you how you can use it for aggregations. You have the same JSON_TABLE definition that I had on the previous slide. Using this, what I am doing here is selecting the father's name, then a count of how many children he has, so from inside this data I am going to count the number of children, and then the average age of the children. What this gives you is a mechanism to interpret unstructured data as structured data, and from there you leverage all the functionality that typical SQL can give you. So this is very powerful if you are interested in a relational interpretation of your unstructured data. That is something that is new in 8.0.

Next we have a couple of index extensions, the first one being the invisible index. Typically, what a DBA would want to do is check whether an index is actually being used or not. If he finds that the index is not used by any of the applications, he would like to drop the index. But it can happen that when he drops the index, performance takes a hit. Now, with 8.0, what he can do is make this index invisible to the optimizer alone. When you insert or update data, all the physical changes still happen to the index; the only thing is that the optimizer is not going to pick this index for its plan generation. After you make the index invisible, you can check whether there is a performance drop. If there is, that means the index is actually used by the application logic, so you can make it visible again and keep using it. Otherwise, if performance has not dropped at all after making it invisible, he can decide to drop the index. So that is one of the index extensions.

The second one is the descending index. Up to MySQL 5.7, what we had support for was ascending indexes: physically, when an index is stored, it is always in ascending order. Whenever an application required descending order, what MySQL used to do was scan the index backwards, and scanning backwards always has a performance hit, of about 15 to 20 percent, from what we observed. That is one thing; the second thing is that since we did not have a way to store indexes in descending order, for any query which had a mixed order, like ORDER BY a ASC, b DESC, MySQL could never use an index. It would always fetch the data and then sort it at the server layer. Now you can create an index like a DESC, b ASC, and if you have an ordering that requires that mixed order, you can leverage the index. So that is something that is done. Sketches of the nested JSON_TABLE example and of the two index extensions follow below.
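Here is a hedged sketch of the nested-path example and the aggregation on top of it; the table `families`, its JSON column `doc`, and the field names are illustrative stand-ins for what was on the slide:

```sql
-- Flatten one row per child, with an auto-generated ordinality id,
-- the father's name, and a flag for whether a marriage date exists.
SELECT jt.*
FROM families,
     JSON_TABLE(families.doc, '$'
       COLUMNS (
         id      FOR ORDINALITY,
         father  VARCHAR(50) PATH '$.father',
         married TINYINT     EXISTS PATH '$.marriage_date',
         NESTED PATH '$.children[*]' COLUMNS (
           child_id FOR ORDINALITY,
           child    VARCHAR(50) PATH '$.name',
           age      INT         PATH '$.age'
         )
       )
     ) AS jt;

-- Regular SQL aggregation over the flattened data:
-- number of children and their average age per father.
SELECT jt.father,
       COUNT(jt.child) AS children,
       AVG(jt.age)     AS avg_age
FROM families,
     JSON_TABLE(families.doc, '$'
       COLUMNS (
         father VARCHAR(50) PATH '$.father',
         NESTED PATH '$.children[*]' COLUMNS (
           child VARCHAR(50) PATH '$.name',
           age   INT         PATH '$.age'
         )
       )
     ) AS jt
GROUP BY jt.father;
```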
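And a minimal sketch of the two index extensions, on a hypothetical table `t` with columns `a` and `b` and an existing index `idx_a`:

```sql
-- Hide an existing index from the optimizer without dropping it;
-- it is still maintained on writes, so it can be made visible again instantly.
ALTER TABLE t ALTER INDEX idx_a INVISIBLE;
-- ...check whether performance regressed, then either:
ALTER TABLE t ALTER INDEX idx_a VISIBLE;   -- keep it
-- or: DROP INDEX idx_a ON t;              -- drop it for good

-- A truly descending key part, so ORDER BY a DESC, b ASC (or its reverse)
-- can be satisfied from the index without a server-side sort.
CREATE INDEX idx_mixed ON t (a DESC, b ASC);
```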
So the next one is the cost model change. In 5.7 we made a couple of major changes to our cost model in the optimizer, and in 8.0 we have improved upon it. I think my colleague covered one of them, histograms, in the morning; this is the second one. In 5.7 and earlier, there was no way for the optimizer to know whether the data is in memory or on disk: the storage engine was not capable of telling how much of the data actually resides in memory and how much resides on disk. What happened was that we always presumed the data resides on disk and came up with the cost based on that, so the plan might not always be optimal. Now, with 8.0, the storage engine is able to tell what percentage actually resides in memory and what percentage resides on disk, and because of that we are able to come up with better plans. I think one of the DBT-3 queries, which is a standard benchmark, actually improved after we introduced this cost model change.

The next topic is hints. As I said earlier, in 5.7 we introduced a new cost model. A lot of times, when the optimizer changes, we see that 99% of queries improve in performance, but there are rare cases where you see a regression. What is the solution for a regression? The DBA can give a hint to the optimizer saying: I do not want you to think, I just want you to use this index for this particular query. In 5.7 we introduced a lot of hint infrastructure, where for every kind of optimization the optimizer uses, you can give a hint saying I do not want you to use this optimization, or I want you to use it. In 8.0 we have improved upon the hint infrastructure: we now have a couple of hints for join order specification. There is a fixed-order hint, and a join-order hint where you can specify a prefix of the join order, or a suffix, and so on. I have an example you can see here, and a sketch also follows below. What we have here is a SELECT query which joins two tables. If you think a particular join order is preferable, you can just mention in your query, with a join-order hint, that the first table should be customer and the second table should be orders. If there are plenty of tables in your query, you can always specify that the prefix should be something and the suffix should be something, or simply say fixed order, do not change anything about it. So that can be done.

And that's about what I had in my agenda, but there are a lot of important features that have been added along with what I have covered. Since I do not have time, I have just listed here all the features that are part of 8.0. We have improved GIS support: 5.7 saw a major change in geometry support in MySQL, and we have improved upon what we did there. There is a GROUPING function which can differentiate between ROLLUP NULLs and regular NULLs. And, as I said, there is the improved hint infrastructure; it's not only the join-order hints that have been added, there are index merge hints and plenty of others in 8.0. Then there are the JSON improvements: it's not only the functions I mentioned that were added in 8.0, there are plenty of utility functions as well, like JSON pretty-printing and storage size, and so on and so forth. We have also improved performance, because we now have a way to detect whether a JSON value can be partially updated or not; that's another performance improvement we have done. And then there's the parser refactoring, which always helps us add new SQL syntax with ease, and because of which we have actually added a lot of new SQL syntax in 8.0. The parser you see in 8.0 is no longer the original parser of 5.5, 5.6, or 5.7; it has been gradually improved.
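A sketch of the join-order hints described above; the `customer`/`orders` schema is just illustrative:

```sql
-- Force the optimizer to join customer first, then orders.
SELECT /*+ JOIN_ORDER(c, o) */ c.name, o.total
FROM customer c
JOIN orders o ON o.customer_id = c.id;

-- Or take the join order exactly as written in the FROM clause.
SELECT /*+ JOIN_FIXED_ORDER() */ c.name, o.total
FROM customer c
JOIN orders o ON o.customer_id = c.id;
```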
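And a quick sketch of the JSON utility functions mentioned in passing (JSON_PRETTY and JSON_STORAGE_SIZE), reusing the hypothetical `families` table from earlier:

```sql
-- Pretty-print a JSON document and report how many bytes it occupies.
SELECT JSON_PRETTY(doc)       AS formatted,
       JSON_STORAGE_SIZE(doc) AS bytes_used
FROM families;
```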
So that's about my presentation. If you have any questions, please go ahead.

Yes, we also have FIRST_VALUE, LAST_VALUE, and NTH_VALUE.

Are window functions a feature of the storage engine, or part of the MySQL server? It's a server feature. So does that mean these functions can be used with any storage engine, since the storage interface is pluggable? Should be; this is a server feature, so you should be able to leverage it with any engine. All it needs is a couple of temporary table changes, but I'm not sure. What we have, with respect to what Oracle supports, is InnoDB and a couple of other engines, and we have made a couple of changes in temporary table handling for window functions. As long as those are done, I think you should be able to use it with any of the storage engines, but the recommended one, as I said, is InnoDB. And for a non-transactional engine, if you want to do it, I think you should be able to do it as well.

How does this JSON function work? Does the engine read a JSON file and turn it into relational data? No, JSON is stored in a binary format in our storage engine, as a column itself. You first import the JSON data; from the storage engine we just read it, we have an interpretation of the binary format, and then we give it out in a SQL-readable format. Basically it is supported by the JSON data type. Where does the JSON data type live on disk? You have BLOBs, binary large objects, and you have the character type, VARCHAR. If you want to interpret it as text you can use VARCHAR; otherwise, JSON in MySQL is in general stored as a binary large object. It's stored in the record in the database, and you just extract it. There is no separate JSON file on disk that the engine is reading; it's part of the table, part of the table structure itself. You can create a JSON column and insert JSON documents into it. We actually had a session at 10:15 explaining the same thing, how you can leverage SQL and NoSQL in MySQL.

How deep can the JSON nesting go? You can use nested paths, yes. As for the exact limit, I do not know; JSON_EXTRACT should be able to extract the regular paths that you typically have, but I am not aware of the limit. I can probably get back to you on that, because I haven't run into the limit on nested paths that you are asking about. Okay. Yeah.