Sorry for this. This is a talk about going beyond WHERE and GROUP BY. Yeah. Let's start from what everybody knows. This is the part of SQL that appeared in the very first SQL standard, SQL-86. It's the SQL we all know: SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY, and LIMIT, although that one's not standard, so I didn't put it on the slide. It works, everybody knows it, I hope everybody's using it. That's fine. But really, it's SQL-86. That's the time of DOS 3.1. Anybody using DOS 3.1 here? Well, I don't, you know... okay, one hand. But technology, software and hardware, have moved a little bit ahead since '86. So why wouldn't we look at what the SQL standard has to offer? Do we need to limit ourselves to the 1986 standard?

And yes, I'll be talking only about SELECT, about going beyond WHERE and GROUP BY, so triggers, stored procedures, all that stuff is completely outside this talk. The next SQL standard that had interesting stuff was SQL-92, and it introduced the concept of set operations, where every SELECT is viewed as a set of rows, and then you can do set operations on them. UNION — again, probably everybody knows it; it's been in MySQL for many, many years. If you have two SELECTs, then UNION will show the rows that are present in at least one of the two SELECTs. And there were two other set operations introduced in SQL-92: INTERSECT, which shows only the rows present in both SELECTs — and if you draw circles like in kindergarten, which are called Venn diagrams, you will see that, yes, only the intersection of those two circles ends up in the final result set — and EXCEPT, which shows only the rows that are present in the first SELECT but not in the second. They're both part of MariaDB 10.3.

The next SQL standard with an interesting feature was SQL:1999, and it introduced the concept of common table expressions. Now, who knows what a common table expression is? Okay, good. But for the others, I'll do a short introduction. This is a subquery in the FROM clause: the subquery is put in the FROM clause of the outer query, it returns some result, and the outer query selects from the result of the subquery. Again, this has existed in MySQL for quite a while. A common table expression can say pretty much the same thing, but in a slightly different syntax: you first specify the subquery, give it a name, and then you select from it. So it's pretty much just syntax sugar to say the same thing in different words. It does have the benefit of better readability in some cases — in particular when you have nested subqueries and it's getting difficult to read, and especially when you need to self-join a subquery. Then it's much better to use a CTE. Just imagine: this small subquery is, in production, like four pages on your screen. Then you have eight pages of two identical subqueries, and you have to scroll back and forth just to see whether it's really the same subquery or whether there's a slight difference in the column list. With a CTE, you specify it once. But really, non-recursive CTEs are not much different from subqueries in the FROM clause.
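Roughly, those look like this — a minimal sketch, where the table and column names (staff, customers, orders) are invented for illustration and the syntax is what MariaDB 10.3 accepts:

    -- INTERSECT: rows present in both SELECTs
    SELECT name FROM staff
    INTERSECT
    SELECT name FROM customers;

    -- EXCEPT: rows present in the first SELECT but not the second
    SELECT name FROM staff
    EXCEPT
    SELECT name FROM customers;

    -- a subquery in the FROM clause...
    SELECT c, total
    FROM (SELECT customer_id AS c, SUM(amount) AS total
          FROM orders GROUP BY customer_id) AS spending
    WHERE total > 100;

    -- ...and the same thing as a CTE: name it once, then select from it
    WITH spending AS (SELECT customer_id AS c, SUM(amount) AS total
                      FROM orders GROUP BY customer_id)
    SELECT c, total FROM spending WHERE total > 100;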
And I wouldn't even be talking about them if not for recursive CTEs. Recursive CTEs are a much more interesting feature. They look almost the same as non-recursive ones, but in fact they're pretty much a game changer: they allow you to do things that were not possible in SQL before, so it's not just a different way of saying the same thing. Let's see what a recursive CTE looks like. There's a keyword RECURSIVE in there, then the name of the CTE, then a first SELECT, just like with a regular CTE, and then there's a UNION ALL and another SELECT, which is defined in terms of the very CTE being defined. The CTE is defined using itself, recursively, which is why it's called recursive. So how would such a thing be executed? First, in this recursive CTE, we have a SELECT from the folks table where the name equals, well, me, and the result set gets a row for me. Then we do the UNION ALL part: we join what was already found — that is, the row with me — with the table folks, and this second SELECT finds my father and mother. Then we do another round of the SELECT after UNION ALL, and we join these two newly found rows with the table folks, and we find my grandparents. Then another round: these four rows are joined with the table folks, and we find eight more rows, and so on and so forth, until the join stops delivering results, until the table folks is exhausted. That is how this recursive CTE is executed. So you see, this table stores a tree-like data structure — children, their parents, their parents' parents — and traversing a recursive data structure in SQL was not possible in the standard before recursive CTEs were introduced. So they're very useful for these tree-like structures.

They're also useful for data generation, usually for reports. Say you have an attendance table for some employee — or a student — and you need to know which days he missed. Then you would do a LEFT JOIN with a table of all working days for a particular year. Before recursive CTEs, you would probably need to pre-generate this table and fill it in with something, PHP for example, with all the days. Now you can generate such tables whenever you need them using recursive CTEs. And just for a change, I'm not doing working days, I'm doing all the weekends in the year 2020. This recursive CTE starts from January 1st, 2020, and every next row is the previous row plus one day, so we get all the days in this year, and we continue the generation as long as we stay within the boundaries of the year 2020. So we have a CTE that generates all the days of the year, then we just need to select only the weekends from it, and we have the table of the weekends of this year. You can join this table with another table and build some interesting reports from it.

And the third example is a little more complex: say we have a very small social network, just eight people, and a friends table saying who is friends with whom, and I'll show how to work with graph structures using recursive CTEs. We want to find a chain of friends that connects Chris and Jason. Again we start with Chris — the path starts with Chris's name — and then we keep appending friends to it until we get a chain to Jason. For every next row, we join the previous row with the table friends to find the next friend, and because we don't want cycles in our path of friends, we make sure the new friend is not already in the path. If we have found such a friend, we append it to the path, and we keep doing that until we reach the user with ID equal to four, which is Jason. So if I run this CTE, I get all the possible paths between Chris and Jason — five of them: three short paths and two slightly longer ones.
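Here is a sketch of what the first two of those recursive CTEs might look like. The folks schema (id, name, father_id, mother_id) is my reconstruction of the slide, not necessarily the original; the date arithmetic is MariaDB's:

    -- ancestors: start from one row, repeatedly join to find parents
    WITH RECURSIVE ancestors AS (
      SELECT * FROM folks WHERE name = 'Me'
      UNION ALL
      SELECT f.*
      FROM folks f JOIN ancestors a
        ON f.id IN (a.father_id, a.mother_id)
    )
    SELECT * FROM ancestors;

    -- all weekends of 2020: generate every day, keep only Sat/Sun
    WITH RECURSIVE days(d) AS (
      SELECT DATE '2020-01-01'
      UNION ALL
      SELECT d + INTERVAL 1 DAY FROM days
      WHERE d < DATE '2020-12-31'
    )
    SELECT d FROM days
    WHERE DAYOFWEEK(d) IN (1, 7);  -- 1 = Sunday, 7 = Saturday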
So that's how you can do graph structures with CTEs. As a summary: there are non-recursive CTEs, which are not very interesting — just a different syntax for subqueries in the FROM clause — although they are more readable, and in some cases, depending on the implementation, they can be better optimized than subqueries in the FROM clause. Recursive CTEs are a totally different thing. They allow you to do things that were not possible in SQL before, so they are genuinely new possibilities. They allow you to query hierarchical data, graphs and trees. They can be used for data generation on the fly, whenever you need it. And they actually make SQL, so I've heard, Turing complete. If you Google around, you can find CTEs that solve Sudoku, a Brainfuck interpreter written in SQL, and other crazy stuff that you never want to do in SQL.

Now again, fast forward: the next interesting SQL standard is SQL:2003, which introduced the concept of window functions. So what are window functions? First, briefly, what other types of functions exist in SQL? There are normal functions. They are invoked once per row, and they deliver as many results as there are rows. So if you select from a table of 10,000 rows and use a normal function like CONCAT or MD5, you get 10,000 results. Then there are group functions — aggregate functions. They deliver one result per group of rows, and the result depends on the whole group. So if you have 100 groups of 100 rows each, a normal function delivers 10,000 results, while an aggregate function delivers only 100. And window functions are something in the middle: they still deliver one result per row, but the result depends on the whole group. They don't have that tunnel vision of looking at a single row; they can see all the rows around it as well.

And about "all the rows around it" — here's the example that explains why they're called window functions. It's the typical example, the moving average. If you have data that fluctuates — high-frequency noise on top of a signal that changes slowly over time — and you want to see the slowly changing pattern, you need to remove the high-frequency noise. That can often be done with a moving average: basically, for every data point, you take a few points around it, average them all, and put the result in place of that data point. To do that in SQL — if you ever need to do it in SQL, because usually it's not done in SQL — you calculate the average over all rows from two rows before the current data point up to two rows after it. So you have a moving window that moves with the current row — that's why they're called window functions — and you run AVG over the rows within this window.
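A minimal sketch of such a moving average, assuming a hypothetical readings(t, value) table; the frame clause is the SQL:2003 syntax:

    -- average over the current row plus two rows before and two after
    SELECT t, value,
           AVG(value) OVER (ORDER BY t
                            ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS smoothed
    FROM readings;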
A more practical example, something you're more likely to be doing in SQL: here's the data of a hypothetical example bank, which stores transactions — not database transactions, but financial transactions — in a table called transactions. Every transaction has a transaction ID, a customer ID, and an amount of example monetary units, which I'll call coins here, because it's the hype of the day. So this transactions table stores which users did which transactions: in the first transaction, the first user took 50 coins out of his account, then the third user put 50 coins into his account, and so on and so on.

And say we want to know the balance of each account after every operation, something like this: after the first user took out 50 coins, his balance is -50; for the second transaction it's +50; then the first user put 950 into his account and his balance becomes 900; and so on and so on. So how can we do this in SQL? It is possible, and not even very difficult. We just need to write a subquery that sums all the transactions for this particular user with transaction IDs up to and including the current transaction. Summing all those amounts, we get the running total, and it works.

But what are the problems with this approach? First — well, it's not difficult, but it's not exactly easy either. To understand what this subquery is doing, you actually have to look at it and turn it over in your head a few times before you see: yes, it sums for this user, so it's a running total. And how would this subquery be executed? For every row in the result table, the server needs to read all the rows in the table, find the ones with the correct customer ID and transaction ID, and sum them up. For every row, it needs to scan the table and read the rows. So the complexity is quadratic: if the table is 10 times larger, the query takes 100 times longer. It's not always a practical approach.

On the other hand, you can do the same thing with window functions: you sum over a window that spans from UNBOUNDED PRECEDING — that is, from the very beginning — up to the current row. And this is, by the way, a very rare, if not unique, case where the SQL standard is not needlessly verbose, because this is the default behavior and you can simply omit the window frame specification. Then you get a very concise query. First, it's a lot shorter to write, and it's easier to read, because you immediately see that it's a running total. But the most important part is that the server — MariaDB, whatever — also understands that this is a running total, and it will not need to execute a subquery. Because how would you compute a running total with pen and paper? When you see the next transaction of a user, you take his previous balance and add the amount of the new operation. You don't need to re-read the whole table to calculate the balance for every row. That's exactly what MariaDB does in this case: for every new row, it takes the previous balance and adds the current value. It does not rescan the table for every row, and the complexity is linear: if the table is 10 times larger, the query takes only 10 times longer.

And I ran short benchmarks, right on this laptop, not very scientific. You can see that the subquery approach gets totally unusable very quickly — three hours for 100,000 rows. Then I put an index on it, the best possible index, tuned for this particular subquery to make the most of it. It helped a lot; the query actually became usable, up to about 100,000 rows. But the execution time still grows rather quickly, and I would think that with 100 times more rows it would be just as unusable as the first subquery approach. The window function, on the other hand, is not only always much faster, it also grows much more slowly, so even with a table 100 times larger it would probably still be perfectly usable.
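Concretely, here are the two approaches side by side, assuming a transactions(txn_id, customer_id, amount) table (my column names, not necessarily the ones on the slides):

    -- correlated subquery: the table is rescanned for every row, quadratic
    SELECT t.txn_id, t.customer_id, t.amount,
           (SELECT SUM(t2.amount)
            FROM transactions t2
            WHERE t2.customer_id = t.customer_id
              AND t2.txn_id <= t.txn_id) AS balance
    FROM transactions t;

    -- window function: one pass, linear; the frame from UNBOUNDED
    -- PRECEDING to the current row is the default here, so it's omitted
    SELECT txn_id, customer_id, amount,
           SUM(amount) OVER (PARTITION BY customer_id ORDER BY txn_id) AS balance
    FROM transactions;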
So what are window functions, summing up? Window functions provide a useful way to avoid slow subqueries and self-joins. They provide better readability and — importantly — you can better convey the meaning of what you want to do to the optimizer, so it can optimize the query better. And that results in much faster queries.

Now fast forward in time to SQL:2011, which introduced system-versioned tables into the SQL standard. So what are those system-versioned tables? Let me describe them through a few rather different problems that can be solved with system-versioned tables. First — I don't know if this has ever happened to you; it happened to me quite a lot, although not really at the MySQL command-line prompt — there is a real use case for undoing statements that you didn't actually want to run. Another use case is doing analysis on historical data: comparing how your user base changed over the last year, what people were buying two years ago compared to what they're buying now, what ads they're clicking on, or, I don't know, what hotel rooms they're booking. And the third use case: you look in your logs and find out that for some reason you had a data breach half a year ago and you didn't notice it at the time, and now you need to dig it up and understand what data might have been leaked, which users need to change their passwords, and so on — forensic data analysis. All these use cases can be solved with system-versioned tables.

But this is not a case where the SQL standard avoids being needlessly verbose. Let's take a normal, non-versioned table and make it system-versioned. According to the SQL standard, first you need to add two columns, timestamps with microsecond precision. They're generated columns: the first is GENERATED ALWAYS AS ROW START and the second is GENERATED ALWAYS AS ROW END. Then you need to add a special incantation, PERIOD FOR SYSTEM_TIME, over those two columns. After that, you specify that the table is created WITH SYSTEM VERSIONING, and then you have a system-versioned table according to the SQL standard. We in MariaDB thought that, first, this is a little too verbose for our taste, and second, when you select from such a table, you just want the table to be system-versioned; you don't really want to see those timestamps. They're just garbage in your output, because you want to see the real data, but versioned. So that SQL-standard syntax is optional: you can drop it, the table will still be system-versioned, and you just won't see the timestamps in your output.

And why would you want that? Because if you have a system-versioned table, you can do this magic thing: you select from the table, and after the table name you specify FOR SYSTEM_TIME AS OF some specific point in time, and you see the data as they were in the table at that point in time. This solves all three problems I just mentioned. If you accidentally deleted all the data in the table, you can just insert back into the table the data as they were five minutes ago, and the data is as good as new. If you want to do analytics on historical data, you can join the table with itself as of one year ago, compare the data, and derive some useful conclusions. And if you want to know what an intruder saw half a year ago, all you need to do is select from the table as of exactly the point in time that you hopefully have in your logs. Then you see exactly the data he was seeing: which users existed at that time, which users haven't changed their passwords since then — and you can alert them and do some damage control.
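A sketch of both forms, using MariaDB 10.3 syntax on a made-up users table:

    -- full SQL:2011 syntax
    CREATE TABLE users (
      id        INT,
      name      VARCHAR(100),
      row_start TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
      row_end   TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
      PERIOD FOR SYSTEM_TIME (row_start, row_end)
    ) WITH SYSTEM VERSIONING;

    -- the MariaDB shorthand: same effect, no visible timestamp columns
    CREATE TABLE users2 (
      id   INT,
      name VARCHAR(100)
    ) WITH SYSTEM VERSIONING;

    -- time travel: the table as it was at some point in time
    SELECT * FROM users
    FOR SYSTEM_TIME AS OF TIMESTAMP '2020-01-01 00:00:00';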
We have other extensions to the SQL standard; I'll mention just one. Because history can take a lot of space, you might want to store the historical data separately from the current data, since most of the time you'll be working with the current data. By partitioning by SYSTEM_TIME, you can keep the history separate, so the accumulated mass of history will not slow down your day-to-day operations on the current data.

The next SQL standard that introduced a new feature is SQL:2016 — the newest SQL standard — and it introduced operations for working with JSON. This is how JSON looks in the SQL standard, and these are the SQL-standard functions for working with JSON. There are not many, and the last one, JSON_TABLE, is not yet in MariaDB or MySQL, although there are plans to do it in MySQL — there's a lab release that even has JSON_TABLE. But it's still a very small, spartan set, so in both MariaDB and MySQL there are lots of other functions to do something useful with JSON: to modify it, to query it, to get some JSON metadata. And the last three functions reformat JSON in a compact or loose way, although they're only in MariaDB, not in MySQL. But I won't talk long about JSON, because everybody knows what JSON is, everybody has seen JSON, so I'll just skip it.

And the last part is not in any SQL standard at all, although someday it might be. This is a new feature that appeared in MariaDB 10.3: aggregate stored functions. So now you can do not only CREATE FUNCTION but also CREATE AGGREGATE FUNCTION. To explain the syntax, let me first start with a non-aggregate function — a perfectly standard stored function that reads all the data from a table and calculates the sum of squares of a particular column. It creates a cursor for a SELECT, it reads rows one by one from the table, and when the table is exhausted, the CONTINUE HANDLER jumps in and it returns the sum of squares. That's perfectly standard, nothing new here. And now let's make an aggregate function out of it. There's the keyword AGGREGATE; there's no cursor over a table here, but there's a parameter for the aggregate function — the cursor is created implicitly — and there's a special incantation, FETCH GROUP NEXT ROW, which reads a new row of the current group. Everything else works exactly the same as in the non-aggregate function: it reads the rows, and when the group is exhausted, the CONTINUE HANDLER jumps in and returns the accumulated sum of squares of the column x.

And to show how usable this is, here's an example of computing a median in SQL using a custom aggregate function. To make it easier to read, I've highlighted where you need to start reading, because this is where execution starts. Then it loops, inserting the rows into a table, and when the group is exhausted, the CONTINUE HANDLER jumps in, picks the middle row from the table, and returns the median. That's how you do a median in SQL in MariaDB 10.3.
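For reference, a sketch of such an aggregate function in MariaDB 10.3 — the function and column names are mine, but AGGREGATE and FETCH GROUP NEXT ROW are the actual MariaDB syntax:

    DELIMITER //
    CREATE AGGREGATE FUNCTION sum_of_squares(x DOUBLE) RETURNS DOUBLE
    BEGIN
      DECLARE total DOUBLE DEFAULT 0;
      -- when the group runs out of rows, return the accumulated result
      DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN total;
      LOOP
        FETCH GROUP NEXT ROW;  -- implicit cursor over the current group
        SET total = total + x * x;
      END LOOP;
    END //
    DELIMITER ;

    -- used like any built-in aggregate:
    SELECT customer_id, sum_of_squares(amount)
    FROM transactions
    GROUP BY customer_id;

Questions?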