 Hi, my name is Vicente Uchurbaro, and I am a software engineer at the MariaDB Foundation. Today, I'm going to give you a introduction into two advanced querying features, which are window functions and commentable expressions. Both of them are coming in the new MariaDB 10.2. Okay, and I'm not going to give you a full description of what we support, but my idea is to try and show you what they are capable of and give you a few examples of how things work. So we're going to go through both of them, give you some examples, show you why things work a lot faster with commentable expressions and window functions, and I'm also going to give you a short detail on what we support. So I'm starting with CTEs. I'm starting with a simple example. Here we have a table of engineers. So this highlighted in red is the false commentable expression definition. We start with the keyword with, this is how you identify commentable expressions, then you have the commentable expression name, followed by the body, which defines what you will find in that table definition. Afterwards, we can use it as if the table were in our database. A similar solution can be done without CTEs using derived tables. The two queries are identical. The difference is that you have the table definition in line within the query. This gets complicated as you add more tables, more derived tables. If you look on the right, we have a nested view of how two derived tables would look like if they were to reference each other. However, with CTEs, we can define this one at a time. So here we first define engineers, and then we want to select all the engineers that are from Europe. So we define another table expression, EU engineers, which references the first table. You can see it here as engineers. This makes things a lot more maintainable, and I'm not sure if you have had the pleasure of working with 20 nested loops, but things get out of hand pretty quickly. So one use case example is say you are part of the sales division, and you want to see which products have performed better this year as opposed to last year. We can do this by defining first a table expression with all the products and all the sales for that product, grouped by ear, obviously, and then we can just join this table with itself. See here that we are referencing the common table twice, and this will give you all the products which have performed better than the previous year based on the amount of sales. Doing this without the common table expressions would either require you to use a view or to define the derived table twice, which is not scalable. So in summary, CTEs are defined with the with clause. That's how you're going to see them in the query. They are similar to derived tables, however, they provide more advantages. There are also extensions to allow recursive definitions. I'm not going to go into details on those. Obviously, they can provide you with cleaner code, and sometimes the optimizer can better understand what you actually want from the data, and thus optimize your queries to produce more efficient query plans. Now, the second feature pairs very well with common table expressions. We're going to see this shortly. Window functions are another advanced topic, and I've seen a lot of people not really fully understand how they work. So that's what I'm trying to show you here. Give you a brief understanding of how you should think about them, and how you should think about writing queries using window functions. So window functions are special because they behave like two different things. One, they act like similar aggregate functions, and that they compute their result based on multiple rows, like sum and average. But at the same time, they do not collapse your rows. So you get one result per each entry in the select statement. You can obviously identify them with the over clause, which comes right after the function name. Let's start with a simple example. So let's say we have this table, which represents a set of users. We have email, first name, last name, and we also specify what kind of a user it is. In this case, either admins or regular users. If you would like to create a primary key for this table, say you forgot to define one as an integer, you would want to get, say, an auto increment column. Well, you can get this by using the row number function. The function does what you'd expect. It sets a number that increases for each row in the select statement. Now, the problem here is that we have not defined an order for this window function. The order here that you see, one, two, three, four, five, is not deterministic. These rows can actually come up in this order, two, one, two, one, three, five, four. At the same time, you could get a reverse result. Now, the reason why this is is that window functions are computed after everything else except the order clause. So the window function can get these rows in any order unless we specify differently. That comes as the form of a parameter for the over clause. We tell them that the window function should receive the rows in this order, alphabetically by email. Now, a second question comes to mind. What if we want two separate sequences of numbers, one for regular users and one for admin users? This can be done using an extra parameter to the over clause called partition by. It does what you'd expect. It splits the data up based on the column defined in the partition statement. So here, our rows are split into users that are regular users and administrators. So you see we have two sequences, both starting from one, one for the admins and one for the regular users. Now, I also mentioned that window functions behave like aggregate functions and that they are computed over a set of rows. Let's look at this example. Say we have some sort of sensor that picks up data from time to time and it puts that data into a table. We have the time stamp and the value that the sensor recorded. If we order it by time and plot it, we might get some sort of graph that resembles this one. We don't like the data because it's noisy and we would like to smooth it out. We can do this by averaging across a set of values. So we want the point that's represented to be an average of, say, a couple of points before it and a couple of points after it. And we can do this using the average function. Now we've transformed the regular aggregate average that we would use in a regular SQL into a window function using the over clause. And now the function behaves slightly differently. We first need to order the data by time because that's how we want things to be averaged over. Then we need to tell it that we want it to average stuff with a few rows before and a few rows after. And we do this with another special syntax called a frame definition. Here we tell it to average three rows before it and three rows after the current value. And now this is an average of seven values. We can increase the window that the average is computed over by specifying different numbers. They do not have to be the same. They can even be references to other columns. So it's a bit more complicated, but this is in essence how you can define the set of rows that the window function is computed over. If I'm to go through a step-by-step how this frame moves, I've taken this, the previous example, have replaced average with some just so that I don't have to divide everything by a certain number. But the idea is the same. So on the left we average, we sum three rows, the previous, the current, and the next row. And the other one does it with five rows. In blue, I've highlighted which rows are used in the computation, and the red is the current row that we're computing. For the first one, we do not have any previous rows, so we just take the current row and the row after it. This gives you two plus five in the first column and two plus five plus four in the second one. As we move forward, the frame advances. Now we do have previous rows, so we add those as well, and we add another row as the next row has moved as well. If we move even more, the frame now starts to remove rows from the result and adds new ones as they come in. Notice that now even the second column starts to lose rows because we are advancing the window function even more. Computing this is actually pretty easy if you look at how values get added and removed. So for each new row that we get, we get to keep a set of rows, remove some, and then add some. This is the key to why window functions behave very well for certain data sets. Because each computation is constant time. We just remove one thing and add another. So what we call it is online computation, and it works for almost all window functions that we support. There are a few exceptions, but I'm not going to go into detail on that right now. This is how the end result would look like. So now I'm going to give you two examples from real life applications that I actually had to solve. Well, slightly condensed, but it's the same idea. So say you're in a bank and you have a lot of transactions from users. Now we have a timestamp for transaction. We have the customer ID and the amount that the transaction was. Either deposit some money or pay some money to a different account. And we want to get a balance statement for all the customers in the bank. So the balance statement can be computed with regular SQL by doing a subquery where we sum up all the transactions before it. Okay, so here we joined the table with itself. And by joining we add all the transactions that have the same customer ID as seen here, but have a timestamp that is lesser than the one that we are currently computing. So this is done with regular SQL. With window functions this becomes a lot easier to define because we just need to think about our computation. So what do we want to do? We want to sum up things. Then we want to split our data by customer because we want a separate balance statement for each customer. We order our data by the timestamp. And then finally we want to tell the sum function that it has to add all rows from the beginning up to the current row. So here this is where unbounded proceeding and current row syntax comes into play. Now why is this important? Well let's look at the performance of this implementation. So if you were to write this query with a regular SQL, on the left you have the number of rows used in this test and the time it took for the query to execute. With regular SQL, without using an index, the time increases quite quickly. And past a certain amount this transaction takes quite a long time to finish. If you do have an index, things improve, but compared to window functions the speedup is 10x. So we've managed to define a query which no longer uses self-join, so you don't have to do n-squared operations. Instead it scales up in linear time with the data set. So this is how you can get quite a huge improvement by just rewriting your queries using window functions. Another use case is using top n queries. When you want to get the top n people from a certain data set. You can do this both with a regular SQL by just counting them. The expression is a bit complicated but you can just count them. Unfortunately if you want to get a rank column for them you will have to define the query again. Which makes the whole query look like a mess. But with window functions we have a very interesting solution which is the rank function which does just this. First we just partitioned the data by department and then we rank it in order by salary decreasing for this example. Now I said we wanted the top five earners so we want to filter this. Well you cannot filter it by using a window function in the work clause because window functions are computed after the work clause is evaluated. So this is where CTEs come in. With CTEs you can wrap the window function query and then filter that one outside of the window function query. Let's look at the performance for this one. The reason I put dot dot dot there is because it never finishes. At least not until a battery dies on this laptop. So even with a very small data set getting the top n is very difficult with regular SQL. However it is instantaneous with window functions. So whenever you have these kinds of queries making use of window functions can greatly improve your application's performance. So in summary window functions do help you eliminate self joins like we've seen with both examples. At the same time they make queries more readable no more duplicating of sub queries everywhere. And in most use cases that you will find with window functions they will speed up your computation as opposed to using regular SQL. Here's a set of functions we do support. This is almost an exhaustive list of what the standard suggests that we do. We do have a few exceptions but these are most of them. All right. Thank you very much for listening and if you have any questions. Yes. Yes. Okay. So the question is can you use can you have user defined functions used as window functions. And the question is not yet but we actually do have this planned and it will most likely come up in the next release. We have we actually have a Google Summer of Code student who's contributed a patch which allows user defined functions to be used as aggregate functions. And then getting aggregate functions into window functions is a trivial step. Yes. Okay. So I'm not quite sure I understand your statement. Sorry. Okay. What what what what I'm what I'm saying is that is that we have have user defined functions as SQL functions as procedures like you would define a. Yes. That's what I'm what I said that we have planned for. So we want you to allow to allow you to use regular SQL to define your own functions as window functions. All right. Well if there are no more questions. Thank you very much.