Okay, so the next session will be from Sergei Golubchik, about Meet MariaDB 10.2. Yeah. Yeah, so this will be the session Meet MariaDB 10.2, about the upcoming MariaDB release 10.2. Oops, let me... So first, those are stickers, MariaDB stickers, small ones and larger ones. And just a tip I found yesterday: those large ones are very good to keep your badge from unlocking all the time. And there are also ones that look very good on a laptop, but there are only a few of them. Anyway, so let's get going. Is there anyone who doesn't know what MariaDB is? Do I need to spend a lot of time on that, or a little time, or not at all? Okay, so I'll go quickly. So it's a fork of MySQL, a drop-in replacement, but the meaning of "drop-in replacement" changes over time. When we just forked it, it was an exact copy; you could swap it and nobody would notice anything. But over time we were adding more features and fixing some bugs that MySQL didn't, and MySQL was adding features, some of which we ported and some of which we didn't. So now we say that it's, well, a drop-in replacement for all practical purposes. But if you would like to make a point that it's different, you could create some contrived application where you will see the difference between MySQL and MariaDB. For example, the version is different; I'll talk about that later. But MariaDB, in a certain sense, is more MySQL-compatible than MySQL itself, because for every major version of MySQL they make big changes that break compatibility. So, for example, MariaDB is more compatible with MySQL 5.5 than MySQL 5.6 is, and MariaDB is more compatible with MySQL 5.6 than MySQL 5.7 is. So "drop-in replacement" depends on how you look at compatibility. It has been in Debian since 2013, and now it's in Jessie at version 10.0.25. And the packaging of MariaDB in Debian is maintained by Otto Kekäläinen; he'll talk about it later today. And MariaDB as a project is managed...
MariaDB Server as a project is managed by a nonprofit organization, the MariaDB Foundation. By the way, talk rules: I might tend to talk too fast, although I'll try not to, and I might get too technical sometimes. If you have questions, interrupt me right away; don't wait till the end of the talk. Okay? Yeah. Or you can shout, then you won't need a mic. Or you can take a mic, whatever. Yeah, so why am I talking about MariaDB? MariaDB is based on the MySQL code base, and I have been a core MySQL developer since 1998, which was long before MariaDB was created. And I have been in MariaDB since 2010; that was before the MariaDB Corporation was created. Now I work at MariaDB Corporation as chief architect of MariaDB, and I'm also responsible for security at MariaDB. So what is 10.2? 10.2 has now been about nine months in active development, so we're constantly adding features to it. Currently it's 10.2.1 alpha, which was released yesterday, and we are going to make it beta very soon; nine months should be enough. And by the end of the year the plan is to make it GA, and then it'll be in Stretch. Yes. Practically, for 10.1 it was about one and a half years between when we started developing and the GA. And then we maintain it for five years, so if 10.2 will be in Stretch, we'll have to maintain it for those five years. But because, as I said, Jessie has 10.0, I'll quickly talk about 10.1, simply because 10.1 was never in any Debian release. So 10.1 went GA in October 2015. We do one release every month, and the next release will be 10.1.16; it'll be in two weeks, hopefully. We are actually pretty good now with the release schedule, so it'll be in two weeks, no doubts about that. And again, I won't do a big talk about 10.1 features because it would need another hour and I've done that talk already. So just very quickly, some major features of 10.1 that are not in Jessie.
On the replication front, it's Galera cluster replication and optimistic parallel replication. Optimistic parallel replication means that the master replicates to the slave, and the slave optimistically assumes that events do not conflict with or depend on each other and tries to execute as many events as possible in parallel. When it notices a conflict, it can roll back the conflicting event and redo it later. So it allows the slave to achieve much higher parallelism than the master has, and it helps the slave not to lag behind the master. On the security front, there was encryption, and I mean data-at-rest encryption: data are encrypted on disk. InnoDB tables are encrypted, InnoDB logs are encrypted, binary logs are encrypted, and all temporary files that the server might need to write to disk are also encrypted. If everything is done properly, the data on disk will never be unencrypted. So anyone can pull the plug anytime and examine the disk as much as they want; there should be no plain-text data on it. There's also password validation that uses plugins: one can enable the cracklib plugin, and then your passwords in MariaDB will be validated by the same rules that you probably have for your normal Unix passwords. Data scrubbing is more useful when you don't have encryption, but it can also be used with encryption. It means that when data are deleted, a separate background thread goes and physically deletes them from the table instead of just marking them deleted. It may be useful if you want to be sure that deleted data are actually really deleted and are not present anywhere. There is an extension of SQL syntax, SET STATEMENT, which allows you to change a configuration variable for the duration of one statement, automatically changing it back when the statement ends. And the ANALYZE statement, which works similarly to EXPLAIN; but EXPLAIN guesses what the execution plan will be.
ANALYZE actually runs the statement and then shows what the execution plan actually was, with exact row numbers and everything. Compound statements; then optimizer enhancements; scalability improvements like lock-free algorithms in the table cache, so that it shouldn't be a bottleneck; and many other features. But let's rather talk about 10.2. The different features in 10.2 can be roughly grouped. There is something that I call advanced data querying: more complex ways of querying your data, doing queries that were not possible with MariaDB before. There's another initiative of removing historical limitations, things that MariaDB and its predecessor MySQL have had forever, at least for 18 years, which is as far back as I remember. And there are different optimizations to make MariaDB work faster, and other features that don't fit in any particular group. I'll also briefly talk about the currently running Google Summer of Code at MariaDB. We have been in Google Summer of Code for maybe four years, and this one looks very successful. It hasn't finished yet, but it looks like we'll have many good features coming from it, so I'll mention them too. So the first feature about data querying is what the SQL standard calls common table expressions. Here is one simple query that uses a derived table, or as the SQL standard calls it, a subquery in the FROM clause, using common table expression syntax, the WITH keyword. It can be rewritten this way, which is practically identical; it's not interesting at all, just a different way of writing the same thing. Where it becomes interesting is that we support recursive common table expressions. Here you see the ancestors common table expression is defined using itself, because it selects from itself. And that basically works like this: first it selects from the folks table by name, well, me. And then it selects from folks joined with itself, for, well, ancestors, parents.
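A minimal runnable sketch of this recursive ancestors query, using Python's sqlite3, since SQLite supports the same standard WITH RECURSIVE syntax. The folks table layout and the column names here are assumptions reconstructed from the talk, not the actual slide:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE folks (id INTEGER PRIMARY KEY, name TEXT, father INT, mother INT);
INSERT INTO folks VALUES
  (1, 'grandpa', NULL, NULL),
  (2, 'grandma', NULL, NULL),
  (3, 'dad',     1,    2),
  (4, 'me',      3,    NULL);
""")

rows = con.execute("""
WITH RECURSIVE ancestors AS (
    SELECT * FROM folks WHERE name = 'me'   -- anchor: start from myself
    UNION ALL
    SELECT f.*                              -- recursive step: my parents,
    FROM folks f JOIN ancestors a           -- their parents, and so on
      ON f.id = a.father OR f.id = a.mother
)
SELECT name FROM ancestors
""").fetchall()

print(sorted(r[0] for r in rows))   # ['dad', 'grandma', 'grandpa', 'me']
```

The anchor part runs once, then the recursive part is repeated against the rows produced so far until no new rows appear, which is exactly the "again and again" walk up the ancestry graph described in the talk.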
And then again and again. So this can select the whole ancestors graph: all parents, all fathers and mothers, and their fathers and mothers, until the table ends. So this is the SQL standard way of querying graphs and recursive data structures: employees and bosses, fathers, mothers, kids, any relations graph, whatever. This is how the SQL standard wants us to do that. PostgreSQL supports it, Oracle supports it, and we now support it too. That was not possible with MariaDB before. And another feature in this area of doing more complex queries. Sorry, I should have brought some water. I'll be using a stages table on this slide, and to explain it, let's imagine a complex process that consists of different stages, where every stage takes some time. For example, a user clicks on a link in his browser. The browser processes the click and sends an HTTP request to Apache, and Apache loads a PHP page that does something and then connects to the database, and the database reads some data from disk, does something, and sends the data back to PHP, and PHP sends it back to the browser, and the browser displays another page, and then the user finally gets what he actually wanted. There are many stages, every stage takes some time, and what I want to see is all the stages: when every stage started, how long the stage took, and the running total of how much time passed from the very beginning up to the current stage. This is the join that shows that, and using window functions, which are also part of the SQL standard, one can show the running total this way. It selects name, start time, and duration, and a sum over the window that starts from the beginning and ends at the start time. Which is, again, not exactly interesting, because you can do the same with a join or a subquery. But this next one you can no longer do with a subquery or a join: this one displays, for every stage, the rolling average.
It includes two rows before and two rows after. So for every current row it considers a window of five rows and averages over them. This is something you can only do with window functions. For example, the user complains that this mouse click, showing the page, takes too long, and you want to optimize the whole process. Logically, you need to start optimizing from the most time-consuming stages, and this one uses the NTILE function. It splits all the stages into buckets, and if you look at the 10th ntile, the 10th quantile, it'll be all the stages that take longer than 90% of all other stages. So the most expensive stages will be in the 10th quantile, and then you will know which stages to start optimizing from. That's what the NTILE function calculates: quantiles. Now about removing limitations. Any questions so far? Okay. This is something that MySQL has had, like, forever. And while it might look like an arbitrary limitation, it was rooted deep in how things worked and were designed internally. You can do a self-join like the one on this slide for normal tables, but not for temporary tables, because you cannot use the same temporary table twice in the same query: not with subqueries, not with self-joins, nowhere. It's kind of silly, but that's how it used to work, and we have fixed it now. So starting from 10.2 this limitation is removed, and you can use temporary tables just like any other normal table. The next extension allows you to use expressions in the DEFAULT clause. Until now, a default could only be a constant, or CURRENT_TIMESTAMP, that is NOW, for timestamp columns. Now it can be almost any arbitrary expression, for any column. Oh, thank you very much. So, any arbitrary function for any column: that part is also supported by Oracle and Postgres and DB2 and so on. But the following extension is not; actually, Google couldn't find any other database that can do this. In a default, you can refer to any other columns.
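The three window-function queries described above (running total, five-row rolling average, NTILE buckets) can be sketched together over an invented stages table. SQLite supports the same standard OVER syntax since version 3.25, so Python's sqlite3 can run it; the stage names and timings are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")   # window functions need SQLite >= 3.25
con.executescript("""
CREATE TABLE stages (name TEXT, start REAL, duration REAL);
INSERT INTO stages VALUES
  ('click', 0.0, 0.1), ('http',  0.1, 0.3), ('php',    0.4, 0.2),
  ('db',    0.6, 1.0), ('reply', 1.6, 0.2), ('render', 1.8, 0.4);
""")

rows = con.execute("""
SELECT name,
       -- running total: everything from the first stage up to this one
       SUM(duration) OVER (ORDER BY start) AS running_total,
       -- rolling average over a 5-row window: 2 before, current, 2 after
       AVG(duration) OVER (ORDER BY start
                           ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS rolling_avg,
       -- NTILE splits the stages into buckets by duration; the top bucket
       -- holds the most expensive stages, the ones to optimize first
       NTILE(3) OVER (ORDER BY duration) AS bucket
FROM stages
ORDER BY start
""").fetchall()

for name, total, avg, bucket in rows:
    print(f"{name:7s} total={total:.1f} avg={avg:.2f} bucket={bucket}")
```

The slowest stage ('db', 1.0s) always lands in the top bucket, which is the "know which stages to start optimizing from" point made in the talk.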
So if you insert, say, three into column A, the default for B in the same row will be four. So you can use almost any expression in defaults. And as a nice side effect of this, we also have defaults for blobs. This is all part of 10.2. And check constraints. MySQL used to "support" this for, again, forever: you could always create a table with a CHECK constraint specified, and the parser was happy to parse it, and then the server would throw it away, not even store it, and of course not enforce it. So it wasn't really supported; only the syntax was. Now we have fixed this properly: check constraints are parsed, they are recognized, they are stored, and they are enforced. So with this table, if I created it, I would not be able to insert any row where A is not greater than B. This is also part of 10.2. Virtual columns. MariaDB has supported virtual columns since version 5.2, but they were subject to many limitations, and some of them are removed now in 10.2. For example, now virtual columns can refer to other virtual columns, like column C referring to column B, which was not allowed before. They can be indexed, which also didn't work until now. Also, I don't have an example for that, but the maximum expression length was increased, and the set of functions that can be used in virtual columns was extended, so you no longer have to use only deterministic functions in virtual columns. So many limitations were removed; I have only mentioned a few. The DECIMAL type, decimal numbers, got larger precision now: they can store up to 38 digits after the decimal point. For a reason I don't quite understand, this was the most highly voted feature in our feature tracker. But, well, now it's possible. You can also use subqueries in the FROM clause inside views, which was prohibited before: this CREATE VIEW would have failed in 10.1, and now in 10.2 it works.
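Enforced check constraints are easy to demonstrate; SQLite also enforces CHECK, so the a > b example from the talk can be run through Python's sqlite3 (the table name t1 is an assumption):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (a INT, b INT, CHECK (a > b))")

con.execute("INSERT INTO t1 VALUES (2, 1)")      # fine: 2 > 1

violated = False
try:
    con.execute("INSERT INTO t1 VALUES (1, 2)")  # violates a > b
except sqlite3.IntegrityError:
    violated = True                              # the server rejects the row

print(violated)                                              # True
print(con.execute("SELECT COUNT(*) FROM t1").fetchone()[0])  # 1
```

Only the row satisfying the constraint makes it into the table; the violating insert fails with an integrity error instead of being silently accepted, which is the difference 10.2 makes compared to the old parse-and-discard behavior.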
We also worked on different optimizations, making MariaDB faster on different levels. The biggest benefits, the biggest speed-ups, you get, of course, from changing the algorithms significantly. So there were changes in ORDER BY, for example. In this example, let's assume that t1.id is a primary key, and the other column is not indexed at all. In 10.1 the optimizer would have tried to order by this non-indexed column, and that would require a filesort, another sorting pass, which could be very slow. In 10.2 the optimizer notices that those two values are actually identical, so it can simply order by the primary key of t1, and then it doesn't need a filesort; it can just use the existing index, which could be much, much faster. Also, I was saying that in 10.1 we made a lock-free table cache. This was apparently not enough, so now in 10.2, additionally to that, we partition it to make it, well, even more concurrent. Faster connect times: there was some work done on making connection time faster, but whether you will be affected depends on your application. If you actually see the connection time, which means you connect, run some simple query, disconnect, and then repeat again, then connection time will be a noticeable part of the whole process, and you can see up to a two-times speed-up; that's what we saw in our benchmarks. If you connect once and then run many long queries, then disconnect, the connection time is very small anyway, so you probably won't notice anything at all. It won't be slower, but it won't be faster either. We have also added some more batching into the protocol, so one can batch many protocol commands into one, and the server will batch replies as well, reducing the number of round trips.
It can be done for any command, but the more practical example is to do prepare and execute in one big packet instead of saying prepare, then execute; so it saves one round trip. Also, when your application connects to the database, in most cases the username and password are correct. In most cases you don't get an incorrect password, because everything's hard-coded anyway somewhere in your application. So it makes sense not to wait for the server to acknowledge the password, but to just send the username, the password, and the first statement that you want to run in the first packet, and the server will either execute it and send the reply right away, or it'll say incorrect password or whatever. That also saves one round trip. I don't have any scientifically good benchmarks for that; we've done only some simple non-scientific benchmarks, and when we have better results we'll blog about it, but we did see some huge speed-ups because of this as well. There are many other extensions that are either done or that we are currently working on and that are ready to be pushed. JSON functions; data type plugins, which allow one to, well, write plugins for data types, where the first three will be a UUID data type, IPv6 and IPv4 data types, and a JSON data type. And there is a simple feature that was, surprisingly, requested many times: you can tell the server to return any version string you want. Why is it needed? Because there are quite a few applications, it turns out, that do not support MariaDB, and the vendor does not want to support MariaDB. Security scanners complain about the version of MariaDB: if you compare 5.6 and 10.2 as strings, then 10.2 will be less than 5.6, because 1 is less than 5. So they'll complain that it's too old a version that has too many security vulnerabilities. Or, for example, MySQL Workbench, I believe, checks the version to see what features you support.
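The string-versus-numeric version comparison trap is easy to demonstrate. A small sketch; the tuple-splitting helper is just an illustration of the correct way, not anything MariaDB or the scanners ship:

```python
# Compared as plain strings, "10.2" sorts before "5.6" because the
# comparison goes character by character and '1' < '5'.
print("10.2" < "5.6")   # True: this is why naive scanners think 10.2 is old

# A correct comparison splits the version into numeric components first.
def version_key(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

print(version_key("10.2") < version_key("5.6"))   # False: 10 > 5
```

Any tool doing the first comparison will conclude that MariaDB 10.2 predates MySQL 5.6, which is exactly the misbehavior the fake-version feature works around.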
And if it's less than 5.6, it decides that you don't have these performance schema tables, and it won't allow you to run this query and won't show you these pages. So again, in the ideal world the users would complain to the vendor, and the vendor would fix that. We actually tried that; it didn't work, so we finally had to implement this workaround, so that those of our users who wanted to could hack around the unsupportive vendors. And now, what we have in the Google Summer of Code. We actually have 10 students, not all of them for the server, but those who are working on the server are actually pretty good. I would estimate that out of the five bullet points here, we'll have at least four features pushed; that's what I predict. And those features are: NO PAD collations. The SQL standard supports, or dictates, two different types of collations, two different ways of comparing strings. There are PAD SPACE collations, which mean that if you compare two strings of different lengths, the shorter one will be padded with spaces to the length of the longer one. This is how MariaDB currently works. Every now and then users ask us to support NO PAD collations, where the shorter string is not padded with spaces. This is a Google Summer of Code project, and so far it looks good, so we will probably have it in. And, yeah, we have CREATE FUNCTION, where one can create a function in SQL and then call it from selects and other queries just like any normal function. There's another Google Summer of Code project to implement a CREATE AGGREGATE FUNCTION statement, so that one could create aggregate functions in SQL just as well. This is non-standard; the SQL standard doesn't cover it, and every database does it differently, so there's no standard about that. Arbitrarily long unique constraints.
MariaDB has, and I think pretty much every database has, a limit on the maximal key length, the length of the index, which makes perfect sense, because an index is just a help for the optimizer to execute the query faster. A key needs to be able to distinguish between rows, and starting from a certain length a key stops being... loses its discriminative power: a shorter key is as good for distinguishing between rows as a long one, but the long one takes more space on disk and more time to read. So for optimizing queries there's a certain key length beyond which it makes no sense to have longer keys; that's why we didn't support them. On the other hand, unique constraints are not just a help to the optimizer. They are actual constraints, logical constraints on the data. And for those logical constraints it makes perfect sense to be very long. For example, if you upload your 5GB movie to Dropbox five times, it'll do deduplication and store the file once, and the other copies will just be remembered as being the same file. So sometimes it makes sense to have unique constraints over very large data. But because a unique constraint was implemented just as a regular key in MariaDB, that wasn't possible, and now we are removing this limitation. In 10.2, it looks like we will have unique constraints of arbitrary length. Invisible columns are something very similar to what Oracle has. You can mark any column in a table as invisible, and then if you do SELECT *, it won't be there; if you do an INSERT without column names, you won't insert into it. But you can still mention it by name. Why is it needed? If you have an old application that you cannot really change, but you want to extend the table somehow, say, add a timestamp to keep track of time, then you can add an invisible column. The old application will not see it, but your specific queries will.
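The Dropbox-style deduplication mentioned above is, at heart, a uniqueness check over values far too large for a normal index key; such checks are typically done by hashing the content. A purely illustrative sketch, not MariaDB code:

```python
import hashlib

store = {}   # digest -> content: each distinct value is stored once

def put(blob: bytes) -> str:
    """Store a blob, deduplicating by content hash."""
    digest = hashlib.sha256(blob).hexdigest()
    store.setdefault(digest, blob)   # no-op if we already hold this content
    return digest

movie = b"pretend this is a 5GB movie" * 4
digests = {put(movie) for _ in range(5)}   # "uploaded" five times

print(len(digests), len(store))   # 1 1 : one digest, one stored copy
```

A long unique constraint gives you the same guarantee declaratively: uniqueness is enforced over the full value, however large, without the value itself having to fit into an index key.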
And there's another type of invisible columns, those that you do not create explicitly. For example, a row ID: it's kind of a column in the table, but you don't see it if you look at the table structure, and you don't see it if you do SELECT *, but if you mention it by name, you will see it. That's the other kind of invisible column, and we will have both. And the last feature, which is a Google Summer of Code project but is practically done, so it's well on its way in. It needs a little bit more explanation. This is a derived table, a subquery in the FROM clause. Normally there are two ways of executing it, and for views and for derived tables it's the same logic. It can be merged into the main query, and then it basically disappears as an individual query and becomes part of the main query. Or it can be executed separately, the result stored in a temporary table, and then this temporary table is substituted in here. This particular example has a GROUP BY, so it cannot be merged; it will always be executed into a temporary table. But if you look at the WHERE condition, this condition can actually be moved inside the parentheses and executed together with the inner query, before storing into the temporary table. And this particular condition seems to be very selective, so if it is executed as part of the inner query, the inner query will be much, much faster: it'll take much less time to execute and much less space on disk, and it'll have far fewer rows than it would without the condition. This task is about moving such conditions inside non-mergeable views and non-mergeable derived tables. It's expected to provide great benefits for the cases where the query cannot be merged. Merging is ideal, but if the query cannot be merged, this is basically the second-best thing we can do to optimize those kinds of queries.
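The effect of this pushdown can be shown with plain SQL. The sketch below, using Python's sqlite3 and an invented orders table, runs a non-mergeable derived-table query and its manually pushed-down equivalent and shows they agree; the optimizer work in 10.2 is about doing this rewrite automatically:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (customer TEXT, amount REAL);
INSERT INTO orders VALUES ('a', 10), ('a', 20), ('b', 5), ('c', 7);
""")

# A derived table with GROUP BY: it cannot be merged into the outer query,
# so the whole aggregate is materialized before the WHERE filter applies.
original = """
SELECT * FROM (SELECT customer, SUM(amount) AS total
               FROM orders GROUP BY customer) dt
WHERE dt.customer = 'a'
"""

# What condition pushdown does, written by hand: filter first, so the
# materialized intermediate result holds one group instead of all of them.
pushed_down = """
SELECT customer, SUM(amount) AS total
FROM orders
WHERE customer = 'a'
GROUP BY customer
"""

print(con.execute(original).fetchall())     # [('a', 30.0)]
print(con.execute(pushed_down).fetchall())  # [('a', 30.0)]
```

Both forms return the same rows; the pushed-down form simply aggregates far less data, which is where the expected speed-up comes from.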
But, yeah, so that was about Google Summer of Code. But you don't need to wait for Google Summer of Code: you can contribute anytime, and you can contribute with code. We are on GitHub, github.com/MariaDB/server; you can fork it, hack away, and submit pull requests. We have a Knowledge Base, which is wiki-like, and you can edit it: you can contribute pages there, you can correct mistakes there. You can report bugs at jira.mariadb.org, or you can hang around on IRC and help people in #maria on Freenode, or on the maria-discuss mailing list. The simplest way to help us, one that will only take half a minute, is to enable the feedback plugin in your my.cnf. This is similar to popcon in Debian: it will once a week send us anonymized statistics about how you use MariaDB. What exactly it sends is documented in the Knowledge Base; you can see, and you can prove, that it's anonymized and that it doesn't send anything that we don't say it does. It'll help us make more informed decisions about what we need to work on, what we need to optimize, what queries are important and what we don't need to waste our time on, what features, say, nobody uses and can be removed, and what features cannot be removed. Questions? Any questions? Well, I have a question. You mentioned the EXPLAIN thing, that EXPLAIN shows the plan; with ANALYZE, is it always the same plan, just with the additional numbers, or can it be a different plan along the way? It can be different. In most cases it should be the same; in the cases when it's not the same, we would consider it a bug, and we would fix EXPLAIN to show the same plan. But sometimes there are decisions that are made along the way, while the query is being executed, and those EXPLAIN simply cannot show. Any other questions? I should check on IRC, but I don't think...
Okay, so if there are no more questions, then let's thank the speaker. There will be another database session in here after this one.