This is a talk on MySQL 8 features for developers. My name is Dave Stokes. I'm a community manager for MySQL. The slides are up at slideshare.net slash davidmstokes, and that's one of my blogs, mainly dealing with the PHP world of MySQL. Once again, this is a 30-minute talk. My slides normally take me 40 to 45 minutes to get through, so I had to abbreviate. If you want the full deck, it's out there; I also tweeted it under the DrupalCon and DrupalCon NA handles earlier today. I do work for Oracle, so to please the lawyers, who are 80% of the upper management of Oracle, I have to tell you that I'm talking about released MySQL products. We may venture into future products, and if I do that, please invoke the Safe Harbor statement. I'm speaking with imperfect knowledge, so I might say something's blue: I'm thinking sky blue, you're thinking royal blue, and it ends up being blue cheese. So MySQL is now 23 years old. I was kind of shocked by that, and then Rasmus Lerdorf told me a couple of weeks ago that PHP is now 25 years old. A little shocking for those of us who were looking for a template engine and got lucky with his work. Oracle has owned us for about nine years now. Our current release is MySQL 8.0; 8.0.16 drops on Monday or Tuesday. The big feature for most of you is that check constraints will actually work. I'm also going to mention, tangentially, the document store and group replication later. And by the way, the MySQL group is hiring. 85% of us or more work from home, so if you're looking for an interesting job with a somewhat mature product, it's a nice company to work for. Let me get rid of my screen capture thing. OK, MySQL 8. MySQL 8 has been out for almost a year, and the first questions I usually get about it aren't about the new features or how it's going to make my life better. They're "what happened to MySQL 6?" and "what happened to MySQL 7?" Well, like PHP, we had a version 6 that didn't progress too far.
And a lot of the good features were backported into version 5, for both PHP and MySQL. Our NDB Cluster product, which is what tracks your cell phone as you go between cell towers, or runs your Nimitz-class aircraft carriers for the US Navy, has been in the 7 series for a long time. So about two years ago, our engineers decided we were making such big changes that we'd just rev up from 5.7 to 8.0. The data dictionary is the big change. Previously, if you ever went under your install directory for MySQL and looked under /var/lib/mysql, you saw a lot of files, .myi and .myd; that was the metadata. Back when we started working on MySQL, it was acceptable to keep the metadata away from the database, because the database wasn't exactly reliable. Now, 23 years later, we decided to roll that all into the database, and that enables a lot of interesting features, including instant ADD COLUMN and a couple of others. The real good news is that you can now have millions of tables within a schema. The bad news is that you can now have millions of tables within a schema. I think the record right now that I've heard of is about 33 million tables before the guy at Percona ran himself out of disk space on a rather large box; we're waiting for that to be topped. The other big feature: SQL is probably the only programming language from the 1970s that's still around, and in the past several years it has gained new features like common table expressions and windowing functions that turn it from a declarative language, kind of like CSS, into more of an imperative language. CTEs are a handy way to write subqueries. Subqueries have always been a pain in the butt to write, harder to debug, harder to understand. They're kind of like regular expressions: you write it, it works on Monday, you come back two weeks later and it doesn't work, and you don't know why it worked before or why it's not working now.
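As a sketch of the idea, here's a subquery rewritten as a CTE. The table and column names here are my own illustration, not from the slides:

```sql
-- Without a CTE, this aggregation lives as a subquery in the FROM clause.
-- With a CTE, it reads top to bottom, like querying a named view.
WITH big_spenders AS (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
    HAVING SUM(amount) > 1000
)
SELECT c.name, b.total_spent
FROM customers AS c
JOIN big_spenders AS b ON b.customer_id = c.customer_id;
```

The CTE gets a name you can refer to, which is what makes it so much easier to read and debug than the inline subquery version.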
Windowing functions are interesting analytic tools; we'll go over those in just a minute. Windowing functions let you divide up your data into various groupings. Previously with SQL, if you were doing analytics, you could either do it for the entire table or group it by one or more columns, but it was kind of messy and didn't quite give you the granularity. With windowing functions, you can divide it up. Here's an example where we're partitioning by a department ID, so if we wanted to know what the payroll is for every department in the company, it's very easy to do. The keyword is OVER: over the partition of data that you're going to work on. Here's another one where we're working between dates; the interval goes from one week preceding today to the current date. So if your boss ever wants rolling updates of how sales are going on an ad hoc basis, this is how you do it real easily. Common table expressions: as I mentioned earlier, subqueries are a pain in the butt to write. I know Laravel loves to write subqueries, and sometimes they're kind of nebulous and hard to debug. Well, with CTEs, you define your query similar to a view, and then you query from that temporary view or table you created. Now here's an example where we're joining two CTEs. In the first one, we're selecting A and B from table one; in the second, we're selecting C and D from table two; and then we grab the data we want from those two CTEs, columns B and D, and join them on the overlapping columns. Very, very easy to write, very understandable subqueries. They're also recursive. So for those of you who have to do recursion, either to load tables or because your company does nothing but Fibonacci numbers, it's very handy. Lateral derived tables: my big problem with writing subqueries for the past 20-odd years is that I'd write a subquery and then want to reference it from outside the subquery.
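The two window-function slides read roughly like this. Again, the table and column names are illustrative, not taken from the deck:

```sql
-- Payroll per department, shown alongside every employee row.
SELECT name, dept_id, salary,
       SUM(salary) OVER (PARTITION BY dept_id) AS dept_payroll
FROM employees;

-- A rolling total over the week preceding each sale.
SELECT sale_date, amount,
       SUM(amount) OVER (
           ORDER BY sale_date
           RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW
       ) AS rolling_week
FROM sales;
```

The first query keeps every row while still showing the per-department total; the second uses a date-range frame so each row's total covers only the preceding week.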
You couldn't do that until 8.0.13, I believe. Now, as you can see with the code in red, we can define values within the query, reference them in the subquery, and call them from outside that query. Very, very handy. The optimizer and parser: these are the heart and nervous system of the database system. They're very much like a GPS, like Google Maps. They know historically how to get to your data, but like a GPS, they may not know about the road closure, the police blockage, the sinkhole, or anything like that; they're based off historical data. And we've done a lot of work in this area. The big change is that we now have true descending indexes. In the past, if you said you wanted an index to be descending, we lied to you. Now they're truly descending. We have more information in the optimizer trace; if any of you are DBA-type persons, you might be interested in that. The two other things I'll show off later are NOWAIT and SKIP LOCKED. Imagine you're Ticketmaster, you're trying to sell tickets, and someone wants seats in rows two and three. You go out there and look for those. In the past, the query would block because someone else had locked those rows. If you're using SKIP LOCKED, it skips right over those locked rows and returns whatever else is available. You can also use NOWAIT: if you go out there and everything's locked, but you don't want to wait, it returns immediately with a message saying it can't get any records for you. Contention-aware transaction scheduling: the University of Michigan wrote a paper a couple of years ago on a new way to schedule transactions within a database. Databases have always had trouble figuring out how competing queries get answered, and these folks wrote a rather interesting academic paper. Our guys looked at it, were very fascinated, went to implement it, and it didn't work.
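A minimal sketch of that Ticketmaster scenario, with a hypothetical seats table:

```sql
-- Another session already holds locks on some of these rows.
START TRANSACTION;

-- Skip the locked rows and take whatever is still available:
SELECT seat_no FROM seats
WHERE row_no IN (2, 3) AND sold = 0
FOR UPDATE SKIP LOCKED;

-- Or refuse to wait at all: this errors immediately
-- if any matching row is locked by another session.
SELECT seat_no FROM seats
WHERE row_no IN (2, 3) AND sold = 0
FOR UPDATE NOWAIT;

COMMIT;
```

SKIP LOCKED is the "sell whatever seats are left" behavior; NOWAIT is the "tell me right now if I can't have them" behavior.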
So a series of emails and a couple of phone calls later, we found out there was some stuff they didn't put in their paper, and some stuff we weren't reading right, and we put it in there. The basic idea behind this paper, which I'm summarizing, is that if you have a query that's very, very greedy, it's easier to give it all it wants and send it on its way than to have it fight with other greedy queries. So you get the neediest query out of the way first. It works very nicely. So if you have hot rows or hot columns in your data, this will really, really speed up your server. Roles: if you're working at a company like I used to, the springtime rush would come and you'd hire accountants for the tax period. The tax period ends, the accountants go away. Mid-fall, you start hiring people for the Christmas rush, and they go away in January. And it was always a pain in the butt: someone would say, well, we have a new accounting clerk, and Sheila used to be an accounting clerk, so we'll just give the new hire the privileges Sheila had. Then you find out Sheila had a lot more privileges than she should have, a junior clerk drops a database, and you spend your weekend reinstalling the system. Well, with roles, you define what privileges a job function has, and as people join or leave the company, you assign them to those roles. So if someone gets bumped up, you say, okay, they've gone from accounting clerk one to accounting clerk two; they now have privileges on these columns. It makes things a lot easier to manage. Character sets: by default, MySQL 8.0 is UTF8MB4. Previously, we were giving you UTF8MB3, which was only three bytes, so no emojis, and our CJK support was horrible. And this might be the only thing that gives you an upgrade problem going to MySQL 8.0 from 5.7. By the way, with our new shell, you can go in there, type util.checkForServerUpgrade(), and give it the IP address of your server.
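A sketch of that seasonal-clerk example with roles; the role, user, and schema names are made up:

```sql
-- Define the job function once.
CREATE ROLE 'accounting_clerk';
GRANT SELECT, INSERT, UPDATE ON accounting.* TO 'accounting_clerk';

-- Springtime hire arrives: assign the role, not individual privileges.
CREATE USER 'temp_hire'@'%' IDENTIFIED BY 'a-strong-password';
GRANT 'accounting_clerk' TO 'temp_hire'@'%';
SET DEFAULT ROLE 'accounting_clerk' TO 'temp_hire'@'%';

-- Tax season ends: the role (and only the role) goes away.
REVOKE 'accounting_clerk' FROM 'temp_hire'@'%';
DROP USER 'temp_hire'@'%';
```

Nobody inherits Sheila's leftover privileges; everyone gets exactly what the role defines.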
It will go through and see if there are any conflicts, like reserved words you're not supposed to be using, tables that aren't in UTF-8, and so on. So if you need to upgrade and want to see what's going to go wrong beforehand, fire that up, or copy your database into a Docker container and fire that up. By the way, we're also the only database that supports the GB18030 character set, so if you're dealing with the People's Republic of China, you're going to be very happy with that. Why are we doing all this? Because you folks like this in your database. I was actually talking to a company a couple of months ago where part of the three-part authentication to log in is entering your favorite emoji. Guess what the most popular emoji is? Invisible indexes: there was a talk yesterday about query optimization that talked about creating and blowing away indexes. Very nasty, very expensive, and it can be really miserable. What you can do with an invisible index is temporarily, for your session and your session only, turn that index invisible to the optimizer. So if you have a query and you're not sure that index A is actually helping you, make that index invisible, run your query, and compare the output from the two explain plans to see if that index does help you. And if it does help you, make it visible. If not, don't just blow it away; check with somebody else. They might be using that index, and it might be really good for them. In the past, you'd literally blow away the index, rerun the explain plan, find out that you did need that index, and then spend minutes, hours, days trying to rebuild that index on big data sets. SET PERSIST: say you work in an environment where I make a change on Monday, he makes a change on Tuesday, she makes a change on Friday. Things go catty-wampus on Sunday and everything reboots, and when you come back the next Monday, all your changes are gone.
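The invisible-index workflow looks something like this; the table and index names are hypothetical. One nuance: flipping the index invisible is a global metadata change, and the session-only part is an optimizer switch that lets just your session keep seeing invisible indexes while you compare plans:

```sql
-- Hide the index from the optimizer without dropping it.
ALTER TABLE orders ALTER INDEX idx_customer INVISIBLE;
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- In this session only, test with the invisible index turned back on.
SET SESSION optimizer_switch = 'use_invisible_indexes=on';
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- The plans show it helps, so make it visible again for everyone.
ALTER TABLE orders ALTER INDEX idx_customer VISIBLE;
```

Either way, the index keeps being maintained the whole time, so there is nothing expensive to rebuild if you change your mind.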
And no one kept any notes. You go to look at Slack; Slack's down. You don't know who made what changes where, you're stuck, and your server's not running the way it was, even though when you left Friday at five o'clock it was running beautifully. Well, if you use SET PERSIST before a change, it will actually take that information and put it into a file called mysqld-auto.cnf. When the system reboots, it will go through and read that file last. So if you're working in an environment like containers, where you have a lot of chefs playing around with the recipe, this is a great way to ensure that information stays around. By the way, it logs who did what and when. Also, with MySQL 8, we're not really waiting to release new features; we're putting them out rather quickly, like the check constraints coming out next week. You're going to see a lot of new features coming in, and they're all implemented as plugins. So if you don't like one, it's very easy to remove that plugin. If you like a feature that isn't installed by default, it's literally loading a shared object from the MySQL command line and away you go. Big, big, big change for us: before 5.7, our GIS support was pretty poor. We had our own libraries, and they didn't quite work. So we started working with the Boost.Geometry folks. With 5.7, we had a 2D world, their 2D libraries. With 8.0, we have their 3D libraries, so you now get a full, round, wrap-around world. You can do all the lovely Google Maps type stuff: you can draw a parallelogram or a polygon and see if a point is inside it or outside it. So if you're doing your own Google Maps type application, you're going to love this. Are we on par with PostGIS? No, but we're getting very, very close. Give us a little more time. JSON: JSON was the biggest cause of people upgrading from MySQL 5.6 or earlier to MySQL 5.7. Why? Because it's a very handy way to store data.
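Going back to SET PERSIST for a moment, a minimal sketch of that workflow; the variable chosen here is just an example:

```sql
-- Survives a restart: written to mysqld-auto.cnf in the data directory
-- and read back after the other configuration files.
SET PERSIST max_connections = 500;

-- The audit trail: who set what, and when.
SELECT VARIABLE_NAME, SET_TIME, SET_USER, SET_HOST
FROM performance_schema.variables_info
WHERE VARIABLE_SOURCE = 'PERSISTED';

-- Undo a persisted setting if you need to.
RESET PERSIST max_connections;
```

That variables_info query is where the "it logs who did what and when" part lives.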
Also, it lets you get rid of some nasty many-to-many joins. If you're like me and have been normalizing your data to the nth degree for years, sometimes you have to do four or five dives into the indexes and their corresponding tables before you pull up a simple piece of information. So if you're looking at your customer's last order, you pull the customer index to find that customer's ID number and find their record. From there, you go through the sales table to find all their sales, and then you do another dive into the index and data to find their last order. That's six dives, three indexes, three data tables, and I think it's very nasty. With JSON, you don't have to do that. You basically have a one-gigabyte payload to store any data you want, as long as it's in JSON format. Very, very handy. Also, it lets you plan for mutability. How many of you are working on projects where your boss said, hey, go out and code your stuff, we'll tell you what you're doing later? There's a couple of hands. There's more and more of that going on, which I find kind of disturbing. So with the JSON data type, if things change on you, you don't have to go find your DBA, add a column, add an index, or change a histogram to get everything to work. You can just throw everything into what's basically a nice bit bucket. As I mentioned earlier, it helps remove a lot of many-to-many relationships, and it's a fairly easy way to handle the data. The only trouble with the JSON support in 5.7 was that it returned one long, nasty string on the CLI or in your mysqli call. So we added JSON_PRETTY, which will pretty-print your information. JSON is supposed to be human-readable, but at four in the morning, when it's coming out on one line, it's very messy; it's much easier to read this way. We also have two interesting functions you may want to take note of: JSON_ARRAYAGG and JSON_OBJECTAGG.
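A sketch of those two aggregate functions, plus JSON_PRETTY; the employees table is my own example:

```sql
-- Roll relational rows up into JSON values:
-- arrays get square brackets, objects get curly braces.
SELECT dept_id,
       JSON_ARRAYAGG(name)          AS names,     -- e.g. ["Ann", "Bob"]
       JSON_OBJECTAGG(name, salary) AS salaries   -- e.g. {"Ann": 90000, "Bob": 85000}
FROM employees
GROUP BY dept_id;

-- Pretty-print a JSON value instead of getting one long line.
SELECT JSON_PRETTY('{"a": 1, "b": [2, 3]}');
```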
What they let you do is take data, JSON or not, and aggregate it into an array. By the way, you know it's an array because it has square brackets around it; objects have curly braces. So here you can see that we have the curly braces around the entire object. So if you need to get your data, in JSON or non-JSON format, into an object or an array, this is a quick, handy way to do it. The MySQL document store: this started with 5.7 and really came into its own with 8.0. Basically, there's a new protocol listening on port 33060, and a new API, the X DevAPI. What it does is let you use MySQL as a NoSQL JSON document store. Using some very simple CRUD functions, you can open up a collection, open up a document in that collection, and dump in your data. You no longer have to wait for a DBA to set up the schema, set up the tables, set up the relations, set up the indexes. You don't have to normalize your data. You can go from zero to writing and reading data back very, very quickly. This differs from traditional databases in that you don't have to rely on the DBA for a lot of this stuff; you're allowed to do it on your own. However, after a while, you may find some of this data is very valuable and want to extract it into relational columns. You can do that very easily. By the way, you don't have to: there's full indexing and everything else you can do within the document store itself. Now here's an example in PHP. Like PDO or mysqli, you authenticate with your username, password, and host; notice we're speaking on 33060. We say what schema we want to talk to. In this case, we're going to talk to the schema world_x. From that schema, we're going to talk to the JSON document collection called countryinfo. And then we're going to do a query without using SQL: we're going to find where the _id is equal to USA. We only want the name field, and we're going to call it country.
We're going to get the geography description and call that geo, and we're also going to get the geography.Region key-value pair, and then execute the query. A very fast, very easy way to write queries. Now, if you want to modify this on the fly by adding a limit, a group by, or a sort, it's very simple to do. It's a modern programming style, very easy to do. If you go to my slides, you'll see a couple of other presentations on this you can download. Resource groups: anyone here still running on a single-CPU server? Even on your laptop? What you can do with resource groups is dedicate certain CPUs in your multi-CPU system to certain classes of queries. Here's an example where we said, okay, CPUs two and three are going to be used only for batch work. Then, if we put a comment in our query that names the batch resource group, the optimizer will force that query onto those CPUs. So if you need fast processing, or you want to make sure that batch stuff always stays stuck on one or two CPUs, you can do that. You can dedicate processes to whatever CPUs you want. Histograms: there was a nice talk yesterday on optimizing MySQL, but it had a lot of obsolete information. Indexes are wonderful; they go directly to the record or records you want, very, very quickly. However, there's overhead, because they are separate structures. So if you insert, update, or delete a record that's part of an index, there's maintenance overhead to take care of that index. That can get very, very expensive. Histograms are another way to do this. A histogram takes a distribution of your data and divides it into various buckets. So for something like US states, you could have 50 buckets for all 50 states. For other data, you have up to 1,024 buckets.
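A sketch of that batch setup; the group name, priority, and query are illustrative:

```sql
-- Dedicate virtual CPUs 2 and 3 to a lower-priority batch group.
CREATE RESOURCE GROUP batch
    TYPE = USER
    VCPU = 2-3
    THREAD_PRIORITY = 10;

-- Pin a query to that group with an optimizer hint.
SELECT /*+ RESOURCE_GROUP(batch) */ COUNT(*) FROM big_table;
```

Anything carrying the hint runs on the batch group's CPUs, so the heavy reporting work stays off the cores serving interactive traffic.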
Depending on the cardinality and distribution of your data, the optimizer now has a better idea of where your non-indexed data is. Now, as I mentioned earlier, the optimizer is kind of like a GPS: it knows historically how to get someplace. So if it has better information on how to get someplace, like taking the shortcut down Third Street instead of going all the way down to Fifth, it helps you out. Histograms are very, very easy to try out. You type ANALYZE TABLE with the table name you want, then UPDATE HISTOGRAM, name the column or columns you want the histogram on, and give the number of buckets. It's a fairly cheap resource; I recommend starting off with at least 512 buckets unless you know the cardinality of your data is very, very small. Now, the great thing about histograms is they don't need maintenance. If the data is fairly static, maybe you update the histogram on the weekend with a cron job on Saturday evening. Basically, it's free. There's no overhead like with an index, and it won't hurt your inserts, updates, or deletes; only selects will really benefit from this. Something else we did is get rid of the old memory storage engine for temporary tables. It was old, it was clunky, and we came up with a new TempTable storage engine. You might have seen a thread on Stack Overflow several months ago where someone rerunning TPC benchmarks was wondering why everything that used the temporary storage engine was running 10% faster. Well, the old memory and temporary storage engines would get to a certain, artificially limited size, and then the code said, okay, stop what you're doing, copy everything over to an on-disk InnoDB table, and then start again. That stop, copy, and restart was very expensive and a waste of time. That no longer happens with the new temporary table engine. So if you're using temporary tables and doing sorts or group bys, you should be very pleasantly surprised. As I mentioned, we have a new API.
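The histogram commands just described, spelled out; the table and column names are made up:

```sql
-- Build a 512-bucket histogram on non-indexed columns.
ANALYZE TABLE customers
    UPDATE HISTOGRAM ON state, zip_code WITH 512 BUCKETS;

-- Drop one again if it turns out not to help.
ANALYZE TABLE customers DROP HISTOGRAM ON zip_code;
```

Because the histogram is only rebuilt when you run ANALYZE TABLE, there is no per-write maintenance cost, which is exactly why it does not hurt inserts, updates, or deletes the way an index does.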
The old protocol we've been using for the past 23 years is good, but it's showing its age. There are some limitations around things like sending off a whole bunch of asynchronous queries, where your program can send out 10 or 15 queries and go off on its merry way, and then, as the queries come back, you get interrupted and can process them as they come in. Our old protocol is on the left, where we talk the standard protocol of SQL to relational tables. With the new protocol, you can talk the new X DevAPI, which is both CRUD and SQL, over the X Protocol on port 33060, and you can talk to the JSON documents, the relational tables, or a combination of both. You can actually cast the collections as a relational table and use the SQL functions you're used to on the NoSQL data. As I mentioned before, the payload for a JSON document is roughly one gigabyte, versus Mongo's 16 megabytes. And I think our query language is a little bit easier to write; having to pass in JSON documents as queries is kind of nasty. And by the way, we have full ACID compliance; they're getting close, but they're not quite there yet. Now here's an interesting bit of code. The stuff in purple is a JSON table. JSON_TABLE is a unique function you'll find in a lot of the big databases that have added JSON data types, like us, Oracle, and SQL Server. What it does is let you temporarily take your unstructured NoSQL data and turn it into a structured table. In this example, we're taking the column store, which holds an embedded array, grabbing all the values under the grade key, and making an average score. These are health inspection scores from a restaurant collection from New York City. Once we have this unstructured data turned into structured data, we can do other analysis with it. In blue, we have a common table expression where we grab some of the data from that query for later use.
And then with a windowing function, which is the stuff in red, we partition by cuisine and order by the average score we picked up with the JSON_TABLE query, and we come up with the various ranks. And from the common table expression, we order by this rank on the average score. So this is a way you can find the average restaurant score across the five boroughs of New York. That's something you can't do with Mongo, something you're going to find very, very hard to do in most other databases, but something you can do very quickly and very simply in MySQL. So, MySQL 8, as I said, has been out for almost a year. You can download it today. If you're looking for Docker images, we do have the official MySQL Docker images out there. Go out and give us a try. Even though it says Docker.com, they are maintained by MySQL engineers. Since this is only roughly a half-hour talk, I've condensed a lot of stuff. If you're looking for bigger, more intense detail on everything, the unofficial MySQL guide goes into the changes we've made to the B+ trees, explains why we got rid of the query cache, and a whole bunch of other stuff. So if you're looking for more details, that's the place to go. And wow, that was a quick run-through of everything. I want to leave time for questions, but I do want to mention MySQL group replication, which you'll also see referenced as MySQL InnoDB Cluster. The idea is that your application really shouldn't care where it's writing to or reading from upstream. Your application writes to a proxy, MySQL Router or ProxySQL. From there, it goes into MySQL group replication, which is active-active, master-master replication. Your query gets sent to one machine, which accepts it for processing. That machine notifies all the others: we're going to process transactions one, two, three, seven, five. And as a group, they all process them in the same order.
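I can't reproduce the slide exactly, but a hypothetical reconstruction of that query, against a collection shaped like the well-known NYC restaurants data set (a doc column holding JSON with name, cuisine, and a grades array of score values), looks roughly like this:

```sql
-- CTE: flatten each document's grades array with JSON_TABLE
-- and average the scores per restaurant.
WITH avg_scores AS (
    SELECT r.doc->>'$.name'    AS name,
           r.doc->>'$.cuisine' AS cuisine,
           AVG(g.score)        AS avg_score
    FROM restaurants AS r,
         JSON_TABLE(r.doc, '$.grades[*]'
             COLUMNS (score INT PATH '$.score')) AS g
    GROUP BY name, cuisine
)
-- Window function: rank restaurants within each cuisine.
SELECT name, cuisine, avg_score,
       RANK() OVER (PARTITION BY cuisine ORDER BY avg_score DESC) AS ranking
FROM avg_scores
ORDER BY cuisine, ranking;
```

That's the whole pitch in one statement: JSON_TABLE turns unstructured documents into rows, the CTE names the intermediate result, and the window function does the analytics on top.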
If one of those boxes goes down, it has the smarts to automatically know that that box is no longer in service and change the load over to another box. If you're adding a new box to replace another one, or to adjust for seasonal data flow, the new box comes on, says I want to join your group, and gets fed a copy of the data everyone else has. So this is your high-availability data story. Given the time limitation, and so we can get to questions, that's about as far as we'll go for now. I'd like to plug my book. If you're using JSON with MySQL, I wrote this book because I thought our documentation was hard to understand. You could tell it was taken out of the work logs, and it really didn't explain how to use the various functions. I went through and wrote a book with a bunch of examples. I won't say it's a cookbook, but it has a lot of coding examples on how to use the various functions, and it is available on Amazon. And with that, there's where my slides are and my email address; you can contact me anytime. By the way, I do have stickers up here, and we do have one microphone, so if you have questions about anything MySQL-related, go ahead. A question from the audience: the features you talked about, are these in the community edition, or are some of them partitioned off into the enterprise edition? Everything I've talked about today is in the community edition. What's kind of funny is that if you took the server code for the enterprise edition and the community edition, printed them out, and held them up to the light, the only change you would actually see is in the copyright notice. The difference between the enterprise edition and the community edition, as my sales folks would like me to tell you, is that you get a great backup tool, you get the best monitoring tool, you get a couple of other plugins for Workbench, and you get that phone-call ability.
So at three o'clock in the morning, when you're pulling your hair out and you want to talk to one of our engineers, you can. Price starts at $4,000 for a four-core box, and there are discounts available; call your local sales rep. And by the way, I have stock in Oracle, so I have to say that, even though it is a great deal. Yes, ma'am. [Inaudible audience question about group replication.] Could you say that a little louder, please? [Audience:] It is much better than the old way of doing replication with MySQL. I have one site that's on a very old version with the old way of doing it; it takes us seven hours to fail over to the backup. It takes me four minutes to fail over with MySQL 5.7 with group replication. With that said, we have a Drupal site, a Drupal 8 site, and my binary logs are getting really bloated from the cache tables, and it's a bit of a problem. There's no way in 5.7 to say, hey, don't put this into the transaction log and ship it across, because the cache tables are pretty transient and get rebuilt. Is there any way in 8, without moving them to a whole separate schema? So far that's the only solution I've seen provided for not including cache tables in your replication. And these are the Drupal cache tables, not MySQL's. [Dave:] The Drupal cache tables; I'm more of an old PHP hand. I started without frameworks, so most of the stuff I do is not Drupal, and I'd have to refer you to a Drupal expert. I can tell you on the MySQL side, there are ways to flush the cache more often to get it out there. What I can recommend is calling your local salesperson and having them send out their sales engineer to take a look at it; they might have some ideas. [Audience:] It is the one downside that I'm really seeing in group replication. And for folks that are used to the old way, there was a bug in Drupal where your bin logs, your transaction logs, grew really fast if there was something wrong with your site.
But that's not the case here; that bug's been fixed. It's just that there's so much writing to the cache tables that a lot of what our replication is doing is cache tables. Tell you what, let me give you this business card. Write that down for me and summarize it in an email, and I'll talk to the engineers who take care of the mysqli code. They also sometimes play with the PDO code; they might have an idea. Well, thank you very much. Yes, sir.