Hello, my name is Dave Stokes. I'm a MySQL community manager on the MySQL community team, and I'm here to talk about MySQL 8.0 new features, so let's get started.

First, let me introduce myself for those of you who haven't met me. I started using MySQL when it first became available. I was doing a lot of work with open-source software, mainly because I was called into a project to save it after they had already spent the entire budget and still had things to accomplish before a deadline. I actually joined MySQL when it was MySQL AB, a small Scandinavian company handing out a database. I was a PHP programmer on the certification team, and I went through the acquisition by Sun Microsystems and then the acquisition by Oracle. I am, again, on the MySQL community team. You used to be able to see us all around the world talking at events; unfortunately, this is a prerecorded talk for a virtual presentation. By the way, I have the MySQL developer and DBA certifications for 8.0, and also for a lot of the previous versions like 5.6, 5.5, and 5.0. I live in Texas, and I do have the required hound dog and pickup truck.

If you want the slides for this presentation, go out to slideshare.net slash Dave Stokes and look for the name of this event and the subject, which is MySQL 8.0 features. On Twitter I'm @Stoker, and you can email me at david.stokes@oracle.com.

By the way, if you didn't hear the news, MySQL is now 25 years old, just like PHP. We're still five years behind Linux, but we are racking up the candles. And if you're running MySQL 5.6, it reached end-of-life status at the beginning of this year. So if you're running 5.6, please upgrade to 5.7 or 8.0. There are a lot of neat things you're missing, and we're no longer maintaining 5.6.

Okay, on to MySQL 8.0. It became generally available in April of 2018, so it's been out there for a while.
We've moved to a roughly every-three-months release cadence, depending on what we need to do, so there's always something else three months away. We are on the CI/CD bandwagon, just like everybody else. And by the way, our software is much more complex than in the 5.5 era; the amount of stuff that's been added is astronomical. Because we're releasing every couple of months, a better product gets to the customers quicker. We used to hold off features for years and years for dot releases, and we realized that was hurting our customers.

By the way, if you have questions about what version of MySQL to run: make sure you download the same version for your server, your clients, your connectors, the routers, utilities, et cetera, because we run them through an extensive QA process together, so we know they work together. Right now our current release is 8.0.26. So if you're going to download MySQL, both the Community and Enterprise servers are at 8.0.26, as is our NDB Cluster. NDB is mainly used by telephone companies and massively multiplayer online games; it's not a general-purpose database. It has a lot of great performance, but it has some restricted uses. There's also MySQL Shell, which is our new CLI that I'll show a little bit later; MySQL Workbench; MySQL Router, which you'll also see a little bit later; and of course all our connectors that talk to the various programming languages.

So we have a new number. What changed? Why did we rev to a new number? Well, we did so much work behind the scenes that we decided not to pursue the next five number, which would have been 5.8. We went straight to 8.0, and I think you'll see why we made that decision.

The first big change was the data dictionary. In the past, if you looked under /var/lib/mysql, you saw a lot of files: .MYI, .MYD, .frm and all that. That was the metadata about your database. Well, that information is now stored in the database itself. Now, those files all took inodes.
They were very attractive to junior DBAs looking for files to delete. But now everything's stored in the data dictionary, and it's all kept in the InnoDB storage engine, which has wonderful point-in-time recovery. So we're consuming fewer inodes, it's less messy, and I think it's a little bit easier to administer that way.

Now, the thing with the data dictionary is there's always good news and bad news. The good news is that by not chewing up inodes and all that, we're no longer tied to that limit, so you can now have millions of tables within a schema. (By the way, we use the words schema and database interchangeably in the MySQL world.) Okay, the bad news. Well, the bad news is the corollary: you can now have millions of tables in a schema. I haven't seen anyone get past about 22 to 25 million tables in a schema, and I know it's going to come back to bite someone in the butt one of these days, but it's nice to know you can do it if you want to.

The other big change for us was that everything was optimized around UTF8MB4. Before this, what we called UTF-8 was really UTF8MB3; it didn't cover all the Unicode characters out there. So we decided we really needed to do this, because a lot of our customers need it for their work. And yes, it's a four-byte character set versus a one-, two-, or three-byte character set, but we have done a lot of optimization around it, and I think it's worth it. So not only do you get full support for the Chinese, Japanese, and Korean languages, you also get that all-important tool that you need in your database today: emojis. That's just a side benefit; I just kind of like that graphic.

Something else we changed: hash joins. MySQL used to use only a block nested loop join. It worked very well for a lot of stuff; for other things, not so much. By the way, I'm showing this with an EXPLAIN command, and when we look at what the optimizer is actually doing, it tells you that you're doing a hash join.
Now, the hash join works for any query that has an equi-join condition, where a equals b. Or here we have a less-than, but somewhere in there we're still doing an equivalency. Hash joins are usually faster, and they're built into the optimizer, so you don't have to worry about turning them on; they're just there and things run much faster. How much faster? Well, I have this lovely graph, and you'll notice the hash joins are here in blue. You can barely see them down here, and you can't even see the bar on Q11. And you see the block nested loop over here in orange. So some of your queries are just going to run exceedingly quickly.

Another big change was that MySQL can now act as a NoSQL JSON document store database. This is all based on Google Protobufs; it's an entirely new protocol for us. It's not an ORM, it also works with relational data, and the API follows modern design practices, so you do not have to struggle with the syntax of SQL. You're going to find this works very well with JavaScript, Java, Python, C, PHP, just about anything else out there. Now, if you want to, you can still use structured query language, or you can use the new API, and you can actually intermix them to a certain extent.

Now, the MySQL Shell is our new interface. It's absolutely gorgeous, and it has a lot of advanced features. One of the things I really appreciate is command completion. I get about three characters into a command and can't remember how to spell it, or I don't remember the options; I can hit Tab and it will walk me through. It has extensive help support that I find very useful. It also has three modes it listens in: structured query language, Python, and JavaScript. Now, the great thing about the Python and JavaScript modes is that you can write scripts, store them for later, and have them run whenever you want. And by the way, you can call JavaScript from Python and Python from JavaScript, so one script can fit all languages.
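To tie the hash-join discussion from a moment ago to something you can type in, here's a minimal sketch. The table and column names are hypothetical, but the EXPLAIN FORMAT=TREE output really does label the join method:

```sql
-- Hypothetical tables with no indexes on the join columns,
-- which is exactly where the hash join shines.
CREATE TABLE t1 (c1 INT, c2 INT);
CREATE TABLE t2 (c1 INT, c2 INT);

-- FORMAT=TREE (8.0.16 and later) shows the join method the optimizer picked.
EXPLAIN FORMAT=TREE
SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c1;
-- The plan contains a line like:
--   -> Inner hash join (t2.c1 = t1.c1)
```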
Also, the shell is our admin tool for InnoDB Cluster and replication, and you'll find it's a lot easier than the old client. By the way, one of its neat features is an upgrade checker. So if you're running 5.7, you give it the name of your server and it will go out there and make sure you aren't going to have any character-set problems, that you're not using any reserved words, and so forth. It also has a bulk loader that works in parallel. So if you have a lot of data in, say, TSV, CSV, or JSON format, this will bring it in on multiple threads. We also have some really neat utilities for dumping schemas, just one database, or the entire instance, and then pulling them back. Very, very neat, very, very fast, and goodbye, mysqldump.

Now, this is a look at the new shell. One thing I'd like to point out here is that I connect with \c and the hostname, and in my case it comes up defaulting to JavaScript; that's what the JS is for. If I type \py, I'm in Python mode; if I type \sql, I'm in SQL mode. Now, coming down here, it tells me who I'm logged in as. It tells me I'm running over secure sockets with TLS version 1.3 and the AES-256 cipher. A little bit further down, you'll see that we are using our new protocol, which listens on port 33060 rather than the traditional 3306. Notice that everything, the server and the client, is talking UTF8MB4, and we have compression enabled. So I'm going to suggest that, if you haven't tried this, give it a try. It is a wonderful tool.

Another big addition I found invaluable in MySQL 8 was the idea of invisible indexes. In the past, if you had an index you weren't quite sure was doing any good for you, you'd run EXPLAIN on the query, have your doubts about that index, and wipe it out. Well, just as you're testing your query, you find out that maybe your query didn't need that index, but just about everyone else in the entire universe does.
So you'd have to go back and rebuild that index. You no longer have to do that. You just ALTER TABLE, ALTER INDEX, and make it invisible; the optimizer can no longer see that index for the entire server. If you need to turn it back on, it's just ALTER TABLE, ALTER INDEX, and make it visible. Very, very neat, very useful.

We also added invisible columns. "Well, what's this for, Dave?" Well, say you have a piece of software that you don't want to open up in the editor and start playing with, because it does a lot of SELECT * queries, and a SELECT * brings back every column. If you add a new column, say to give the table a primary key, that can get really messy. InnoDB wants a primary key and organizes everything around it, so if you don't have one, things may not run quickly. So if you don't want to go back and alter the old code, you can use an invisible column. Here's an example where we create a table called not_here. We have a column that's our primary key, a column a that's an integer, and a column b that's an integer and invisible. Now, if I put some data into the table and do a SELECT *, notice I'm only getting the value for a; I'm not getting the value for the invisible b. But if I do SELECT *, b from that table, I'll get the values for b. You have to explicitly ask for invisible columns.

Also new: asynchronous connection failover. Say you're running replication between several sources and one or more replicas. This will automatically establish an asynchronous replication connection to a new source after an existing source fails. This failover mechanism keeps everything synchronized, so you can have multiple servers all sharing the same data. Note that the list of potential sources is stored on the replica, not on the master, the primary, the source.
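As a quick sketch of both features just described: the not_here table comes from the slide, while the table and index names in the first half are hypothetical.

```sql
-- Hide an index from the optimizer without dropping it, then restore it:
ALTER TABLE orders ALTER INDEX idx_customer INVISIBLE;
-- ...rerun EXPLAIN on your queries to see if anything got slower...
ALTER TABLE orders ALTER INDEX idx_customer VISIBLE;

-- The invisible-column example (8.0.23 and later), as described above:
CREATE TABLE not_here (
  id INT AUTO_INCREMENT PRIMARY KEY,
  a  INT,
  b  INT INVISIBLE
);
INSERT INTO not_here (a, b) VALUES (10, 20);
SELECT * FROM not_here;     -- brings back id and a only
SELECT *, b FROM not_here;  -- b must be asked for explicitly
```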
When a source goes bad, the new source is selected from that list according to a weighted priority you can assign, so the replica gets the data it needs. This also supports Group Replication, by the way.

Here's a handy tool I'd like to recommend. When I started using Linux, and Unix before that, I always made sure I took advantage of the GECOS field. Why? Because I was always working with several people named Bill or John. "Hey, this is Bill, can you reset my password?" You go look at the /etc/passwd file and, okay, which Bill are you? And you have to track that person down. Well, MySQL didn't have an analog to that until fairly recently. Now we have user attributes. In the first case, you see that we create a user and put in a comment giving the name of the person, what the account is for, where they're located, or the extension they're at, and it's stored out there. We can attach this as either a comment or an attribute, and this will really save your hide if you're in a place that has a lot of Bills, Marys, Joes, Abrahams, Miguels, whatever. If you have more than one, you probably want to be using this in your user table.

Next: turning off redo logging when loading data. What happens when you're restoring a machine is that the server wants to do all of its regular processing, even though you're just loading data. With transactions, you want to keep copies of the old rows around in case you have to roll back. Well, if you're just loading in data, you don't really need that, because you're replaying it straight off a backup. So what you do is run ALTER INSTANCE and disable the redo log. This makes your restoration go much, much faster. When you're done with the restoration, you enable the redo log and you're back to full ACID compliance.
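A minimal sketch of that restore workflow; this needs MySQL 8.0.21 or later and the INNODB_REDO_LOG_ENABLE privilege:

```sql
-- Turn redo logging off for the duration of the bulk load:
ALTER INSTANCE DISABLE INNODB REDO_LOG;

-- ...load your backup here (LOAD DATA, restore scripts, etc.)...

-- Turn it back on to restore full crash safety:
ALTER INSTANCE ENABLE INNODB REDO_LOG;

-- Sanity check on the current state:
SHOW GLOBAL STATUS LIKE 'Innodb_redo_log_enabled';
```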
Now, this is something from MySQL Shell; you'll have to go to the MySQL Shell documentation to read up on it. There are three utilities out there that are really, really great for backups. As you heard me mention earlier, I've stopped using mysqldump; these three tools are really interesting. Over here on the right you'll see an example: util.dumpInstance() with a directory where I'm going to put the output. It gives you a progress graph, goes out and gets your data, compresses it, and stores it for you. Much faster and much neater than mysqldump, and it does it in parallel, with multiple threads and file compression. And once you've saved your data with dumpSchemas(), dumpInstance(), or dumpTables(), you can run loadDump() and it brings it all back, also in parallel. Very handy, especially for those of you playing with containers or in the cloud, especially the Oracle cloud; this is really, really neat.

Binary log compression. For replication, your source keeps a copy of what it has just done to the data and sends it over to the replica. That information is now compressed using zstd. It's kept compressed, gets transferred compressed, is picked up by the replica still compressed, and only then is it uncompressed and processed. This saves a lot of bandwidth and a lot of disk space.

Random passwords. For a while, MySQL has had the ability to expire passwords after X amount of time and to make them as complex as you want: how many uppercase, lowercase, or special characters. But we didn't have a way to generate a password that would satisfy those rules. Well, now we do, and it's really simple. You do CREATE USER, give the user name, specify the host they'll be coming from (in this case, a wildcard for all), IDENTIFIED BY RANDOM PASSWORD, and it will generate a random password for you. Very handy.

EXPLAIN ANALYZE is a neat feature.
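Here's roughly what it looks like; the table names are hypothetical, and remember that, unlike plain EXPLAIN, this really executes the statement:

```sql
-- EXPLAIN ANALYZE (8.0.18 and later) runs the query and annotates each
-- plan step with measured timings and row counts, not estimates.
EXPLAIN ANALYZE
SELECT t1.c1, COUNT(*)
FROM   t1 JOIN t2 ON t1.c1 = t2.c1
GROUP BY t1.c1;
-- Each step in the output tree carries real numbers, e.g.:
--   -> Table scan on t1  (cost=...) (actual time=0.03..0.09 rows=100 loops=1)
```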
In the past, if you ran EXPLAIN on a query, it would use historical information to guesstimate what the optimizer was going to do to generate a query plan, how it was going to go out and grab your data. With EXPLAIN ANALYZE, it actually goes out and runs the query; it's not using historical data, it tells you the exact time. Now, historical data is pretty good. Unfortunately, it's also like a GPS telling me that, to go to lunch tomorrow, I turn left out of the driveway, go right over the railroad tracks, and the restaurant's on the right. What it doesn't know is that my neighbors flooded the road, there's a train sitting over the railroad tracks, and the restaurant has actually relocated. So historical information is good; actual information is many times better. Big warning: if you have a huge, impactful query, make sure you're not running this in production, because it does run the query, and it might hurt you.

Multi-valued indexes. In the past, you could have one entry in an index pointing to one key value in a column of a table. There was a one-to-one correlation, and you couldn't get past that. Well, with the advent of the JSON data type in MySQL 5.7, we had a lot of folks using JSON arrays to store multiple values, so we had to break that rule. A multi-valued index is a secondary index, which means it actually points back to the primary index, defined on a column containing an array of values. So you can now have multiple entries in an index for a single row of data.

JSON document validation. JSON is a wonderful format, the format of choice for interchanging documents, but my big complaint about it is that there's no rigor on your data. Traditionally in SQL, if you define something as an integer and try to put a string in there, it's going to complain. If you set up a range or some other constraint, it's going to complain and not take the data if it doesn't fit the parameters you've set up.
Well, with JSON document validation, we have a function called JSON_SCHEMA_VALID and we define what we want. We're going to have an object whose property is going to be called myAge; that's the key of the key/value pair. It's going to be numeric, with a minimum value of 28 and a maximum value of 99, and it's going to be required. So if you try to pass in a document and the database doesn't see a myAge in there, it will reject it. So let's take a look at how this works. We insert into our table a myAge of 27; remember, the minimum is 28. It's going to say the check constraint is violated, so it's not going to let it in, because it doesn't fit our standards. And if I put in one that matches the standards, away we go. Very handy. This is based on work from the wonderful folks at json-schema.org; very interesting stuff to help us get more out of our JSON data. Oh, by the way, that's a blog post I have on using this.

In the past, when you upgraded MySQL, you'd back up your data, hopefully two or three times (you were backing up your data, weren't you?). Okay, you backed up your data, you had an extra cup of coffee, then you changed the binaries and brought everything up, and hopefully everything came up nicely. Then things run along for a while and suddenly you start seeing these odd error messages. What happened is the metadata behind the scenes didn't get updated. You were supposed to run something called mysql_upgrade that would go out to the mysql schema and update the metadata. Well, our engineers got to thinking: we're smart enough now that we have everything stored in the data dictionary, we can run this automatically. So this is another step you do not have to do.

As I mentioned earlier with JSON document validation, we have check constraints. The way they work is that you create your table and you add a check. The first check is c1 not equal to c2, and I have some other checks in here.
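The checks being walked through can be sketched like this; the column names match the slide, while the table name is made up:

```sql
CREATE TABLE checked (
  c1 INT,
  c2 INT,
  CHECK (c1 <> c2),                      -- unnamed: gets a generated name
  CONSTRAINT c2_positive CHECK (c2 > 0)  -- named: errors will say c2_positive
);

INSERT INTO checked VALUES (1, -5);
-- ERROR 3819 (HY000): Check constraint 'c2_positive' is violated.
```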
Now I'd like to point out this one here. We have a column called c2, an integer, and we're going to put a constraint on it, call it c2_positive, and it's going to check that c2 is indeed greater than zero. Why am I pointing this out? Well, notice that I gave it a name. If you don't give it a name, the error is going to say something like constraint check number two failed, and that doesn't help you right off the bat. But if you name the constraint, it will come back and say constraint c2_positive wasn't met, and you know: oh, column c2 has to be greater than zero.

Histograms. Indexes are great; they really help you speed up finding data in your tables. The only trouble with indexes is that every time you insert, modify, or delete, there is overhead to maintain the index itself. Now, if you have a lot of activity on your system, that overhead can really bite, and if those same columns are used over and over and over again, the contention could really drag down your system. What you can do instead is set up a histogram, which is kind of like a bar chart of your data. Say you have five buckets; you put 10 or 20 percent of everything into each bucket, and the optimizer gets told, okay, here are the ranges of your values and how the data falls into them. Now, one of the troubles with the MySQL optimizer is that it assumes the data is laid out in a fairly even distribution in the column, and that's not always true. If you create a histogram, the optimizer has a better understanding of how the data in that column is distributed, so it knows how much it has to grab to answer your query. This is very handy for data that doesn't churn a lot. So if you have something that churns once a week, once a month, or once a quarter, you're going to love this. If your data changes much more often, almost on a minute or hourly basis, you might want to consider putting that data in an index instead.

CATS.
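(Before we get to CATS: the histogram workflow just described looks like this, with a hypothetical table and columns.)

```sql
-- Build histograms with a chosen number of buckets on low-churn columns:
ANALYZE TABLE orders UPDATE HISTOGRAM ON status, region WITH 5 BUCKETS;

-- Inspect what the optimizer now knows about the distribution:
SELECT * FROM information_schema.COLUMN_STATISTICS
WHERE TABLE_NAME = 'orders';

-- Drop the histograms if the data starts churning too fast:
ANALYZE TABLE orders DROP HISTOGRAM ON status, region;
```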
No, this is not the musical and/or the musical movie. This was a university paper on the Contention-Aware Transaction Scheduler. The idea is basically that you find the most contentious transaction, the one trying to grab the largest number of locks on your data, and let it go first, and that frees up more capacity for the other queries. Now, my apologies to the folks who wrote the paper; I know I'm butchering your conclusion, but that's basically the way it works. With CATS, when you get a certain load on your system, a certain amount of contention for locks, the switch flips on automatically and the server knows how to better handle the contention. This replaces the first-in, first-out algorithm we used to use.

Better JSON support. One of the things I appreciated when 8.0 came out was the JSON_TABLE function. "Well, what's so special about that, Dave?" Well, it lets you temporarily take your unstructured JSON documents and turn them into a structured relational table for processing by SQL. In this example, we're going to use JSON_TABLE on a column called doc. We read in the entire document; that's what the dollar sign stands for. We're interested in the key value called Name, which we're going to cast as a CHAR(20) and call country_name; the $.Name path points to that key in the actual document. If we want the independence year, we cast that as an integer and call it indep_year. And we feed all of that to the outer part of the query, where we're selecting the country_name and the indep_year from the JSON_TABLE function, and we say, okay, we only want the rows where indep_year is greater than 1992. So this part out here is all structured, and everything in here is unstructured; JSON_TABLE temporarily makes your unstructured data structured. Now, what if you have unstructured data that you permanently want in its own column, made structured?
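Reconstructed from the walkthrough above, the query looks roughly like this; the countryinfo table, the doc column, and the Name/IndepYear keys are assumed to follow the world_x sample database:

```sql
SELECT country_name, indep_year
FROM   countryinfo,
       JSON_TABLE(doc, '$' COLUMNS (
           country_name CHAR(20) PATH '$.Name',
           indep_year   INT      PATH '$.IndepYear'
       )) AS jt
WHERE  indep_year > 1992;
```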
We have generated columns, where you extract data and put it in its own column.

Common table expressions. If you hate subqueries or have had trouble with them, I really recommend taking a look at CTEs. They're kind of like derived tables, but the declaration comes before the query. Now, in this example, we have two common table expressions: in one we're selecting a and b from table one, in another c and d from table two. Then we select information from the various CTEs, and we can join them. There's a whole bunch of other neat stuff here. A lot easier, a lot faster, a lot easier to comprehend. So I recommend taking a look at CTEs if you write a lot of subqueries.

Window functions. These give you more granular analysis, particularly for those of you doing data analytics. One thing I really appreciate is that in the past, you could only do stats over an entire column or over groups of rows. With window functions, it's much more granular. The keyword you're looking for is OVER. In this case, we're using OVER w, and w is defined as a window: we order everything by date, and we have a range interval between one week preceding the row we're looking at and the current row. So if your boss says, hey, what are the sales since the beginning of the month, this is the type of function that lets you do that. You can do cumulative ranks, moving averages, just about anything you need to do statistically, you can do with window functions.

Now, InnoDB Cluster is our high-availability replication system. What we want you to think of for your application is this diagram over here. Your application talks through a MySQL connector to the database, and that connector connects through an instance of MySQL Router. This is a lightweight level-3 router, and it has all the smarts about what goes on down here. Your application doesn't need any more smarts than how to get to Router.
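(Stepping back to SQL for a second: the CTE and window-function examples described above look roughly like this; the table and column names are hypothetical.)

```sql
-- A CTE: the derived results are declared up front, then queried like tables.
WITH cte1 AS (SELECT a, b FROM table1),
     cte2 AS (SELECT c, d FROM table2)
SELECT a, c
FROM   cte1 JOIN cte2 ON cte1.b = cte2.d;

-- A window function: a running one-week sum per row, using a named window.
SELECT sale_date, amount,
       SUM(amount) OVER w AS week_to_date
FROM   sales
WINDOW w AS (ORDER BY sale_date
             RANGE BETWEEN INTERVAL 1 WEEK PRECEDING AND CURRENT ROW);
```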
The application doesn't need to know whether this is a read/write split, like you have here, or a multi-primary setup here with every node acting as a primary. All it knows is that it wants to get to Router. Router is very lightweight; you can actually run multiple copies of it on your application server. And Router takes care of what goes on down in the cluster: it knows which machines are busy, it knows where to read from and where to write to, and if it's running multiple primaries, it makes sure they're coordinating everything correctly. And by the way, this is all administered by our new shell.

Now, for those of you used to our traditional asynchronous replication: InnoDB ReplicaSet is based on some of the technology from InnoDB Cluster, especially on the admin side. If you've ever set up replication in the past, it can be a little tedious typing everything up. One of the things about ReplicaSet is that a lot of that is now done for you. The only thing it won't do for you is automatic switchover or failover; InnoDB Cluster does that. One of the neat things is that, like InnoDB Cluster, it uses the clone plugin. "Well, what's that, Dave?" Well, the clone plugin basically goes out to a donor server, takes a snapshot of all the InnoDB tablespaces, very quickly brings them over to the new machine, and gets them running. This is amazingly fast. If you've ever had to wait for a replica to come up because it's copying data back and forth, this will astound you.

By the way, I mentioned that you can use MySQL without the SQL. Here's an example. I've logged on to a system and I'm going to use a schema called FOSDEM. I now have a pointer called db that points to FOSDEM; if I type just db, it tells me FOSDEM. I'm going to create a JSON document collection, so I say: current database, create a collection, we'll call it a. If I type db.a, it tells me I have a collection called a.
If I want to add a document, db.a.add(), and I put in the key and the value; boom, it's there. You'll notice there's no structured query language in here. And if I go out and do a find(), it comes back and gives me my information. It also generates a primary key for me. Now, you can generate your own primary keys if you want, if you have a good numbering system. But this is just an example of using MySQL without the SQL.

I'm going to go into something now that's not open source, but I know I'll get questions about it anyway. This is part of the Oracle Cloud Infrastructure. When you sign up for the MySQL Database Service (and in a moment I'll show you how to get some free time on it), it's a managed data service, and it is a paid service. If you use MySQL Database Service, it's 100% built, managed, and supported by the OCI and MySQL engineering teams. When you fire it up, you have choices: standalone, a single instance of MySQL; high availability, which actually runs InnoDB Cluster behind the scenes for you; and HeatWave, which you've probably seen a lot of press about. That's our numeric analysis tool, which also has some machine learning behind it. And that's the end of the talk as far as the paid stuff goes.

If you are in a startup, I recommend getting hold of the oracle.com/startup page. Sign up; we have lots of credits for you, and we'll also help you with promotion and get you exposed to a lot of our over 400,000 customers. Now, if you want to test-drive the MySQL Database Service, you can do so for free. Here's how you get $300 in credits: go to oracle.com/cloud/free. If you stay on the smaller MDS machines, $300 will last you a while; if you're running lots of numbers through HeatWave, it may not last that long.

By the way, follow us on social media. MySQL.com is our main website. We are on Twitter as @MySQL, we're on Facebook as MySQL, and we're on LinkedIn at linkedin.com/company/mysql, where we have just about 21,000 folks out there.
By the way, if you are using the JSON data type, I'd like to plug my book. It's available on Amazon. If you are dealing with MySQL and the JSON data type, it's a guide to best practices, with lots of coding examples on how to use the JSON data type efficiently. The second edition is now on sale at Amazon.

And with that, I'd like to remind you that the slides are out there at slideshare.net slash Dave Stokes. Go out and look for the Open Source Summit and the title of the show, or look for my name, and you will find it. If you have any questions, hopefully I'll be able to answer them for you now, since I'm recording this well before the show. If not, I'm @Stoker on Twitter and david.stokes@oracle.com. I hope to hear from you soon.