Okay, for those of you that just came in, there are some papers going around. We're raffling off three T-shirts and a little performance database monster at the end. If you're interested, fill out one of the sheets, put your name on it, and drop it in the brown bag that's going around; I'll ask again at the end and we'll quickly raffle them off. Just to get started, this presentation is Need for Speed: Best Practices for MySQL Performance Tuning. Before we get started, a little bit about myself. I've been a DBA for over 25 years now. I started out with Oracle version 3, if you can imagine that, and I've worked with many other databases throughout my career. Most notably right now, MySQL; I actually worked with versions 3 and 4 in the early 2000s. As you may know, SolarWinds makes monitoring products, and last September we came out with a product that monitors MySQL. This talk isn't about SolarWinds, it's about performance tuning, but I just wanted to tell you where I'm from. I've worked for SolarWinds going on nine years now, and one of my tasks there is to work with our prospective customers and our customers, looking at problems in their databases and giving them ideas on how to tune them. There's always this common deer-in-the-headlights look when they find the SQL to tune. It's like, well, where's that magic button? What's going to fix it? I don't know what to do. That's kind of the reason for this talk. I'll take questions at the end; it's a long session with some case studies, so we'll take questions at the end if we can. On the agenda, we're going to talk about the challenges of tuning, who should tune, and what SQL to tune, and then we're going to talk about a technique, a methodology I've used for years, called response time analysis. I feel that's the best way to quickly find the SQL statements to focus on, and it also gives you great clues on how to go about tuning them. We're going to talk about gathering details about the SQL that you're going to tune. We're going to look at the explain plan and how to find the best execution plan. I use SQL diagramming, or query diagramming; anybody familiar with that? Okay, good, we'll have a few people in here. It's a great way to get clues toward the best execution plan, so we'll go through that. It's kind of a little scientific way to think about how to tune without having tools tell you. And then finally, we're going to go through some case studies, and at the end we'll talk about monitoring to make sure it stays tuned. Okay, so let's get started. Let's face it, I think SQL tuning is hard, and the reason I think it's hard is that you need expertise in many areas. You need to know how to write good SQL, for one thing, because a lot of times when you tune, you may have to rewrite it. I also think you need to know how to read the explain plan and understand how MySQL is going to access that data off disk, and find the best way for each SQL statement; it's not always the same way. And I think you need to know the business reason for that SQL: what's its purpose, who's running it, how often should it run?
I can't tell you how many times in my career I've worked with developers who wrote some SQL, put it into production, and then were just amazed that it's running millions of times a day when they thought it should run once or twice a day. So how many people here are developers? Just out of curiosity. Okay, DBAs? Do about anything? That's what my Twitter tag is. Okay. Yeah, so a lot of times the business side of it is really important to know. Years ago I worked for a manufacturing firm, and every month they complained that the database was slow. Okay, what's going on? So we started looking around, and here this accounting clerk was running these horrendous reports that were taking eight, ten hours to complete. We started looking to see who was using them, and no one was. She had inherited the job through downsizing and was instructed to run these reports and file them. Nobody even looked at them; they were obsolete. The quickest way to tune that situation was just to shut them off. It's not always that easy. But tuning takes time. There are a lot of SQL statements in a database. I've seen Peter's presentation; what did he say, a million queries a second can go through? There are lots of different SQL statements running in MySQL, and not only do you have to find which one to tune to get your biggest bang for your buck, but the solution for tuning each one won't be the same. There's no cookie-cutter approach that fits every SQL statement. Low priority in some companies is another challenge. How many of you have vendor applications? Not too many, good. Because a lot of times they tie your hands and you can't tune them. And how many have tried to throw hardware at a problem? Did it work for a little while? I bet the problem came back in the end. Inefficient SQL will eat up resources, and the bigger the data grows, the more it's going to eat up. So sometimes that's a challenge. Tuning is also never-ending, but that can be a good thing, especially if you're known as a good tuner in the company. People will seek you out, and that's a good feeling, because you know you're affecting something in the company, at least. Who should tune? That's another challenge. Developers, you'd think, should tune because they wrote the code and they know the application. Often, though, they're not focused on tuning; they're focused on a deadline. You developers out there, how many of you have been given a deadline before you've even gotten the functionality that's needed? Yeah, yeah. I always love that one. You want it when? Well, then, what do you want? So they don't have time to focus on tuning, and often they don't have an environment that looks like production, so a query running in a development environment might get the wrong plan compared to how it runs in production. And I worked at one network company with Java developers who used Hibernate to create all their SQL. Awful SQL, but they didn't know how to write SQL themselves, so that's kind of a gotcha there. DBAs, you'd think, should tune because they should know how to write good SQL. However, they're often focused on keeping the lights on, and in this day and age, doing more with fewer people. How many DBAs here are responsible for 50 instances? How about more than 100?
Okay, I've seen DBAs responsible for an awful number of database instances. So finding a SQL statement within a database and tuning it takes time they just don't have, because they're trying to keep everything up and running. So I think the best thing is a team approach, and I really do think tuning should be a project. I get mad at DBAs when they go shut their door and tune for an hour and don't tell anybody what they're doing. Really, it should be a team project: call it a project, get the end users involved, get the developers and DBAs, actually tune it, and show it to your upper management, because tuning your databases is the way to get the biggest bang for your buck with little time and little money. It's a great way to get resources back. As I said, finding the SQL to tune can be a problem. How many people have tuned something and nobody noticed? Okay, good, there's a few. I kind of wonder if you're working on the right things, then; if nobody notices when you're tuning, well, okay. I've used these methods in the past to find which SQL to tune. The top one there is easy: an end user complains, and you trace their session or look at what they're running and work through their SQL. Or if a batch job is running long, you look at the queries within it. But if you want to tune holistically, I've used these other methods too. I often sort my queries by I/O. I'll compare rows examined against rows sent or affected, and if a query examines a lot of rows for the results it returns, I know it's inefficient, and I'll tune it to reduce that. You'll see me use this in my case studies as a metric to compare against as I tune. You might also want to find your high-CPU-consuming queries, because if you can tune those, you give resources back for other processing. And then finally, what I'm going to talk about here is high response time. Our DPA product uses that, but basically we're going to look at the performance schema and the information schema and get good information on not only which SQL statements we should tune, but great clues on where to start tuning.
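To make that concrete, here's a minimal sketch of that kind of ranking against the statement digest summary table that exists in 5.6 and up; the 15-statement cutoff and the seconds conversion are just illustrative choices:

    -- Top statements by total response time, with the examined-versus-sent ratio
    -- (performance schema timers are recorded in picoseconds)
    SELECT digest_text,
           count_star                                     AS execs,
           sum_timer_wait / 1e12                          AS total_secs,
           sum_rows_examined,
           sum_rows_sent,
           sum_rows_examined / GREATEST(sum_rows_sent, 1) AS examined_per_sent
    FROM   performance_schema.events_statements_summary_by_digest
    ORDER  BY sum_timer_wait DESC
    LIMIT  15;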
What is response time analysis? Well, historically most tools look at health metrics: the statistics in the database, the state of the server. And conventional tools often cause finger-pointing; that's why one of our monsters is named Blamosaurus, because conventional tools just don't get you to the root cause as quickly as looking at what the end users are waiting on. This is my only funny slide, but I put it in because these monsters relate to real problems, and I actually have a few more I could add. You've got the performance hog, the one hogging all the resources so everyone else has to wait because of that performance hog. You've got the virtual vandal: if you've virtualized your databases, you don't know what resources your database is actually getting, because they're being taken from you or moved around on the fly. You've got the time sucka, the time lost trying to find the right SQL to work on. And then the query blocker, who blocks everybody else in the database while they just sit and wait. Blamosaurus we talked about, but look at that last one, the dev obstacle. That's the DBA who has taken on bad code before and is afraid to put anything new in production because they don't want any more messes in their production database. So anyway, you can probably think of your own monsters happening in your databases; just think about them that way. Response time versus health metrics: think of it this way. How do you figure out the fastest way to work? Do you look at your gas gauge or your oil? Maybe you look at your speedometer, but how does that help you if there's a wreck up ahead, or a slow school bus, or maybe that one stoplight you swear slows you down every time? You'd probably use a navigation tool like Google Maps to see the roadblock, go around it, and find the fastest route. And if you think about it, that's how MySQL and most of the database products I've worked with have instrumented themselves, with what we call waits, or thread states. A SQL request going through a database performs hundreds or maybe thousands of steps, and these databases record not only the time each step took but also the resource it waited on or used up. So if you collect that data at that detail, you can quickly see where all the time is being spent for your end users, as well as get clues on where to focus your tuning. If a query is spending all its time on I/O, would you want to add more CPU? I mean, you could; I bet it wouldn't help. So you need to know which resources matter and where to focus. And just a show of hands, anybody still on 5.5? Okay. Earlier versions? That's good. 5.5 is when the performance schema came out, and as you can see down at the bottom, it had 17 tables. Is everybody on 5.6, or most everybody? Okay, 5.7? Anybody on 5.7 yet? A few? Okay. In 5.6 and above they greatly improved the performance schema. It used to be kind of a performance hit to access it; that's not so much the case anymore, and it gives you more information. It has 52 tables in 5.6, with current and historical events at all levels: statements, stages, and waits. MySQL has instrumented itself so granularly, down to picoseconds, that to me it's just amazing; it's one of the better-instrumented databases, and you can get all that good detail about what your SQL statements are doing and what they're waiting on. And we're going to go into that. Beginning with 5.6, the performance schema consumers are turned on by default, so you no longer have to configure all that. And the default storage engine is now InnoDB instead of MyISAM, which is great because it's transactional: you can get blocking, locking, and all that good detail about exactly what your queries are doing if you're using InnoDB. In 5.7, if you're not there yet, there are even more improvements. It has 87 tables, and they've greatly reduced the overhead and the footprint, so you can actually pull from it and get good information back. In fact, in 5.7.10, which is what I've got loaded, they've instrumented 1,005 instruments. You can turn those on, and a number of them come on by default.
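If you want to see or adjust what's collected, it's a couple of updates to the setup tables; a quick sketch, with the stage instruments as the example slice:

    -- What instruments exist, and which are currently collecting
    SELECT name, enabled, timed
    FROM   performance_schema.setup_instruments
    WHERE  name LIKE 'stage/%';

    -- Turn on the stage instruments and their history consumers (5.6 and up)
    UPDATE performance_schema.setup_instruments
    SET    enabled = 'YES', timed = 'YES'
    WHERE  name LIKE 'stage/%';

    UPDATE performance_schema.setup_consumers
    SET    enabled = 'YES'
    WHERE  name LIKE 'events_stages%';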
Now they've also instrumented transactions, metadata locks, memory usage, and good stuff like that, so you can really pinpoint where your SQL statements are spending their time. When I say thread states: I can't remember exactly how many there are, but there are hundreds of thread states you can look at, and each one is documented, not only what that state is doing but what solutions you can apply to reduce the time spent on it. So here's an example: Sending data, and this is one you'll see a lot. If I zoom in here, you can see that Sending data is when MySQL tends to perform large amounts of disk access, usually reads. It says who should resolve it, and it gives you solutions. Look at number five there: compare your rows examined to your rows affected, and if it's more than ten times, you need to tune that SQL statement; and it gives you solutions on how. Now, the manuals have this out there; this just happens to be an example from our tool, but you can go to the manuals and get the same information. So that's what gets recorded, and what you see when you query those performance tables. Okay, so here's an example. A developer in our office decided he was going to look at each one of those steps, just to show you how granularly MySQL has instrumented itself. Remember I said it instruments the statements, the stages, and the waits? Well, he issued this one command, a SELECT visibility FROM the LinkDB link table, just a test, passing some parameters, with a FOR UPDATE on the end. And you can see all the stages and waits it went through. First it did an init; you can see the event time there, the stage and statement time, and then the actual thread state. You can look down through here, I'm not going to go through all these steps, but you can see how granular each step is and how much time it spent. And you can see here, there is my wait at 11 percent, wait/io/table/sql/handler, which means it's going to disk; you can see it's doing a fetch. Above that, 17 percent of the time was spent on statistics, and that was coming from the statistics stage there. Knowing that data and concentrating on those expensive steps is a great way to tune, because it quickly points you in the right direction on where to focus. How do you get this information? Well, this is the performance and information schemas. How many know what those are? Anybody not know what the information and performance schemas are? Okay. The information schema is metadata about all the objects sitting in your MySQL instance: table definitions, constraints, all good stuff like that. The performance schema is a real-time database, if you will, that records performance information as it's happening. It has current tables, real-time views where you can see what your threads are doing right this minute, and it also has history and long-history tables that give you a longer window of the past. So you can get good information from this.
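As a rough sketch of how you'd reproduce that step-by-step breakdown yourself, assuming the stage and wait history consumers are enabled, and with a made-up thread ID of 42:

    -- The stages a given thread went through, in order, with time per step
    SELECT event_name, timer_wait / 1e12 AS seconds
    FROM   performance_schema.events_stages_history_long
    WHERE  thread_id = 42        -- hypothetical thread id
    ORDER  BY event_id;

    -- The individual waits (e.g. wait/io/table/sql/handler) for the same thread
    SELECT event_name, object_schema, object_name, operation,
           timer_wait / 1e12 AS seconds
    FROM   performance_schema.events_waits_history_long
    WHERE  thread_id = 42
    ORDER  BY event_id;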
Now, I'm only showing you the current tables here. So in the process list, notice I can get the process ID, and I can join that to the performance schema's threads table. From there I can get the user, host, and database, the command it's running, and the time. You only want to look at your active sessions, so check the state; you don't want to pull idle sessions. And finally, you can see whether the thread has been instrumented or not. Now you take that threads table and join it into the performance schema's events waits current, which will give you not only the event name but, if it's working on an object, the schema and object name, the index if there is one, and the operation it's doing. Question? No, this is just the performance schema within MySQL, not Workbench; I'll get to MySQL Workbench a little later. Then you can join threads into events statements current, which gives you the digest and the SQL text, and notice: rows affected, rows sent, rows examined. If I wanted a quick hit, I could go to this table, sort by rows examined, bring back rows sent or affected, and do that comparison; I'd have my top SQL statements right there and know where to focus. Because what is the slowest component in a computer? Anybody? Disk access, I/O, yes. Physical disk is the slowest, so if you can reduce that, you're going to speed up your queries. Okay, so then from there, you can join the threads into the InnoDB transaction table, which is in the information schema. There you can get the SQL again, the tables it's using, and whether it's locked or there's any blocking going on. And that's where you can dip into InnoDB lock waits, in the information schema as well: you get the blocking transaction ID and the thread for it, so you can actually see the blocker and the waiter, which is nice if you have locking and blocking. How many people have locking and blocking problems in their databases? Okay, yeah, it's nice to see the blocker when you do. So basically, if you take these five or six tables, and this is not all the columns by any means, and like I said, there are now 87 tables in the performance schema, so if you're not familiar it's worth going out there and acquainting yourself with it, because there's a lot more information; if you wanted to do response time analysis, you could create a query over these tables, pull it at some interval, put the results in another table, and then rank your SQL statements. We do it with a timestamp: we pull this information every second, put it in a table, add the timestamp, and then you can sum it up and see not only how long it's spent in total time, but how long it's spent in any thread state. So I get the expensive thread states and start tuning them.
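Put together, the polling query looks something along these lines; this is a sketch of the idea, not the exact query from the slide, and the idle filtering is simplified:

    -- Snapshot of active foreground sessions with current statement and wait
    SELECT t.processlist_id, t.processlist_user, t.processlist_db,
           t.processlist_command, t.processlist_time, t.processlist_state,
           s.sql_text, s.rows_examined, s.rows_sent, s.rows_affected,
           w.event_name AS current_wait,
           w.object_schema, w.object_name, w.index_name, w.operation
    FROM   performance_schema.threads t
    LEFT JOIN performance_schema.events_statements_current s
           ON s.thread_id = t.thread_id
    LEFT JOIN performance_schema.events_waits_current w
           ON w.thread_id = t.thread_id
    WHERE  t.type = 'FOREGROUND'
      AND  t.processlist_command <> 'Sleep';

Run that every second, stamp each snapshot with the collection time, and you can sum time by statement or by thread state.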
Here's just a quick screen of how response time analysis can show you things graphically, very quickly; I actually tuned this one. What this is, is basically a chart of a 24-hour period, and the different colors are the different SQL statements. And you can see here I've got this kind of olive-green one that's spending an awful lot of time. What did it spend on this day, 10 to 11 a.m.? It actually spent 35 minutes out of the 63 minutes of all the activity of the top 15 SQL statements. Pretty hefty: 57 percent of the time. So if I tuned that, I could give up to 57 percent back to my top 15, right? You can see that in that hour it executed 1,050 times, and the average execution was 122 seconds, a little over two minutes each. Pretty sluggish. That's my top SQL statement, and I can see that real quickly just by putting this data in that one table and pulling it back. And I actually looked at it and tuned it, and what response time analysis can also do is show you really quickly how well you've tuned. So here it is after tuning: it's now taking 12 minutes, with 5,000 executions as opposed to a thousand. So not only did I reduce it to nine seconds per execution instead of two minutes, I increased the throughput five times. Great things to brag about back to management: I'm doing my job, I'm doing my job, that type of thing. But just quickly, another one on locking and blocking. If you have locking and blocking issues and you go to those InnoDB transaction tables, you can get the blocker, how long they've blocked, and who they blocked, and not only that, what they were running while they were blocking, as well as the waiters. And this is just an example of that, where we can see that this one caused 235 seconds of wait. You can look at his details and find that he was actually updating STOCK, and these guys were all trying to SELECT FROM STOCK ... FOR UPDATE, so they had to wait. So it gives you a clue of where to focus if you've got blocking and locking issues.
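That blocker-and-waiter view comes from a join along these lines; this is essentially the documented pattern against the 5.6/5.7 information schema tables (in 8.0 these moved into the performance schema):

    -- Who is waiting, who is blocking, and for how long
    SELECT r.trx_mysql_thread_id                            AS waiting_thread,
           r.trx_query                                      AS waiting_query,
           TIMESTAMPDIFF(SECOND, r.trx_wait_started, NOW()) AS wait_secs,
           b.trx_mysql_thread_id                            AS blocking_thread,
           b.trx_query                                      AS blocking_query
    FROM   information_schema.innodb_lock_waits w
    JOIN   information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
    JOIN   information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;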
Here's a scenario for you. Which scenario is worse? SQL statement number one executed a thousand times, caused 10 minutes of wait for an end user, and waited 90 percent of its time on Sending data. SQL statement number two executed once, caused 10 minutes of wait for an end user, and waited 90 percent on a system lock. Which one's better, which one's worse? Anybody care to guess? I'm sorry? Whichever you think; which one would be worse for the end user? Great, you answered it correctly: the end user doesn't care. He's waited 10 minutes either way. But developers might see number two as being more difficult, because it could be a concurrency problem, as opposed to Sending data, which might be resolved by adding an index or restructuring the query a little bit. Okay, so we've got that SQL statement; we found the one we should be working on with response time analysis. What do we do with it? Well, the first thing I do is gather baseline metrics. How long does it take now? And then find out what's acceptable, especially if you're going to tune a lot, because you want to talk to your end users. If they have something running maybe 10 minutes and they want it down to a minute or two, that's probably okay. But if you've got an eight- or ten-hour process running and they want it sub-second, it may be more than just a tuning issue; it may be a design issue. So set your expectations and only tune to that. Don't tune to the nth degree, because I'll tell you, you'll burn yourself out, you'll tune yourself into a corner, and you'll never want to tune again. I always gather rows examined and rows sent or affected as well, because I want a metric to compare against as I tune. You know, if I'm tuning a query that has a million rows examined, and when I'm done tuning it, it's 10 million rows examined, I don't think I tuned it, right? I want that reduced. So you want some metric; I like that one, and there are others you can choose. Also gather the thread states: locking and blocking, system lock, I/O, sending data, calculating statistics, and the network states if the network is slow on writes. Some SQL statements may have more than one issue. What I usually do is start with the thread state with the biggest wait time and tune for that first, because a lot of times the thread states will change as you continue to tune. And just remember that the different thread states have different resolutions, so look at each of them. After you've documented your metrics, you want to get the execution plan, and MySQL gives you several ways to get that. You can do a simple EXPLAIN, or you can do EXPLAIN EXTENDED; that's the one I like, because it's going to tell me how MySQL will transform the query after I've entered it, so it's kind of a neat way to see where maybe you can improve how you code. Optimizer trace is available in 5.6.3 and above, and I've got examples of all of these. And then there's MySQL Workbench. Let me back up just a little bit and go through how you get an explain plan to work: you just type EXPLAIN in a session and then give it your SQL statement. For this one I downloaded Sakila, the little DVD rental store database; I think Percona has it, and you can play with it. So I had a load going, and the query selects title, first name, and last name from film, joining that with film_actor on film ID, and then joining actor on actor ID, looking for last name like percent PECK percent, or first name like percent ANGELINA percent. Just a nonsensical query, but as you can see here, I used aliases. The EXPLAIN output in MySQL is pretty easy to read: you just read it top down. So the first thing it does is go into actor, which I've aliased as A. And here's a tip: if you don't have to alias in MySQL, I wouldn't, because when you alias, it doesn't repeat the table name in the EXPLAIN, and if you have lots of tables it's kind of hard to read. Sometimes you can't avoid it, because you're accessing the same table twice, but that was just the reason I had that circle there. You can see the first thing it does is a simple lookup into actor: it lists the primary key as a possible key, and it decides to use the idx_actor_full_name index. Then you can see the key length, the number of rows it expects, and that it's filtering on the WHERE clause. Then it dips into film_actor, basically decides to use the primary key, and returns 13 rows there, using the index. And then it dips into film, again using the primary key, and returns one row there. So you can kind of see how that works.
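Reconstructed from that walkthrough, the demo query looks roughly like this against Sakila (the exact search strings are from memory, and whether you keep the aliases is up to you):

    -- Roughly the demo query described above, against the Sakila sample database
    EXPLAIN
    SELECT f.title, a.first_name, a.last_name
    FROM   film f
    JOIN   film_actor fa ON fa.film_id = f.film_id
    JOIN   actor a       ON a.actor_id = fa.actor_id
    WHERE  a.last_name  LIKE '%PECK%'
       OR  a.first_name LIKE '%ANGELINA%';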
What's neat about this next cheat sheet, and this is kind of dated, but I liked this chart and actually printed it out, is that it's a great reference for all the piece parts of the explain plan. This column over here identifies all the columns and what they are, and then over here it gives you the definitions of the type column, which is the data access method, ordered from most selective to least selective. You can see the top one is system, then const, and the bottom one is a full table scan. Just kind of a neat cheat sheet to look at. It also gives you a great way to see what can show up in the Extra column and what it all means. I don't know if we post the slides; I'll figure that out, and if there's a place to post them, I will, so you can have it. Here's an example of extended explain. Basically, all you have to do is come in here and run EXPLAIN EXTENDED. Now you can see I'm not using the aliases, which is why it's a little nicer to read. Same plan, and I'm not going to go through it again, but when you use extended explain, at the end you run SHOW WARNINGS, and that's when it comes back with what MySQL transformed the query into. If you have implicit data conversions, or if you've done a SELECT *, it's going to translate that into a readable form for itself. And notice it actually put the database name in there, even though I didn't have it in my query. So it's kind of a neat way to learn how to write better SQL: why make it transform the query when you might write it that way in the first place? Just some thoughts. Optimizer tracing, anybody use this? I thought this one was pretty interesting; it gives you all kinds of good information. How you use it: you set optimizer trace enabled equals on, run your SQL statement, and then set it off, and it stores the trace information in the information schema table called OPTIMIZER_TRACE. So you can select from that, and you'll see it spits out a lot of information. In fact, it will tell you every join order it considered, all the filtering predicates, and all the costs it weighed. It's quite wordy. And you can see this thing I've circled down here, missing bytes beyond max mem size: by default it only puts 16K into this table, so if you've got a query with lots of joins, it's going to fill it up right away and not report it all. Just realize you can change that setting, and there are a couple of them here. Before I go on, though, notice this is how you set it off: just optimizer trace enabled equals off there. And up here you can see set optimizer trace max mem size; you can raise that if you want to see the whole trace. Just realize it's in memory and you're taking up memory, so in production you might want to be careful, especially if it's a huge query you're tracing. You can also set optimizer trace features greedy search equals off, which makes it less verbose: it won't go through all the join permutations it has to consider, so you won't see those, but it'll still give you a decent plan. I couldn't fit the whole plan on one slide, but you can kind of see the row estimates, the costs of all the tables, and how it decided to use the plan it did. If you get an execution plan that you think MySQL is stuck on, I would trace it to see why, because this is very useful information.
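Here's the whole optimizer-trace sequence in one place; the memory size shown is just an example value, and the traced statement is a stand-in for your own query:

    SET optimizer_trace = 'enabled=on';
    SET optimizer_trace_max_mem_size = 1048576;  -- default is only 16KB; raise it to keep the whole trace

    -- run the statement you want traced (stand-in query)
    SELECT f.title FROM film f JOIN inventory i ON i.film_id = f.film_id LIMIT 10;

    SELECT trace, missing_bytes_beyond_max_mem_size
    FROM   information_schema.optimizer_trace;

    SET optimizer_trace = 'enabled=off';

    -- optional: skip the exhaustive join-order search output
    SET optimizer_trace_features = 'greedy_search=off';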
MySQL Workbench is another one, and it's great. If you haven't used Workbench, I'd recommend you download it; it's a neat tool. It's where you can do ERDs really quickly, and I'm a great believer in doing an ERD of the tables in your queries, because you're going to see the relationships. And I can't tell you how many systems I've worked on where the developer didn't know the full relationships. He got the right answer, so he quit there. But you need to know the full relationships between the tables and put them in the database; that gives the optimizer so much more information to build better plans. This is that same query in Workbench. You can view the plan as tabular or visual, and what I like about Workbench is that it shows you the expensive steps. So it marked that step red as expensive; it actually goes through and says the cost can be high, especially for large tables, and then it tells you why. Okay, we've looked at the execution plans and we know the expensive steps. So what do we do with that? Well, let's go get the table definitions. And first of all, is it really a table, or is it a view? You might have to first look at the view and tune the view. I like to use a little utility called mysqlshow with the status option. You can give it the database name, or you can get specific and give it a table, or you can even do database, table, column if you wanted to. But I just did the database here, and you can see it'll print out the tables, the rows it thinks each has, the average row length, the data length, the index length, and the data free there. So it gives you a good quick way to see how many rows you're working with in a table. I also think it's important to examine the columns in the WHERE clause. Know the cardinality of those columns. Are there indexes on them? And if there are indexes, are they multi-column? And if they are multi-column, what is the left-leading column? Because MySQL can only use the left-leading part of an index: if the left-leading column of a multi-column index isn't in your WHERE clause, it won't use that index. Also know if there's data skew, because that can make it do different plans at different times depending on the values you're passing. And know the cardinality of the indexes as well: how selective are they, and how big? Because MySQL will actually look at the size of an index and sometimes make silly decisions. And I think I've got a case study that shows this, where I didn't even reference the column the index was on, yet it used the index, and the only thing I can think of is it was the size of the index.
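If you'd rather stay inside a session than shell out to mysqlshow, the same sort of information is available in SQL; Sakila is just the example schema here:

    -- Per-table row estimates, data size, index size, free space
    SHOW TABLE STATUS FROM sakila;

    -- Per-index column order and cardinality for one table
    SHOW INDEX FROM sakila.rental;

    -- Rough selectivity of a candidate WHERE-clause column
    SELECT COUNT(DISTINCT last_name) / COUNT(*) AS selectivity
    FROM   sakila.customer;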
Okay, so we've got all that good information, and we haven't tuned anything yet. So I like to use case studies to go through and show how to tune; I think it kind of brings it home, and plus we'll talk about SQL diagramming. The first case study came from a real customer who called in and asked us to help. It was a university, and they were trying to answer the question of who registered yesterday, for their billing system, because they billed their students as they signed up. I changed it a little bit, and all the timings are from my little laptop. It actually came from an Oracle database, but I transferred all the data over just to see if I could reproduce it in MySQL, and it actually behaved just the same way. So anyway, we're trying to answer who registered yesterday for SQL Tuning. We're selecting the student's first name, last name, and registration sign-up date from student; we join that to registration on student ID, and then we join it to class on class ID. And we're looking where class name equals 'SQL Tuning', the registration sign-up date is yesterday, if you will, and cancelled equals 'N'. The registration table was historical in nature: they never deleted from it, and if a student dropped or cancelled a class, they'd just mark it with a 'Y'. So it was a very large table. When I ran this on my laptop, you can see 9,320 executions an hour, with an average execution time of nine seconds. Look at my rows examined, and look at the rows I got back. And look at what it was spending all its time on: Sending data, 99 percent of its time. What I actually did is use mysqlslap; does anybody use that? It's kind of a utility out on Linux where you can say, I want this many connections, and I want you to iterate over this query this many times, and you give it a query or a script. And then I put it in a loop and changed the date, so the date was kind of looping through the last year of my data set; it was never running the same day over and over again. And that's why it had zero rows: nobody registered on the day it looked at. Let's look at our response time data. Well, here we can see it's spending all of its time on Sending data. There's a little 'waiting for table metadata lock' that I'll get to. But you can see it was spending almost 10 seconds per execution; it dipped down a little, and you can see the throughput here, the executions, going up and down. And then finally you can see the rows sent; like I said, the date was changing each run, and there were stretches of zero rows where nothing had registered. It wasn't until around where it spent the most time, almost 13 seconds on average, and we're looking at hourly data here, that throughput went down but it actually started getting data back, an actual result set. So you can kind of see how it was running: pretty sluggish. We go get the execution plan, and MySQL Workbench says you've got a very expensive step here: a full table scan on student, almost 9,000 records. Then we do nested loops into registration using the primary key and get four rows, and then nested loops into class and get one row. And of course it flags it: very high cost for large tables, no usable indexes found on this table. So it's telling me all that good stuff. But how do I know how to tune it from looking at this, other than the little note that it might be very expensive for large tables? Well, that's where SQL diagramming comes in. And if you don't know SQL diagramming: I don't know this man personally, but I've probably sold him a lot of books, because I always bring him up in my talks. I really believe in SQL diagramming your queries, and it's a great book to get; he not only goes over this, he's just a great tuning reference. Basically, in SQL diagramming you take the tables in your query and draw an upside-down tree. You take your detail table, which in our case is registration, as the top, and then you draw links or arrows out to your lookup tables, or your master tables, if you will. So if you remember our query, our detail table was registration, with a lookup into student and a lookup into class, right? Okay.
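Since the diagram hangs off this query's joins and filters, here it is written out, approximately; the table and column names are my reconstruction from the description, not the customer's actual schema:

    -- Who registered yesterday for SQL Tuning (reconstructed; names are guesses)
    SELECT s.first_name, s.last_name, r.signup_date
    FROM   student s
    JOIN   registration r ON r.student_id = s.student_id
    JOIN   class c        ON c.class_id   = r.class_id
    WHERE  c.name = 'SQL TUNING'
      AND  r.signup_date >= CURDATE() - INTERVAL 1 DAY
      AND  r.signup_date <  CURDATE()
      AND  r.cancelled = 'N';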
So then the little numbers here on the arrows are basically your join conditions. For this relationship, we're saying that every one student probably has about five courses in registration for the semester; that's probably a decent load at a university. And then for every one class, there are probably about 30 students. Probably low for a university, but stay with me. Okay, so the join criteria tell you, basically, that for every one student I should never have to read more than 150 records out of registration; if I'm reading a million, I've got a problem. Okay, so that's what you do with the joins. The little underlined numbers are basically your filtering criteria. And if you remember, we had two filters. On registration, we had where the registration sign-up date was yesterday, if you will, and where cancelled equals 'N'. And out of that, we had 4,228 records out of 79,981 records in the whole table. So if you look at that, about five percent of registration would have to be read with that filter, right? Our other filter was on class, where class name equals 'SQL Tuning'. Well, we had two records come back, probably different levels of tuning, out of 1,000. So 0.2 percent of class would have been read with that filter. Which is the better filter? The more selective filter: class, right? Okay, so we want to drive this query by going to class first. Because, and I don't know if I've said it already, when you're tuning queries you want to read the least amount of data first and then build upon it, as opposed to reading all the data everywhere and then throwing it away at the end. I mean, isn't that a waste? So we want to find the driving table, and I think class is our driving table. So let's go look at our ERD, like I like to do, and find our relationships. And you know what? No relationships. Well, right there is a big red flag. And remember I said I got this from Oracle? Well, I transported the data over from Oracle and forgot to bring the keys. Those would have helped, because MySQL uses foreign keys very well. So, how many of you use constraints, or do not use constraints would be the better question? Okay, well, just remember: if you don't use constraints, you're actually kind of tying the optimizer's hands, because constraints give it more information for creating explain plans. So I forgot to add the constraints. And when I showed the indexes, you can see I have primary keys here: my primary key on registration is on student ID, class ID, and sign-up date, okay? I have a primary key on student ID in student, and a primary key on class. So let's go add those foreign keys. And here you can see I'm doing ALTER TABLE registration, ADD FOREIGN KEY on student ID; I didn't give it a name, so it'll create its own name, and it references student's student ID. And I do the same thing for class on class ID, referencing class ID. And then I make sure they're there this time by selecting from information schema key column usage; you can do this to see your constraints. I'm looking for table schema equals CSU and table name equals registration, and you can see here I get not only the table name and the column name, but the constraint name it automatically generated and what the referenced columns are. Now let's go add the index. I'm going to add it on class name, because remember, I want to drive the query by class.
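In SQL, that fix looks something like this; the constraint names are whatever MySQL generates, CSU is the schema from the demo, and the class-name column is my guess:

    -- Add the missing foreign keys (MySQL generates the constraint names)
    ALTER TABLE registration ADD FOREIGN KEY (student_id) REFERENCES student (student_id);
    ALTER TABLE registration ADD FOREIGN KEY (class_id)   REFERENCES class (class_id);

    -- Verify they exist
    SELECT table_name, column_name, constraint_name,
           referenced_table_name, referenced_column_name
    FROM   information_schema.key_column_usage
    WHERE  table_schema = 'CSU'
      AND  table_name   = 'registration'
      AND  referenced_table_name IS NOT NULL;

    -- And the index to drive from class (column name is a guess)
    CREATE INDEX idx_class_nm ON class (name);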
So I create the index, idx_class_nm on class name, rerun the EXPLAIN, and you can see now I'm driving from class. Very efficient: class name, two rows, nested into 38 in registration, nested into student with one row. Much less than the 8,000 or so it was initially reading the first time I did it. Did I improve it? Well, here we go. As you can see, it was running really sluggish; here we can see 13 and even upwards of 22 seconds up there at the top, at the end. You can see that little metadata lock; that's when I added the foreign keys, that little purple slice, I don't know if you can see it very well. But notice, when I added that index on class name, during the hour after that it just went through the roof. 86,000 executions per hour; I think I was doing 9,000 before. 432 milliseconds per execution. And remember, it was reading over three quarters of a billion rows before, with no rows returned; now it's 2.7 million examined and 1.1 million returned. So pretty good: you usually don't want that ratio more than 10, and ideally way less than 10. It's still spending most of its time on Sending data, but look, very little, around a millisecond. So a great way to see that we've tuned it. I'm running out of time, so let's keep moving. The second case study is a little more meaty. Like I said, I downloaded the Sakila database, the DVD rental store, and then I created these queries; they're kind of my own contrived ones. What I have is a master-slave environment, and what I set up is, on the master, all these customers renting and paying overdue payments, all this inserting and updating type of activity, while the slave was doing all the reporting. So this one happens to be on the slave, and we're looking for overdue DVD rentals by customer for the last month, because the sales reps needed to call them and say, come on, return the DVDs, will you please? So this SELECT statement is looking at the customer's last name, first name, and phone number, and getting the film title and rental date. It goes from rental, dips into customer on customer ID, then joins to address on address ID, then joins to inventory on inventory ID, and finally joins to film. And it's looking for rental return date is null, because that means the DVD is still out, and it takes the rental date plus an interval of the film's rental duration, over the last 31 days, to see what's still out there. And then it filters on the customer's last name, passing that in, and orders by rental date descending. When we ran this, it was 950 executions per hour, taking on average two minutes each. Again, look at my rows examined: 424 million, so about half a billion rows examined, with 59 million returned. Still pretty sloppy with all the rows being read. And of course, Sending data. And you'll see, if you have I/O-intensive queries, Sending data is the thread state where most of the time is going to go. Now, I have a really poor VM environment, the SAN is horrible, so all my queries show that; it may not be that way for yours. This is the response time data, and if you remember this chart, it probably looks familiar: this is the one I tuned, and that was the query I was running. But notice, here are the actual executions at two minutes each; you can see it taking a lot of time. It was the top one in my database. We go get the explain plan again.
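Written out against Sakila, the report query is approximately this; the interval arithmetic and the name parameter are reconstructed from the description:

    -- Overdue rentals by customer (approximate reconstruction, Sakila schema)
    SELECT c.last_name, c.first_name, a.phone, f.title, r.rental_date
    FROM   rental r
    JOIN   customer c  ON c.customer_id  = r.customer_id
    JOIN   address a   ON a.address_id   = c.address_id
    JOIN   inventory i ON i.inventory_id = r.inventory_id
    JOIN   film f      ON f.film_id      = i.film_id
    WHERE  r.return_date IS NULL
      AND  r.rental_date + INTERVAL f.rental_duration DAY < NOW()
      AND  r.rental_date > NOW() - INTERVAL 31 DAY
      AND  c.last_name LIKE '%SMITH%'   -- the leading % is the sloppy part, as we'll see
    ORDER  BY r.rental_date DESC;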
And here you can see all the good stuff, and here's the explain plan. It first dips into film; it doesn't use any keys and reads all the records, a full table scan, so 3,679 records. Then it dips into inventory, uses that idx_fk_film_id foreign key index, and gets one record. Then it dips into rental and gets 49 records, using the WHERE clause. And then it dips into customer using the primary key, as well as address using the primary key, returning one record each. You can see the transformation there too; I always think that's easy and good to look at. Let's diagram it to see if that's really the way we should drive this query. So here we've got five tables. Our detail table is rental, with a lookup into inventory and customer; inventory then has a lookup into film, and customer has a lookup into address. And you can see we're saying for every one film there are three copies in inventory, and for every inventory item we have about 94 rentals. We're also saying for every customer there's one address, and for every customer there are about 91 records in rental. So again, you can do the math and figure out how many records you're going to join. In all actuality, if you make sure your join columns are indexed, left-leading, and make sure the query can use those indexes, MySQL is going to do the right thing; you don't need to figure out the join numbers. In fact, when I do SQL diagramming, I don't even bother with the join numbers; I just go for the filters. Because what really matters is finding the driving table, and the filters are going to do that. So we can see we had two filters, one on rental, one on customer. Customer was more selective: we can see here it was 0.2 percent. And because I was using a LIKE, how I figured that out was to take the average count per last name, just to see what an average name would return; there were 32 similar names in there. So that was 0.2 percent, versus the 7 percent of rental looking for null return dates. So I want to drive with customer. So let's go get the ERD. And lo and behold, there is an index on customer last name. Well, why isn't it using it? Correct: sloppy coding. With that leading percent sign it can't navigate the B-tree, because it doesn't know where to start, so it can't use the index; it has to do a full table scan. There are other ways to fix this if you really have to use the leading percent: you might be able to concatenate fields, or use a multi-column index, maybe leading on rental date. Well, you can't use return date, because the query is looking for IS NULL, and the index won't keep those entries usable unless you concatenate it with another column. But I just know this was me being a sloppy coder: in my little load routine I put a percent sign on each side of the parameter and passed different names to it. So I fixed it just by taking the leading percent out. And when you do that, you can see it now uses the index, so instead of those 4,000 records or so, it only looked at the 32 it needed. So did we tune it? This one was kind of funny, because yes, we did tune it, and I kind of forgot that I had it iterating a thousand times. Well, I sped it up so much it finished. So you can see I actually missed a couple of hours of data, because it wasn't running anymore. I was like, oh, I did pretty good, because it went through the whole load.
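The before-and-after on that predicate is simply this; idx_last_name is the index on customer.last_name that ships with Sakila:

    -- Leading wildcard: can't seek the B-tree, so a full scan of customer
    EXPLAIN SELECT customer_id FROM customer WHERE last_name LIKE '%SMITH%';

    -- No leading wildcard: range scan on idx_last_name, reading only the matching names
    EXPLAIN SELECT customer_id FROM customer WHERE last_name LIKE 'SMITH%';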
Okay, last one, and I know we're running out of time; this one is just more meaty. I did 11 tables here, and it's mainly to show you how to diagram: a weekly sales report by category by region. And again, you can see all 11 tables here. Very sluggish: five minutes each execution, 110 executions when I ran it without doing anything to it. And this is where it was kind of funny: I didn't even reference this column, but it dips into film using this index, idx_fk_language_id. I didn't reference that column, so I don't know why it decided to use that index, other than the size of the index. Then it actually dipped into film_category and all the rest, and as you can see here, again, it reads all the tables and then throws rows away at the end. So look at the number of rows examined, too: 233 million, while the rows affected was 2,000. So again, very inefficient. This is where I was asking, why isn't it using the primary key? It seems like that would be a better way to do it, because the primary key is more selective, higher cardinality. This language index had two values in it, so maybe it thought the data was equally distributed, when in all actuality there was data skew. Probably not a good index to pick. Here's the diagram, a more complex one, just to show you how you can do it. Again, notice I don't put any join numbers in; I just go for the filters. And I'm saying rental, or rather staff, has the best filter. So we're going to go put, oh shoot, I didn't mean to do that, we're going to go put an index on staff. And when we do that, you can see the plan rearranged a little bit. It still had to go through rental and look at some information and use those, but it really did improve it. The throughput: we went down to 940 milliseconds, most of it Sending data, and look at our rows examined: instead of 233 million, we're doing 15 million now, and we're returning many more rows because we increased throughput. So finally, monitoring is the last step. Prove that your tuning made a difference; brag about yourself, because no one else will. And just keep monitoring for the next tuning opportunity, so you keep working on the right SQL to tune. And a shameless pitch for our product: download it, we've got a two-week free trial. There's a summary, which I'm not going to go through because I know we've got this raffle. But if you want to stop by our booth, number 531, I've got this little infographic, kind of a neat little poster of 12 steps of MySQL tuning, just a neat reminder; you can pick it up for free. Or, if there's a place where you can download my slides and you're a cat lover, there's a 12-step program for cats too. Thank you, guys.