 All right, I just hit 11.30 so I can start now. I feel good. All right, so this talk is are you getting the best out of your MySQL indexes? If you want the slides, they're online right now and you can actually go to it. My name is Cherie Cabral. I am a senior database admin slash architect at Mozilla. So you'll see this logo on every slide because working at Mozilla is kind of awesome. Was it you, Demetri, who said that? No, I think it was Ravi who said that working at Mozilla, because people will be like, oh, is it as fun as you think it might be? Because sometimes you're like, oh, it must be really fun and they're like, no, it stinks. And I was like, okay, you know how fun you think it might be? Now multiply that by 10. And that's how awesome it is. And then somebody said it's like being a bartender. Not only do you love your job, but like everyone loves you because like you're having other drinks. So unless of course you hate us, which is the other. I can't believe it. You know, I love Mozzarella Fox Fire. That's great. How come, you work for browsers. How come Chrome doesn't do what I want? And I'm like, oh, I'm a database administrator, so I don't even know why Mozilla doesn't do what you want. So anyway, you can tweet at me. I've already turned my phone on to silent, so you can tweet at me and try to interrupt me, but it won't work. And that's my URL, which apparently now the database is down on. So after this talk, maybe I'll take a look at it. So let's talk about what is an index, right? An index is an implementation detail, okay? If you go to the relational calculus or the relational algebra, which is a lot of fun because I'm a math geek, you might not be a math geek, it might not be fun, but it's a lot of set theory and it's all sorts of fun. Venn diagrams, joining things, great. What's not in that theory is indexes, right? So if you go and talk to CJ Date, you're like, what's the best indexing thing? And he'll be like, that's implementation detail, who cares? And of course, that's the kind of thing that's actually going to make your database, it's going to make or break your database, how fast you slow your databases, how fast or slow you can access the data. And so while it may not totally matter to theoretical people, it actually matters to people actually programming stuff. So what's an index? It's a separate data structure in which stuff is put into, data is put into, so that you can easily access it. Now in MySQL, it is a separate data structure and we have to say you want that separate data structure. It's not just going to make it for everything because there's overhead. Whenever you do a write, you do an insert, an update, or a delete, it's going to change that index because you've changed data. So if you have an index on first name and you add another name, right, it's going to put something else in that index on first name. Who here is old enough to have used a paper dictionary? Sweet, okay, so I don't have to explain what that is. Now remember back when you used to use the paper dictionary and you used to have like a word that begins with S, like surfing, right? You would like open it to the middle and you'd be like, okay, that's M and so I have to go further and you do whatever, or you'd kind of open it to two thirds because that's where S was. But you would use the guides at the top. Remember the words at the top to tell you what was the first word over here and the last word over here. Those are indexes and you can actually think of MySQL's indexes like those. It's not exactly how they are but it's a pretty good start to get a sense of how they are. So I have a little rant about terminology. What's the difference between a key and an index? In MySQL, there's somewhat used interchangeably. There are some places where you could say either key or index and there are some places where you have to use key and you have to use index. What's the difference? Well, first of all, an index comes from, you know, you have an index and you're looking it up, that might seem obvious as to where index comes from. So where does key come from? Key comes from the fact that something is a key constraint, a very important constraint and that's actually from the relational, the relational calculus. You actually have those in that theory. You don't have indexes in that theory. Those are implemented in detail. You do have keys. So a key is a key constraint. You have a primary key constraint, a unique key constraint, or a foreign key constraint. And those are the only three. And so what those do is a primary key says this is how you identify the data. If you're a U.S. citizen, it would be your social security number. That's how the government identifies you. If you have a bank account, it would be your account number. If you have a credit card, it would be your credit card number. Usually, it's numbers because if you start to do names and stuff, my husband's name is Antonio Cabral, which is pretty much Portuguese for John Smith. So if you tried to look him up, you know, if he got stopped for a speeding ticket, right, they would like, they looked up Antonio Cabral, they'd find a million people around where we live because there's a big Portuguese population, including like his dad, who has the same name and the same middle name, right. So it's not a really good, unique identifier, but his driver's license number is unique. A unique key is like a primary key, but it's not the first identifying one. It's also another way to identify, identify something. So you might have something that's a primary key that just has a number, but then there might be some set of like four or five values that make it unique as well. You know, some kind of, you know, sales order and timestamp or something, for example. Foreign keys. We'll talk about a little bit later in detail, but basically, they're kind of a parent-child relationship. So you could say, in one table, let's say you have a customer, you don't want to keep duplicating the information, you would say, well, this customer ID refers over to the customer's table, and it has to be in the customer table, otherwise you can't make a sale to that customer. And everything else other than keys is an implementation detail. So I have a little rant sometimes where people, you know, interchange key and index. Don't worry about it. It doesn't matter, right? Use key, use index, people are going to know what you mean. But if you want to know the technical terminology, right, a key means that it's a constraint on the data. An index doesn't constrain your data. You can put anything you want in an index, but if you have a primary key or a unique key, you can only have that piece of data once, right? So in your table, if you have a primary key on social security number, you can only put one row with that social security number. Same with unique. With foreign, it has to refer to something in the other table. If it's not in that other table, you cannot put it in your child table. So it actually constrains your data. It limits it. Indexes don't limit your data. So what's the plural of indexes? Anyone? I heard indices. Anybody want to go with anything other than indices? Okay. No, it's indexes, actually. It's actually literally indexes. And it's funny because I used to work, like, way back in college, I worked in the library. And in a library system, it's officially indexes as well. So forget your Latin. It's indexes. Deal with it. Okay? Is it Linux? Is it Linux? Whatever. Just is it my SQL or is it my SQL? You know what? It doesn't really matter. Some people will say, well, technically it's my SQL because my SQL, that's really the Microsoft SQL server, wanted people to call it SQL and blah, blah, blah, blah. And you know what I say to that is that if you look, if you really want to be technical, it's me SQL because it's named, you know, Monti Udenius named after his daughter. And his daughter name isn't my, it's me. So if you really want to get technical, stuff it up your whatever. So really, who cares? Don't be that guy. So this is actually a cute little Sheldon comic, you know, where some geek comes over and be like, I think you mean database. I'm like, nope, I have three datum bases, right? So don't always assume that you know what you're talking about when it comes to plurals or whatever, like, really, it's not worth the fight. So let's talk a little bit about more lingo. What is a simple index or a simple key? A simple index is the one that has one field in it. That's it. Right? So an index on last name would be a simple index. Compare and contrast with composite. A composite index has more than one field, right? So if you have an index on last name, comma, first name, that's a composite index. Composite indexes are totally loud. They're allowed in primary keys and unique keys and I think they're also allowed in foreign keys. I haven't done much with foreign keys in composite because it gets a little complicated. So let's talk a little bit about data structures. In MySQL, the data files are B-trees in MySQL, right? Unless you're using a memory table and then it's, you know, hash. So by default indexes also in MySQL are B-trees for EnoDB and for MyISO, which are the two main used ones. I'll do a little bit about B-trees. It can also be hash for a memory table and you can also specify hash specifically. You have to specifically say when you create the index that you want to be a hash for EnoDB. So this is what a B-tree looks like. Thanks Wikipedia for the image. And what you have here is you basically have this tree and you can see that going two nodes deep you can find any number that's in this tree, right? We don't have all the numbers because these might represent rows in our table or rows in our index, right? So it's 1, 2, 5, 6, 7, 9, 12, 16, 18, 21. And how it works is, you know, if you want the number 12 you would go to here. You'd say, okay, you'd start at 7. Say is it greater than 7 or less than 7? Less than 7 you go here. If it's greater than 7 you go here. If it's greater than 16 you would go here, right? So anything less than 7 is here. Greater than 16 is here. Between 7 and 16 is here. And you can have more than two rows or whatever, but this gives you an O of N implementation, right? With maximum two lookups, right, where you look this up and then you go here, you can find any node in this tree. So what are bee trees really, really good for? Bee trees are excellent for a range search, okay? And somebody asked last time about hierarchical data and there's this model, the model of nested hierarchical data actually kind of uses this kind of concept. So if you want to search between 5 and 10, you find 5 and you just walk the tree, right? So you start at 5, so you have to find 5. So you go to 7, it's less than 7. Well, I found 5 or I found where 5 should be. Let's say it was 4. You would say, oh, it has to be here. And then you just walk the tree, right? So 5 and then you go next, the next one, right? Because it's all pointers actually. So this points to 6 and this points to 7 and this points to 9 and this points to 12, 12 is too high. That's it. So it's really, really easy. It doesn't have to look up 5, 6, 7, 8, 9 and 10. That would be five different lookups. It only has to look up one thing and then stop when it gets bigger. So ranges are great for bee trees. And by the way, because of the same idea, MySQL was built on magnetic discs, or as I like to call them, spinning discs, right? Not flash, right? So flash and bee trees don't go as well together as spinning discs and bee trees. Because you're talking about where the head goes and so when you're talking about ranges, right? If your tables are, if your rows are in order on disc and you're just going one to another to another, you're minimizing your seek time. Really good for an equality match, right? You just look it up and you get it and you go. They're actually pretty good for a few equality matches. So if you want to get 5, 10 and 11, now what MySQL would usually do is it'll change that to a range search. So it will look for everything between 5 and 11 and then just filter out what's not 5, 10 and 11. Composite indexes. So let's say you have an index on last name, first name. How does that work? If you think about it, when you had your dictionary back in the day and you opened it up and it was a big book, it kept it on your shelf and your mom was like, look it up in the dictionary. So you did. It was really easy to find the words beginning with S or SU, right, surfing. Really hard to find all of the words ending in G. There's no index for that. So remember I said you could think of MySQL index as like a dictionary index? That's kind of how it works. It's really easy to find words beginning with say the letter G. So if you want to find all last names, right, that begin with G, you could do that. If you wanted to find all first names that begin with G, well that's farther into the index. That's like finding all the words that have a middle letter of something. This index won't help you do that. And that's how you kind of have to think of the indexes in MySQL. And so you can't use, you can't find words ending with G so you can't find words with the first name because it's not the first thing in the index, right? But it's okay if you want. You might say, well what if I want to look up a last name and a first name? My last name is Cabral, which as I said before is like the smith of the Portuguese world, right? So people are like, oh, your last name is Cabral. Do you know how Cabral? And like that is literally John Cabral. And you know, I'm like, yeah, I have six cousins named John Cabral and there's probably a million other people. So you might say, well, what if I want to optimize on that last name, first name search, but I also want to search on first name. First name's Sheary and that's pretty uncommon. It's like me and Sheary Rappaport who was on NYPD Blue and CSI and whatever and that's it. And I own Sheary.com because it's, you know, unique in whatever, the way that I, specifically that I spell it. So you might want to say, well, I don't want to have to bother with Cabral. I just want to look up Sheary. Well, you might say, I want an index in my table on last name and first name but also just on first name. So I could look up either by last name or by first name or by last name and first name. And like I said, it's like a dictionary index. So what does MySQL use indexes for? When would you use it? Why is it useful to have indexes at all, right? So you'd use it when you're matching a wear clause. It's called filtering. You're filtering stuff out, okay? Because if you're filtering full table scan and be like, okay, now let me start throwing stuff away. Eliminating rows. Resolving min or maximum values. If you want to get the maximum value here, you just go to the right of the index. You could go to the right of the tree, but the tree might be bigger, B tree. So this is how it looks like in your index B tree. So it's already sorted in order. You can also help with grouping. If you're grouping stuff together, having an index can kind of help if you're having an index on that grouped field because they're already grouped together in order, right? All of the words that begin with A are already grouped at the beginning of the index. And then if you have a query that can be resolved by only looking at the index, that's called a covering index. And that's also pretty good because it doesn't have to go to the data itself. It just, the query can completely be resolved in the index. So MySQL, what does MySQL not use indexes for? And this is really great to review when you have that kind of like, okay, I didn't explain. And how come it's not doing what I think it's doing? It should be doing. It should do this and it's not. Why not? Well, it ignores indexes for functions. So for example, if you have a timestamp column and you want to find all the sales you made on a certain day, you might say, well, let me find everything where the date of the timestamp column is, you know, August 11, 2012. I want to find all the sales then. Well, MySQL doesn't do that because it doesn't really know what date is doing, right? You really want to use where the TS column is between midnight on the 11th and 11.59 p.m. on the 11th. 11.59 and 59 seconds, whatever. That's what you want to use. Comment? Oh, okay. It ignores indexes for joins if the fields aren't similar. And what do I mean by the fields aren't similar? I mean, if they're not the same data type. Okay, now MySQL is very helpful. You know, somebody asked last session, what's the difference between MySQL and Postgres? And the answer was like, MySQL is easy to use and it really is user-friendly, right? You can read like a three-page web tutorial, like literally three screens full, and be up and running with MySQL. With the Oracle database, you need like a week-long class just to learn how to log in, or at least I did. You know, it's not so easy. You know, if you don't know how to query, how do you find out what your databases are? In Oracle, you have to learn how to do a select query and get from the metadata. In MySQL, you do show databases. It's that easy. You don't need to know SQL. The problem is that MySQL tries to make things a little too easy sometimes. So if you wanted to join two fields, and one was a date, and one was a timestamp, it would just convert that date to a timestamp. It would say, oh, you said August 11th and you're comparing it to August 11th at 10 a.m. Well, what you really meant was August 11th at midnight. I'll forgive you. But it doesn't really tell you it's doing that implicit cast, but that's just like doing a function. So for example, if you say a date column is equal to a timestamp, it doesn't implicit conversion, and that's just like doing a function. What else does it ignore indexes for? If you have a query with multiple where clauses, it's really one where clause with a lot of and, or with a lot of joining. So multiple clauses, not all using the same indexes joined by or. I'll give you an example. Let's say we have a test table with an index on last name, first name, we've done that a lot. So here's your test table. So queries that will use this index, select star from test for last name equals cabral. Select star from test for last name is cabral and first name is sherry. Because again, it will look at the last name. It will find cabral. And then it will go to the s's of cabral. So you'll find seraphina cabral and sherry cabral. What you won't find antinella cabral, because it's B instead of an A. Select star from test for last name equals cabral and first name is Tony or first name is Antonio. Because again, it goes to cabral and then it will find, you know, things beginning with T-O-N-Y and things also finding, you know, it'll do two kind of range searches there. But here's a query that doesn't use the index. Select star from test for first name equals sherry. We already talked about that, that if you don't have the last name, you can't do the first name. Just like trying to find a word where the middle or the end is a certain letter. Select star from test for last name equals cabral or first name equals sherry. So this is what I was talking about a couple of slides ago where I was saying that it's multiple where clauses, right? It's two different things you're trying to filter out, but they don't use the same index. Because first name by itself doesn't use it. So last name by itself will use that index, first name by itself won't. If this said and, it would use it, but because it says or, it won't use it. So let's talk a little about composite indexes. Let's say you have an index on last name, first name, middle name, okay? This is equivalent to having an index on last name, first name, middle name, having an index on last name, first name, right? So if you wanted to search on middle name, this wouldn't help. But if you wanted to search on last name, first name and middle name, it would work. If you just had the last name and first name, it would work also. So does this thing called prefix indexing? And it's also the same as last name. And this is why I said if you really wanted to look up both on last name and first name, where you might have any combination of last name and first name, what you might want to do is last name, first name as one composite index, and then first name as a simple index. Now, some of you might be wondering, why do you do your composite index as last name, first name, and your simple index as first name? Why not have your simple index be last name and your composite be first name, last name? And the answer is you could totally do that. But like your dictionary, right, if you're trying to find all the words beginning with s, that's a lot of words. If you're trying to find all the words beginning with q, that's fewer words, or x, a lot fewer words, right? That section's a lot smaller. So what you really want is your most limiting field first because that will really help optimize your queries. So what else does MySQL ignore indexes for? This is probably the number one source of why isn't MySQL using this index. It's there, it's got the right fields. I know I'm doing the query right. I'm using, you know, I have last name and first name and it's an index on last name. Why isn't it using that last name index? If you're trying to scan for too much data, MySQL will just do a full table scan. For example, if you wanted to look at 7, 9 and 12, okay? Really did it, really do that? All right. I hate it when it does that. It doesn't always come up right. All right. Let me just show you this slide like here. So you can see here, if you're doing a full table scan, that's six look-ups, right? You go to the 7 and then let's say you're doing a select star. You're getting some data that's not in the index, right? So here's the index and here's the data. So you find 7 and you say, okay, I found 7. Now let me go to the data row for 7 and get what I need there. Then I'm going to 9 and I'm getting the data row I need for there and then I'm getting 12 and I'm going to the data row I need from there. Well, that's six look-ups, right? You go to the index for 7, the data for 7. The index for 9, the data for 9, the index for 12, the data for 12. Versus you just start at the data tree. Forget the index tree and just do, you know, 1 to 2 to 5 to 6 and say, oh, 7 is what I want. Okay, to 9 is what I want, 12 is what I want. Okay, great, 16, 18, 21, right? It's a whole lot easier to walk that 10-node tree than it would be to do those six look-ups because you're going back and forth between the data and indexes, right? The comment is, so that's why the spinning disk impacts it as well. Exactly. So what is too much data? Too much data is about 15 to 25 percent. So think about that. You have a field and you have, you know, a yes-no question. It's about 50-50, yes-no. You know, did you like Sherry's presentation? It would be 50-50, right? It wouldn't be 100-zero. I would hope it would be, you know, better. But if you have a yes-no question, it's about 50-50, and you say, okay, well, I'm searching for everyone who said yes, and it doesn't use an index. Well, I have an index on that field. Well, that's why, because it's easier to actually do a full table scan. So here's the thing. MySQL usually does what makes the most sense. So if you're like, hey, how come it's not using what I think it should be? Chances are it's your expectations that aren't following, like it's not that MySQL is doing something wrong, although sometimes it is. But usually it's, you have to recheck your expectations. Are you looking for the needle in the haystack or are you looking for the whole haystack, right? Where the whole haystack is 15 to 25 percent or more. So it can be difficult to really, really optimize your indexes, and you may just say, hey, you know what, the best index is really no index, because it's not going to use it anyway. So why bother having the overhead of every time I do a write and insert and update delete to have to update not just the data but the index when it's not even going to use the index? And I didn't even talk about duplicate indexes, right? So if you had an index on last name first name and then also an index on last name, you might say, well, what's the harm? It'll only use one of them. But the last name first name one pretty much contains the last name one. So why would you do it? You can actually define more than one of the same exact index in MySQL. It lets you do it. Why? Because MySQL is very friendly. It's very wasteful to have two of the same index, because you're not getting any extra power out of it. You're just getting extra writes, because MySQL has to update all of the indexes when it does a write. So how is that 15 to 25 percent calculated? Okay? You and I might know that, hey, my data is exactly 15 percent everything. So why is there 10 percent every? There's 10 values. It's 10 percent of everything. How come it's not doing what I think it should be doing? Well, it's calculated by metadata. Yay, metadata. All right, I'm the only real... Come on, data geeks. You know you love metadata. And here's why you love metadata, because it was used in these kinds of calculations, right? So in MyISOM, the metadata is exact. And this is why if you've heard that MyISOM isn't so good for transactions or whatever, what happens is when you do a write to MyISOM, it updates information. It could update a table checksum if you turn that on, right? Which is actually pretty cool, because you could check from the table it comes back right away, because it's already a metadata. But like number of rows in MyISOM, the size of the data in MyISOM, and a lot of the index metadata is exact. So you know it's going to do the right thing with the query. In fact, MyISOM was designed years and years ago for what we might call now data warehousing. It was really designed for reporting queries. So that's why it does this, right? Because when you're doing a whole bunch of analysis and reporting queries, you want MySQL, the optimizer, to make the best decisions it does about queries, right? But a lot of us are doing transactional stuff where we want the sale to go through. We don't care about exact or not an exact. And in EnoDB, you have approximate metadata. So if you look at the number of rows in a table in MySQL by looking at the metadata, you do a show table status, it's not necessarily 100% exact. If you do a count star, you'll find a different number. Count star takes more time. In MyISOM, count star just brings back the metadata because it knows it's the same. So about indexes, MySQL has this concept called a value group. Explain what that is in about a second. And it uses the average value group size to make determinations about what it should use in indexes. So your average value group size is that 15 to 25% on what it will use. And it's used for the approximate rows that it's going to read for when it does a filter or a join. So if you have a table and you're like, I want to join on first name where the first name begins with S, it's going to kind of use this metadata to figure out, well, approximately how many things are with S because if every first name is with S, I'm not going to use an index. I'm going to do a full table scan, right? If you have an index on first name. But if not, then it will go to the index. So how does it figure out how to do that? It uses this average value group size. So let's talk about what this actually is. So let's say we have a group called body parts. It's a standard. And so that's what we'll use. So if you have a table of body parts and you say, okay, Shiri has two eyes and Shiri has ten fingers, two rows, whatever. So the average value group size here is six. Because you have twelve things and two different groups. Twelve values, two groups. Average value group size is six. Okay? Which means on average, you have six eyes and six fingers. Right? Which is about as good as saying the average person has like 0.9 testicles and 0.9 breaths, right? Like the average, that's not valid for anyone. Like I don't know anyone with six eyes or six fingers. Six fingers is probably more possible. But you know, it's certainly not the average that you would find among the population, right? So averages can be a little tricky. So this is not a really good optimization for either eyes or fingers. Right? Because eyes is way too, way too big for eyes. I don't know why I said longer and shorter. But the estimate is way off for both. So be careful when you're talking about average value group size. Right? If you do the people in this room, you're going to find the average value group size. There's 50 people in the room and there's 40 men and 10 women. The average value group size is 25. Right? For the genders. There's two different genders. Male, female. And we'll just go with that reality for now. And if you split it down the middle, it's 25. But if you search for the women, it would actually be a better to use the index. Right? Because there's fewer women. But if you search for men, it should do a full table scan. But in both, it will probably do a full table scan because it's saying, okay, half the date is male, half the date is female. EnoDB does do some approximations, too. It does some page dives to look at the information. And so this is really important when you're like, why is my, well, not using my index. It could be because of this average value group size problem. A composite index, you can think of it like a sorted array when we were talking about dictionary, first name, last name. It kind of sorts the last name, last name, first name. Right? It's like a sorted array where you have the last names all sorted and the first names all sorted. It can be ascending or descending. And if that image actually decided to show up, you can see, right? You can read from 1, 2, 5, 6, 7, 9, 12, 16, 18, 21. Now imagine this was like 1. And then over here, like, you know, kind of inside it, it's got 1, 1, 1, 2, 1, 5, 1, 6. Right? Because the first value is 1. Right? So all the last names have been with A and the first names are different. There might be the first name of John. Right? So it's all sorted in either ascending order if you read it this way, from left to right, or descending order if you read it from right to left. You actually physically cannot write an index in MySQL that sorts something where the first thing is sort of an ascending order and the second thing is sorting a descending order. You can't actually define an index like that. It won't work. If you do an order by where you order one thing ascending and one thing descending, it won't use the index for both. It'll probably use the index for the first part ascending, but it cannot optimize for that descending part. It has to do a manual sort on that because it won't come exactly from that. Let's talk about nulls. So Dave in the last talk, talking about not using nulls, one of the reasons to not use nulls is that you have to do an extra pass through the data when you're doing an index because nulls are stored differently. Nulls are not a real value. They're this kind of empty random thing and you really have to worry about nulls and equality. So why is this? So in theory, in SQL theory and practice, null equals x is not true for any value of x. If you want to test if a value is null, you have to do the is null, not equals null. Nothing equals null. In fact, null does not even equal null. You laugh. It's going to get a little more complicated in a second. So if a referenced value in inequality is null, mySQL immediately returns false because nothing can equal null. So if you say this really complicated query with 100 joins where x equals null, it's going to go real fast because mySQL knows nothing can equal null. Ha-ha, I've got you. Nothing can equal null. I'm going to return it fast right away. I'm now, if you're using subqueries or whatever, it might have to actually evaluate the subquery once for every row in the outer query. So it could still be pretty slow, but pretty much, you know, if you're not getting the right answer, this might be one of the reasons. You might have done equals null instead of is null. So the null safe operator is this thing. So if you want to say if something equals, if a value equals 10, but if that value is null, right, it might be false, whatever, so the null safe operator is that. There is no null safe inequality operator. You would have to do it like this. So here, this is another reason not to do equality. So for example, I just ran into this problem a couple weeks ago. I have a, I have monitoring set up so that we do checksums on the master and they replicate to the slave. So you can look on the slave and say, does the slave have the same checksum as the master, meaning is the data at the same on the slave and the master, right? Because the whole point of having a slave, you want it to have the exact same data as the master. They can get out of sync, it can get weird. And so in our monitoring set, we say does the master checksum equal the slave checksum? And you get problems when the master checksum or the slave checksum is null. So we actually have to say, like when we say show me all of the rows that aren't equal, instead of saying where the checksum of the master does not equal, right, bang equals the checksum of the slave, we actually should say this. Because if the master is null, it actually won't return because it's not unequal. It's weird. And we actually ran into a situation where we had a false positive on this monitoring check where we went to the thing and we're like, why isn't it paging? Oh, because it's null. And again, you have to remember that too much data feature slash problem. If you have more than 15 to 25% null, right, you might run into this kind of problem because of that. But let's talk about nulls and value groups. So who here thinks that all the nulls are grouped together in a value group? So if you have 10 nulls, it's one value group. Who here thinks it's 10 value groups? Who here thinks it's no value groups? Okay, who has no idea? Pretty much everyone, okay. Here's the cool thing. It could be all of those. And, and, and, but wait, there's more. You can actually change it on a storage engine basis. Okay? So if you want all of your... There's variables called enoDBStatsMethod and myISEMStatsMethod. And these variables are how you control how these work. So by default, both of them are set to nulls equal. And what that means is that all of the nulls form one value group. So you have 10 nulls, 10 value groups. Okay? I'm sorry, one value group. Right? So all the nulls are equal. And, and so let's say you have, you know, 10 men, 10 women, and 10... You don't know the answer, so you put 10 null. There are actually three value groups. Men, women, none. Null. Nulls unequal means they're not equal. So you have 10 women, 10 men, and 10 null. You now have 12 value groups. Okay? Why is this important? It's important because remember your average value group size. It's going to be really skewed one way or another. If you think your average value group size is, should be three, right? Men, women, null versus 12, right? Your average is really way off whether you divide by three or divide by 12. The other one is nulls are ignored. You might say, you know what? I don't care about nulls. I'm usually searching for non-null values. So what I want is I want for men, women, and null, I want two value groups, men and women. So you can do that too. But the fault is null is equal. So you can change that now. Unfortunately, you can't change this on like a per table base. Wouldn't it be great if you could change it on a per index basis? Well, wouldn't it be great? We can't. So primary keys are row identifiers. I don't know why I'm telling you this now. I probably told it to you 20 minutes ago. They actually can't be null. This is probably why I'm telling you here. They cannot be null. And one of the reasons is because in order to be orders these on disk in primary key order. So where would you put null at the beginning, at the end? I don't know. So primary keys can't be null, but unique keys are also row identifiers and they can be null. Now here's the thing. You can only have one null value. Right? Or you can have, I think you can have more because they're not equal by default. But you should try that. I forget if you can, you know what, let's do that now. Let's see if we can make a unique key and have it be null. Create table, actually, drop table foo, create table foo, id int, unique key. Insert into foo, values, null, null. Yeah, you can put in more than one. All right. Didn't want to give you more wrong information. So unique keys, you can actually have more than one null value because the nulls don't equal each other. Both the primary and unique can be composite indexes. That's fine. Foreign keys. Let me talk a little bit about foreign keys. Foreign keys are a parent-child relationship. So for example, a customer ID, in order to get a payment, somebody has to pay you, right? You might want to always know who gave me that payment. Even if the customer is long gone from your company, you may want to keep that customer's information around because you need to keep the payment information so you need to know, you know, what was that person's address when they made that payment or whatever. You can actually do cascading updates and deletes in MySQL. And so what you can say is, for example, if you delete the customer ID, delete all the payments. You know, just in case you need to start shredding documents, you can do this cascading kind of delete, which can be useful if it's something you don't actually need to keep. You just delete the customer and all their stuff goes away and then you never have to worry about it again as opposed to if you delete the customer and then the child table is still trying to refer to the customer can get a little complicated. So let's talk about foreign keys. I'm a database administrator which means people call me up and I have to go into the database and look at stuff when things break. So when I see a table like this, this customer ID and status ID, it makes me want to cry because I have no idea what customer ID 121 and status ID 1 means. Does that mean they're a free user or paid user? I don't have no idea what that means. Like, are they administrative users? I don't know. What's status number one? Now, maybe if there's only three statuses, I kind of know, but, you know, it can be difficult. There's no way to do a foreign key, right, where status ID on the left-hand table refers to status ID in the right-hand table. And if it doesn't exist in the right-hand table, I can't put it in the left-hand table. So, for example, if I tried to insert customer ID 126 with status ID 4 and there was a foreign key, it would say, nope, there's no parent ID of 4, so you can't do that. Now, what I prefer is something like this. We have a table, right? Oh, isn't this so much more readable? Isn't that great? If somebody says, okay, customer 122, what's their status? And I'd be like, oh, status 2, right? People would be like, the customer service guys would be like, that's not helpful, right? But here, so I would have to do a table join and I'd have to think about it, and what's the status table, and hopefully I named the status table something like status. But sometimes it's labeled like field 1. You never know. People are weird with things like that. If you do something like this, we have a foreign key to a table where it's just characters, right? It does have to look it up and stuff, so the lookup takes a little longer because you're looking up something that's 4 bytes instead of 1 byte, right? With a number. The number could be 1 byte if you have a tiny int, and this, you know, is 4 characters, which might be 4 bytes, it might be 12 bytes if you're using a 3, if you're using ETF8. So it can be, you know, it's a little long lookup, but you never have to do a join to get the information. So it saves a lot when you're reading the information. When you're writing the information, there's a little overhead. And so this, you know, again, it's kind of like using your index as well. You might say, well, this index is a little bigger, but it's actually more useful because you're reading more than you're writing. Question? That? Great. Yeah, so the question is, what's the difference between using that and an enum? Using an enum is actually easier. So an enum is an enumerated list, and with an enum, actually, you get rid of this status table on the right-hand side. Okay? So with an enum, I'm putting my hand over the right-hand side, but it's really clever. So with an enum, you'd only have the left-side table, and status, instead of being like a var, car, 20 or something, you would define it as an enumerated list, enum, and then you'd give it a list of enum, quote-free, quote-paid, quote-disabled, and so old-school, my-school. Here's the thing about, you know, either being an early adopter of something or something, you get used to how things used to be, and then things change, and you're like, you forget. You realize that they've changed, because you've always done it this way. With an enumerated list, it used to be that if you wanted to change the list, it was a completely offline operation. So if you have a yes or no question, or you have the status, right, and you have free, paid, and disabled, now you want to add an administrative user. You want some of the super user, right, some computer, some customer service guy who can go and look at everybody's records, because that's their job, you would have to take the entire table offline and build. That's not the case anymore. And when I say that's not the case anymore, I mean like, you know, for like five years that hasn't been the case. So, or maybe three or four. But, you know, certainly if you download a version of MySQL right now, whether it's 5.5 or 5.6, what you can do is if you want to add one enumerated item to the end of the list, right? So when I say enumerated, under the hood, what this actually does is the free status, the first one you put is actually index number one, and the second one you put is index number two, and the third one you put is index number three. So if you want to add something to the end of the list, right, which would be index number four, it actually will do that in an online way. You don't have to rebuild the whole table. The only time you need to rebuild the whole table of an enumerated list is if you're changing the order, right, because you actually have to change the rows. When you're just changing whether or not new records have to look up a new value, there's no work that needs to be done. The existing records are fine. And so the reason I don't mention enumerated lists is that it doesn't really have anything to do with, like, the foreign keys, and this is kind of a section on foreign keys. So if you are going to use foreign keys for something like this, where it's actually in another table, like a status might be in another table because there might be a description, or there might be some other things tied to the status, maybe there's more attributes that it has, then you might want to think about doing kind of human readable stuff in the database. Prefix indexing. In MySQL, if you have a text field, you may not want to index the whole thing. Do you really want to index the whole, a whole URL, you know, that's tons of characters long? Do you want to index an email address, like the whole thing? How do you want to do that? Well, it's for strings, and so what MySQL has is limits on how much you can index, because otherwise there's only so much you can store in the tree before it has to go to separate data structures on disk, right? And the more, if you have a separate data structure on the disk, then it's going to take more time to look it up. MySQL actually has a limit for tables, and the limit on tables for strings and for indexes is 767 bytes on EnoDB and 1,000 bytes on MyISOM. I don't know why they're different, but there you go. Now, when you say bytes, you have to be careful, because these are strings, and one character may or may not equal one byte. If you're using Latin 1, I think Dave said this before, if you're using Latin 1, one character is one byte, and that's the default, but if you start to use different characters, you know, non-ASCII, you might want UTF-8, which is three bytes. And I say three bytes, and you're like, no, no, no, Shiri, you don't know what you're talking about. UTF-8 is four bytes. Well, I know that, and you know that, but the developers a long time ago at MySQL decided to make this a little easier, just like Chinese characters really. They're not that many people that use it, right? Of course not. So, you know, who really uses, you know, all these 10 different Japanese character sets who really uses those, right? Nobody really uses those. So, UTF-8 is actually three bytes, and if you want the full four bytes, you have to go to something like UTF-16 or UTF-32, and... Yeah, it's a pain. So, you have to be aware of the character set, is what I was saying, because if you have, let's say you have your text field. I know a little HTML, right? Input equals, you know, whatever your text field, and you have 100 characters because you went along the URL. That might be up to 300, 400, or more bytes. So, you have to be careful, you have more than one of these in a table, and the good news is that MySQL is pretty good about the errors. If you do an alter table and you try to add an index that's too big, it will give you an error that says you can only have up to 767 bytes of an index. And you might be like, but I'm only using 100 characters, what's the problem? And then you have to remember it's characters versus that. Full text indexing. So, I did mention primary, foreign keys, whatever, and indexing. There's also full text indexing. Full text indexing doesn't do prefix indexing. What it actually does is it takes, like, all the words in a document and says, okay, if you're looking for the word sherry, it's in document one, five, and seven, and everything else doesn't have it. So, it actually just indexes the words themselves. It's only for car, bar, car, and text fields. And it was MySQL only until MySQL 5.6, which just came out, like, three weeks ago as general availability. So, question. Is there a word length limit? Yes, there are both minimum and maximum word length limits, which you can change in MySQL. The default for the minimum word limit is four. So, if you have a lot of TLAs, three-letter acronyms, you may want to change that. The other thing is that MySQL as a database system is pretty good, a lot of things. There are better ways to do full-text searching. Elastic search, things like that. If you have, or document stores, they were made for documents. MySQL was not made for documents. So, it has this kind of stuff, but it's not necessarily the best way to do it if you have a blog, if you have, let's say, Reddit. You're running some kind of a blog or forums where people are searching for words, if you have heavy full-text searching, you may want to think about something other than MySQL for that. Group by and sorting. I said before that MySQL uses indexes for helping in grouping and sorting. By default, if you do a group by, it also sorts. Okay, that makes sense, right? You have sales by state and you want Alabama to show first and then Alaska. That makes sense, right? But, it actually does an extra pass-through to sort the data if you do that. By default, it does extra work for you. Now, personally, that wouldn't be my way of designing. I wouldn't design it such that it does an extra sort. I would design it such that if you really wanted it sorted, you would have to say group by X, order by X as well. MySQL does that by default. So, if you're looking at your explain and you see in your extra field, you see file sort, file sort is the algorithm. So, if anybody took, you know, algorithms that you bubble sort versus file sort or that's what that means. So, it's not necessarily using a file. It's not the worst thing in the world because you're going to disk, but it might go to a file. There's another thing in Explain which will tell you if it's going to a file or not. But, it may cause a file sort. It is doing extra pass-through to data which does make it a little slower. So, if you don't want that, if you don't care about the order it comes back in, maybe you don't care about the sorting, maybe your application is doing some kind of, you know, parsing and sorting too. You can actually use order by null to do this. And the interesting thing is I probably went 10 years before learning about this. And it was because I did this talk and somebody was like, what about order by null? And I'm like, I don't know what you're talking about. And they're like, well, I heard this thing and I'm like, that sounds really dumb. There must be some reason they did that. This is actually the thing with MySQL is that you see stuff like this. You're like, this is either really dumb or really smart and I don't know which. There must be some reason they did this, right? Because it looks really dumb on the surface which may mean it's a stroke of brilliance. And sometimes MySQL has a stroke of brilliance. This time, I think, in my opinion, it's dumb. I asked a bunch of people and they were like, no, no, it does that by default. I'm like, why doesn't it do the not ordering by default? Because if you did a query and you did a group by and it didn't come back in order, oh yeah, duh, I didn't order it. It would be obvious that it widened to come back in order because I didn't have an order by. Here, it's just implicitly a little slower which is a little tricky. So if you don't care about the return order, you might. So knowing all this. Okay, so the question is if you want to do a different ordering than the grouping. So you want to group by state but order by the sum of the sales. So you want to know which state is the top producer or something. Okay, well the question was would it do two sorts and the answer is no, it would only do a sort. If you specifically didn't order by, it would only order by that one. It wouldn't do the order by the group. So knowing all this, use explain. And if you're not getting the index that you expect or the method you expect, check your expectations. And this is kind of a douche thing to say. It's like when somebody is like, oh, go read the manual. If it's not doing what you think, you're wrong. But frankly, I banged my head a lot about MySQL. I'm like, all right, that's because of average value group size. Crap. So you do have to check your expectations and say why is MySQL doing this? Now that being said, sometimes there are bugs. So don't always distrust yourself. In the worst case scenario, you make a bug and they say no, can you give me some of the sample data or whatever and you give it to them and they're like, oh no, that's actually how it's supposed to work. See this manual page. So now people might say, well, how do I use explain? Well, I'm glad you asked. If there's really time in the break, I can quickly go through it if you guys want. But if you really want, I've done this, I've done an explain talk before. So bit.ly slash explain video, all lowercase, all one word. Wait, don't take a picture yet. Because some people are like, you know what, Shiri, I don't like videos. They take too long. I don't need to hear you talking for 66 minutes on this. I want the slides. I do better with slides. Now you can take a picture. So bit.ly slash explain slides if you just want the slides or if you want the slides to follow along. That's what they are. The slides are a PDF, so take it home with you, read it on the plane, it's, you know, whatever. I don't do slideshare because I like to kind of put everything in a directory that I kind of control over and whatever and also it's good for offline viewing. Does slideshare do offline viewing? I don't know. Okay. So you won't find mine on slideshare just because I'm lazy. I like to put them in one place and one place only. Source of truth. And now I'm going to talk a little bit about index merge because if you do look at the explain slides, I talk a little bit about index merge. Now, MySQL 5.6 has changed how index merge works. So I'm going to tell you about how it works in 5.5 and then tell you how it works in 5.6. So on the surface, index merge looks really good. So what index merge is, is you can use more than one index. Let's say you have an index, only a last name, a simple index, and a simple index on first name and you want to search first name and last name. You can be like, oh, MySQL can use index merge. You can do both of them. It can do an intersection or a union of them. That would be great. That would be awesome. I don't have to worry about everything ever again. I will index every field. I will just put one index, simple index on every field and I'm good. And this really works really well in Oracle. So when I see tables designed like this, I'm like, here at Oracle developer, aren't you? Because Oracle actually really optimizes that kind of thing and you just have to put the Oracle database, not Oracle, the MySQL database. Just, anyway, so confusing. Thanks for laughing. I know you're still awake if you're laughing. So on the surface, it looks really good. It can just do it. But it uses more than one index, but it didn't really work very well and usually, so in the explain talk, it will say if you have it, it's pretty much an indication that you need it. If you have an index merge and it says, I'm using the index on first name and the index on last name, it's usually an indication that you need to have a composite index of the both of them together and it will actually work better if you do that. So it's usually like, if you see this, you can make it better. It's a warning flag. You can make it better by changing the index. It's actually better in MySQL 5.6. It's no longer like a warning sign. Now it's actually useful and good. So before MySQL 5.6, as I said, indicates you can make a better index and it was not always used even when it could have been used. And so if a range scan was possible, it would use that instead of index merging, right? So if you have an index on something here and something here, it would do a range here and a range here instead of do a range here and nothing here as opposed to merging the two indexes. And it may have merged more indexes than necessary based on the order it was doing in it. So in 5.6, it's better and it uses it when it should use it and it doesn't merge more indexes than necessary. And I don't want to go into the implementation details because it's complicated. It just has to do with the order in which it actually merged the indexes. Now, someone here asked last session, where can I go to find challenges on how to do stuff? Like, I know how to do queries and I know how to join tables. Where can I learn more? Where can I get hard problems to do? And there's actually a site called artfulsoftware.com that has some really good, you might call it like cookbook recipes and stuff, really complicated, like how do I know if a certain date is a Monday? Like, tell me what is it, you know, what's today, it's a 22nd. Tell me a February 22nd, 2213, is gonna be, what day of the week is it gonna be? And there's actually some things that you can do in MySQL to see that. There's also aggregation, how do I find, you know, highest ranked things or whatever and it's the artful software is actually a book about MySQL and it's actually a virtual book. So it's already updated with MySQL 5.6 and everything and you can, there's stuff for free and there's stuff that you can buy and this stuff is free. It's really awesome because if you're like, I wanna do this really complex thing, you can kind of look up the recipe. So if you wanna challenge yourself, artfulsoftware.com slash infotree slash queries.php. All lowercase, kind of all one word, there's no underscores or anything and it's really, really good for those kind of challenges. And then, you know, certainly feedback, whatever, I think there might still be some podcast CDs in the back. If not, you know, I like to give a physical reminder to people because I can tell people it's all blue in the face. Oh, rsql.com, it's the rsql podcast. It's like MySQL but it's rsql because it's a community podcast. But you know, are you really gonna remember it, right? Like your brain, your brain is only starting to get full now because it's Friday morning by Sunday. You're gonna be like, I learned something, I think. I don't know. I didn't write it down in my little notebook so I don't really know where to go. If you have a CD, you throw it in your backpack in like five weeks from now when you're cleaning out that backpack to go to like your next conference, you'd be like, oh, right, let me put this in my computer and see if I like it and then let me download things. So it is on feed burner and iTunes so whatever you use, whether you're open source or you know, you use whatever Apple tells you to use, you can download. And let's be clear, I have an iPhone and a MacBook and you know, so I kind of use what Apple tells me to use because so the MySQL Ministry of Bible is, it's booked about three years old and stuff. It's for if you're new to MySQL. It kind of teaches you concepts and terminology. It doesn't like teach you how to query things like that. Planet MySQL is a bunch of blogs and stuff. If you're totally new, it's kind of like reading a magazine for hobbyists. Like you might not get everything when you first start reading it. That's okay. This last link here, MySQLMarinite.com and see this book here, this O'Reilly book Learning MySQL, the butterfly book. We're actually taking that book and going through chapter by chapter and learning it. And this actually will teach you how to install MySQL. Like chapter two is installing MySQL. People were coming, we started this January 1st and people were like, well, what do I need to do? Do I need to install MySQL and where can I get the files when I run like nothing? All you need is the book and a computer because so if you want to go to MySQLMarinite.com it'll just redirect you to a meetup group. If you want to join the group, you'll learn when we're starting the next one. And it's actually really awesome if you really want to learn like what's on the left join and what's on the right join, that's the chapter we're on this week. So it's really good to do it. You can do it at your own pace. If you want to just do it and start it now, I'm not going to be like, no, you can't do it now. The homework is submitted by GitHub and there's instructions on how to use that too so that you can really kind of learn how to really do. I don't know. I think it's great and it's a great learning resource for you guys, I think. And I think that's all I have. Yep, that's all I have. Any questions or feedback? Sure. Right? Right. So the question is I said before that columns have to be similar enough. So what's similar enough? Is a tiny inch similar to a long inch? Is a var car similar to a car? Like a var car 64 similar to a car 20. Now the rule of thumb is, well first of all you could easily test it. Right? Make a table, give it about maybe a thousand rows and take one of the rows and see if it uses, you know, do an index where one of the rows is unique, see if it uses that index. So that's an easy way to do it. And I really advocate testing it because you never know what version is going to change something so I hate to say like here's how it is. But the rule of thumb that I use is does it have to do a conversion? Right? So for a date to date time, those are different data types. Right? You know, var car to car, it can compare it. They're not really different data types, right? But you know, something like an inch to a decimal is different and I don't know about an inch to a long inch if you're comparing to, like if you have a tiny inch and a big inch, this is actually what it's called, not long inch, if you have a tiny inch and a big inch and you're comparing the two, if they're both under 128 or something, would it compare? And that's a great question and I don't know the answers but I would just say, do you know, get some sample data and test it yourself because that will be the right answer for your version. Yes. So the question is, I said that you can't, when you have a composite index, you can only sort the two the same way. They can either be ascending or descending but the MySQL Workbench allows you to do that. Do you have the MySQL Workbench open on your database right now? Right. So this is where I say, well what happens when you do it and do a show create table or look at it and what actually happens is that MySQL Workbench silently ignores that and just sets it. Now the question is, is that a MySQL Workbench problem or is that a MySQL problem? Because again, MySQL is very friendly. So let's do this. Drop table foo, not foot, drop table foo, create table foo, id int, var car, nope, it's going to be, id int, name var car, 20, and let's do this. Alter, oh, table foo, drop table foo. Really? I can't spell apparently today. Tobl, it's totally my keyboard. I'm going to fire my typist really. Alter table foo, add index, id name, so would it be id, ask, ask, name, yet, right. Oh, yeah, it's not a MySQL Workbench problem, right? So this, you can't see it but it says query, okay, it's not a problem. So now if I do a show, create table foo, look at, see this, by the way, see this, backslash capital G, that means output it in vertical format. It's pretty cool, otherwise it looks like this, where I see these lines, because these lines are really, because this is basically one big long line. So, backslash capital G, if you learn anything else today. So you can see that the key is just, it's all the same, it's ascending, or descending. So, that's not really a workbench problem, that's a MySQL. That's MySQL being helpful and friendly, okay? Other questions? All right, what is it? It's 12.30, and the next one is what, at 1? 1.30? Oh, because it's lunchtime. I see how that works. So I don't want to take up lunchtime by doing a talk on explain, I think, because, you know, you're probably hungry, you want lunch? Yes? So 6 p.m. tonight, there's a new features in MySQL 5.6, and I have to tell you, there's a new replication feature, which makes it, I don't know if you're going to be talking about it, but, which makes it so that, like, you could just start MySQL and say, okay, just do it automatically, and it just knows exactly what it's already done and what it hasn't, so there's no more fiddling around with the binary log file and binary log position. And I have to say, just in the last week, whenever I had to fiddle around with the binary log file and binary log position, I'm like, I can't, which, you know, it pretty much does. In like five years from now, this will be one of those things that if you, if you even think about binary log file and binary log position, that you can tell us how old you are. It's like, you know, how does the pencil and the cassette tape go together, right? Because you have to wind it. You know, you'll really show your age if you're doing that in like a couple of years. You laugh, but like seriously, that's how cool this new, one of these new replication features are. So yeah, it's 6 p.m. in this room, it's a boss then, or something? Oh, they had an open, okay, great. All right, awesome. Thanks guys.