 I'm going to tell you a story today about a wizard named Basil Smock-Whitner. He styles himself a gentleman as well, but his manservant kind of takes issue sometimes, a little quirky. This is Basil on the left. Ignore the similarity to Mark Twain. The artist has issues. But that's Fabian there on the left. They're currently vacationing in the Sahara, trekking across the beautiful rippling golden sands over dune after dune. Basil's having a blast. Fabian's not so happy about being the one to carry the luggage. But when you're the manservant, that's kind of how it works out. Now, you know, they're doing all the touristy things. They're there to see Basil's old school friend Tullamy. And so, you know, they're spending some time doing the stuff. They're doing the things tourists do. They go to the pyramids and get their portraits painted. They visit all the good restaurants. They've sampled the local cuisine, visited all the flea markets, found all the touristy bubbles. And somewhere along the way, Basil discovers an old brass lamp, oil lamp. You know the type, right? Super tarnished, really ugly, battered and beat up. But you buff it up a little bit and that brings out a nice shine. Not only that, it brings out a nice genie. Now, that's a little bit cliche, but a cliche that offers you three wishes. Like who's going to turn that down, right? Basil certainly wasn't. But when you're a wizard, you've got a lot of power. Like what is a wizard going to wish for that he can't already conjure? So, you know, Basil thinks about this and it's a tough decision. He spends a lot of time. He finally remembers, though, that when he was a kid, before he started dabbling in all this magic stuff, he realized that there was a time when he was passionate about librarians. He knew all of the great librarians by name. In fact, he had all of the trading cards. His Eratosthenes card was signed. He was that hardcore. Now, the problem is that back then there weren't a lot of libraries. There were few and far between. And so, if you want to be a librarian, what are your options? You've got to wait for a librarian to die. Or, you know, you could, you know, offer them yourself to make room, but you've got to be pretty bloodthirsty. I mean, honestly, let's face it, librarians are a cutthroat bunch. That's kind of the way it works. But Basil wasn't either patient enough to wait for one to die or bloodthirsty enough to kill one. And so, reluctantly, reluctantly, he followed his father's advice, traded in his dream and his collection of cards for a career in magic. But now it all comes rushing back as he's staring this genie in the eye. And he realizes that the thing he wants more than anything else is a library. So, he wishes for one. And not only that, I mean, being a wizard, he thinks big. So, he's like, I don't just want any library. I want the biggest, the most magnificent, like, world-wonder library that there is. It's going to have more content than anyone else's. Eratosthenes, his idol from childhood? No, Eratosthenes is going to envy this thing. He's serious. And so, to make sure he has more content than any other library, he stipulates a very specific condition to the genie. He says, genie, I want this library, but for every minute that the sun is up, I want one new scroll to appear in the library. So, every minute during the day, a new scroll. Genie's like, all right, whatever, snaps his fingers and poof! There's a library. You know, Basil is stoked. He is so excited about this. Fabian has never seen him like this before. Basil is seriously jumping up and down and running in circles. He's so excited. He runs inside the library and immediately begins writing out a schema to describe how the data is all going to relate, like how the scrolls are going to relate to each other. He's really excited. And now, because he's excited and in a hurry, he leaves out a few, you know, minor details. We've all done it, right? Things that, you know, we can take care of this later. It's not a big deal. But he rushes through it, gets a schema ready, and this is what he comes up with. You know, he's got a bunch of tables up there. He's got languages and nations, you know, people, scrolls, obviously, and how they relate to subjects and then the materials for each scroll. He's really serious about this. He's obviously spent a lot of time thinking about libraries. But he's left something off. You caught it? No indexes. But who's going to need them, right? Not a big deal. In this little, you know, world of imagination here, he can go and grab a scroll really easy off his shelf. It's not a big deal. Now, how many of us have ever shipped a production table without putting indexes on it? Yeah, I mean, we do it. We're not proud of it, but it happens. How many of you have lived to tell the tale? I mean, obviously, if you're here, those of you who aren't here obviously did not live to tell the tale. You are the fortunate ones. So Basil, to his chagrin, is going to learn soon why you don't do this. Because, though it starts small, the first couple of minutes, it's a few scrolls, big deal. But by lunchtime, he's got more than 200 scrolls on his shelf. And by the time the sun sets that night, he's got more than 700 scrolls. 700 scrolls. I mean, that's a lot of scrolls, let's face it. And the only way to satisfy the requests that are coming in, because people have already heard about this world-renowned library, even in the first few minutes, they're coming in and getting in line and saying, I want this scroll, I want Aristotle, I want Plato, right? I want, I want, I want, I want. And so Basil and Fabian both are starting to have to answer these queries. And the only way to do it is to go scroll by scroll by scroll. Is it this one? No. Is it this one? No. Not a good situation to be in, because by the end of the next day, there's more than 1500 scrolls. By the end of the week, we're talking 5,000, 6,000 scrolls on the shelves. And he has to scan the entire thing. Now, this is exactly what happens in our databases when we are foolish enough to leave off an index. Our poor, beleaguered database has to go row by row. When we say, give me any scroll by Aristotle, that poor database has to go, OK, is it this row? Is it this row? Is it this row? Is it this row? Which is not a big deal when you're in development and you have 10 rows in your database. And when you first deploy your database to production, before your users get their grubby little fingers all over your clean, pristine data, things run really fast too, but as soon as those users are in, man, things go south in a hurry. As you all, I'm sure, know, you're sitting there one day doing just fine, and you start getting alerts that your system is slowing down, and you look at it, and you're like, your database is like working really hard. What in the world is going on? And people start to panic, and the requests come in, and you're working your brains out, and it's just not enough, and pretty soon, you're in a blaze of panic and flame, right? Ugly situation. This is the case here. It was just too big of a job for Basil and Fabian to handle alone. They couldn't do it. They couldn't keep up. Patrons were queuing up in line. They were starting to get restless. I mean, you think librarians are cutthroat? They're cutthroat for a reason, because their users are merciless. These patrons are lining up. The line goes out the door, down the street, and over the next two sand dunes. And no one wants to wait in line that long, least of all the patrons. And so they've had enough. They torch the library. Basil and Fabian barely escape with their lives. Now, you have to give it to Basil. He, you can't keep that guy down, especially when a genie owes him two more wishes. So singed and a little fire shy, he's like, okay, what did we do wrong? Well, obviously, we forgot indexes, right? If we'd had indexes, it would have all been okay, because we would have been able to look up stuff. And so Basil, he's taking no chances. He indexes all the things. Right? If no indexes are bad, all the indexes have got to be better. So, I mean, he's got indexes on, like, the ink color, and he's got indexes on the word count of the scroll. He's taking no chances. Like, this is gonna work. So, eagerly, he jumps into his new library. Now, how many of you remember card catalogs? Legacy of other days. I'm glad to see so many hands going up. It makes me feel less old. Basically, these card catalogs were these great, big, enormous cabinets with these little tiny drawers filled with thousands and thousands of cards. You'd go to the author cabinet, find, if you're looking for Aristotle, you'd pull open the A drawer and you'd thumb through till you find A, Aristotle, in alphabetical order in the card. And then it would tell you all the different books that Aristotle had written that existed in your collection. Pretty straightforward, right? Now, you'd go to the title collection for subject. You'd go to the subject collection. Really not much to it, but this is exactly what an index is. It's a sorted list of data that points to where something exists. That's all an index is. And so, Basil's indexes were implemented as these card catalogs, and they were enormous. Because, remember, he had to account for a new scroll coming in every minute, and he was planning to be in it for the long run. Here's an example of what Basil's title index might look like. A user came in, and they come in and say, you know, I'm looking to kill Medusa. I need something that lets me kill Medusa in 21 steps. And so, Basil's like, okay, I haven't had one right here. And it's over here on the shelf, and he pulls it off. And Query is answered in seconds. Super easy. In fact, Basil and Fabian together, they have lots of downtime in the morning, especially when things are slow. Scrolls come in. They have to fill out a few dozen slips to file in all the different cabinets. But when it's slow, it's not a big deal. Basil's like, yeah, totally nailed it this time. He has this spare wish in the background. He's like, I'll have to think of what to do with that one later. Sadly, load is not always constant. In the mornings, it's really slow. When people come in for lunch, though, people want to check out a book, do some light reading during lunch. After work, huge spike, because people come in by the droves trying to get something to read for the evening. And when you've got this huge rush that lasts for an hour, and you're working hard to answer those, I mean, even when it's a fast query, it still takes time. And people are lining up. They don't have time to deal with these scrolls that are coming in. That's lower priority. We're serving requests right now. And so these scrolls are piling up. At the end of an hour, there's 60 scrolls that have piled up. And each one has dozens of slips that need to be filled out for it. So suddenly, they finish that rush. Now they've got this huge backlog of work, and they're working through it. Patrons are still coming in at a lower rate, while things are indexed, and it's just not pretty. On top of that, patrons do ruin scrolls. I mean, sometimes they'll fall asleep in a carol and drool all over a scroll, and the ink runs, and Basil has to throw the scroll out, and it's really tragic, and he's upset about it. But that's not even the worst part. When you ruin a scroll, you have to go through to all those indexes and pull the card, rip it up and throw it away. It's a lot of work. And on top of that, the worst thing of all, I mean, something like, I need a book named X. Well, okay, that's easy. You go to the title index, and you've solved it. But what if someone comes up and says, you know, I'd like a book by a Greek author about linguistics published more than 200 years ago in blue ink on yellow parchment. And Fabian would just get the deer in the headlights look. He's like, I have no idea where to even begin. And Basil would like, it's okay, you go handle the other one, I'll take care of this one. And he's chugging through it, trying to find the indexes. That's a lot of indexes to choose from. You know, they did their best. Things were backing up. Basil did his best to work through the backlog. He summoned a small army of a monkey lie, these cute little things up to about your knee that would work like mad and could get a lot of stuff done, but holy cow, they have an appetite. And they were eating Basil out of house and home. They were breaking the bank. He finally had to dismiss them before they totally ruined everything. But that bought them a little bit of time, they didn't have enough power to process everything. Patrons were lining up. It went again down the road, up the hill, over the dunes, and again, they torched the library. Now, Basil's getting a little despairing at this point. You can't blame him, right? What am I doing wrong? Now, Fabian's not despairing so much. He's used to Basil's escapades. Like, this is par for the course pretty much. Being almost burned to death twice in a row, it's kind of a bummer. But he's like, you know, whatever, Basil, you figure it out and let me know what we're doing next. But Basil's racking his brain. Okay, no indexes is bad. All the indexes is bad. What do we do? I mean, you think about it, if this is your database and you have indexes on everything, your database has to work really hard, too. Every time something comes in, your database has to go through and file something in every single one of those indexes that correspond to it. If there's a deletion, you have to go through and pull them out. If there's an update, if the title of a scroll changes, it has to go in and update every index that is affected by that. And every bit... I mean, work is a finite resource, right? We all know that. Every bit of work that your database is doing not serving queries is work that it can't spend serving queries. That sounds obvious. But it's an important point. The more work your database has to do that's not related to serving information, the less work it can do serving information. It turns this to his chagrin. Now, as he's staring at the pile of ashes that was once his pride and joy, he realizes, okay, I have this list, about four days' worth, two days here, two days there, four days' worth of queries from previous patrons that he can analyze to figure out, maybe all the indexes was a bad idea. Let's index just the things we need. So he's like, okay, I can do that by looking through this data. But this is a small snapshot of data. He didn't have the libraries open for very long before everything came crashing down. How do you know the window of the snapshot you have of your queries is sufficient to say what indexes you really need? Who's to say that the next day a patron wouldn't have come in and queried something that wasn't in here? I mean, obviously, you've got let's look up by title, let's look up by author, let's look up by nationality, but what if there's something else? How do you know? That is the crux of the problem. How do you know what index is your application needs? It's not this magic wand kind of a problem. Basil had to rack his brain and it turns out that the key is a foreign one. No, that's bad. The important thing here are foreign keys. That's better. What are foreign keys? Well, a foreign key is a column, is a field in the database that refers to a field in a table, that refers to a record in another table. In this case, we have scrolls that have an author ID and we have people that are referenced by this ID. So the scroll has an author points to the other table like foreign, right? External outside of that table, foreign key. Foreign keys are a great place to start with your indexes. Now this makes sense when you think of it in terms of rails because with rails belongs to and has many correspond to these foreign keys. You put belongs to on the child table, the one that has the foreign key in it and you put has many on the parent table, the one that is referenced by it. So a scroll belongs to an author and a person has many scrolls. Easy reciprocal relationship and bingo, there's a foreign key for you and you can just look at all of your belongs to and all of your has many queries and be able to infer a bunch of your foreign keys and that is a great place to start for your indexes. But, should you index every foreign key? It depends. Let's look at this example where we have a person and nation, two models, right? A person belongs to a nation and the nation has many people. Easy enough, person.nation, that's not really using a foreign key, right? It's taking the value in the nation ID and then doing a lookup by ID in the nation's table. And since we get primary key indexes for free and indexes on the ID that's not really helping us a whole lot. We've already got that one down. But the other one if you're saying nation.people give me all of the people in Germany. Give me all of the people in China. Yeah, you know that's going to go through and that's going to look at the nation ID column in the people table. But is that a useful query? What are you going to do with millions of rows from a single query? You're going to crash is what you're going to do. That's not a very helpful thing. That may not be the kind of query you want to do. Maybe you don't need to index nation ID in that case. Maybe. We'll see in a bit how maybe that's not the case. But consider the direction of your queries in order to determine which foreign keys you really need to index. Because you might not need to index them. Here's another one that Rails does for you. It also has and belongs to many. This is the type of query that gives you two models with a join table in the middle and it lets you do the many to many kind of relationships where a subject has many scrolls and a scroll has many subjects all through that intermediate join table. Now you look at this and you're saying okay so where is the foreign key in this one? With belongs to and with has many the foreign key existed on the child table. But where's the child table in this case? It's not scroll and it's not subject. It's that join table in the middle. It has both of the foreign keys on it. It has the scroll ID and it has the subject ID on it. So you index both? Again, that depends. Let's take a look. A scroll has many classifications. That's that central table that connects subjects to scroll. A scroll has many classifications and it has many subjects through that classifications table. If you were going to say scroll.subjects it's going to go through that classifications table and find every classification where the scroll ID matches the ID of the scroll. That foreign key, that scroll ID is the one you'd want to index for a query going this direction. Now go in the other direction. It's the other one, right? This time we have a scroll with many class or subject with many classifications and many scrolls through that classifications table. And so if you're going to find classifications by subject ID, now you're looking at that subject ID column. That's the foreign key you're going to index that time. Now in Basil's case he's looking at it. He's like, yeah, people are going to want to know what subject the scroll has and people are going to want to know which scrolls belong to a particular subject. So I'll index both of them. Totally reasonable, right? Ha! Fabian now. He thinks he's caught his master out. This never happens. So Fabian is pretty chuffed about it. He's thinking, okay I finally have one up to this guy. He says, master you know you say that we need to cut down the number of indexes and then you go ahead and you create two indexes. Can't you just do one with two column? Isn't that the same thing? Can't you just put an index on scroll ID and subject ID and get the same result? And here we have to dis-fabian because no that's not the same thing at all. Remember an index is a sorted list of data. When you have more than one column that data is sorted first by the first column then by the second column and then the third column and the fourth and however many columns you have in your index. So in this case we had scroll ID and subject ID. You could totally look something up by scroll ID here, right? Because it's sorted. You could go in and say okay scroll number 1027, bam you can find that in a hurry and you can find exactly which subjects correspond to it. But if some patron were to come along and say okay I need everything corresponding to subject number seven then the second column is not sorted by subject ID. And so it's not going to help you one bit. You're going to have to resort to a full table scan to answer that without a specific column. So sorry Fabian, you have to index both of them if you want the two-way query support. So Fabian's like alright then well what is the point of a multi-column index? If it's not saving you anything why use it? I'm so glad you asked Fabian because that's the very next slide. Let's say where I said people.nation was a bad idea because it could return or nation.people was a bad idea because it could return millions and millions of rows. Well you combine that index with another column and suddenly it becomes much more useful. The second column acts to refine the query. So you say nation first is going to be sorted by nation. You say okay I want to find all the Greek authors named Aristotle. Great the database says it goes out and it says okay here we go, Greece, there it is, Greece, Aristotle, Aristotle, Aristotle. It comes back with however many happen to be in there and it's a very fast query. So that index that maybe wasn't useful by itself now suddenly it's useful when we combine it with another column. Does the order matter? Does it matter that we put nation first and first name second? Not particularly but if we were to swap those we might be able to get two useful indexes for the price of one because if we swap it now we can answer questions like I need all of the people named Plato and that's Plato the philosopher and not the toy right? We can find all of them just like that regardless of where they're from or we can say I need all of the people named Plato from Greece and we can answer that query too. So with a little bit of thought you can engineer your indexes to satisfy multiple types of queries. This kind of multi-column thing is especially useful when you're dealing with data that is continuous as opposed to discreet. Discreet like IDs 1, 2, 3, 4, 5 right? Versus continuous like a time value. You're not going to go in and say okay if we had a published at column on our scroll we're never going to have someone come in and say I need every scroll that was published on the 23rd of April 38 BC at 942 a.m. You're never going to have that because it's too specific and it's not the kind of information people keep in their heads about the kind of scroll they're interested in. However, you add author ID to that sort things by author ID first and then publish that and then you can answer queries like I need every scroll published every scroll by this author in order of publication date and your index will give that to you because your indexes are just sorted lists. In fact when it uses an index like that many times I think it will even return the data in that order whether or not you provide an order by author ID which is a nice interesting side effect. Another example of this. Here we have a little query that's going to say give me the 10 most recently added scrolls the scrolls that were added most recently to the collection. Notice there's no criteria here there's no where clause. We're not saying give me all the scrolls where this is the case. We're just saying give me the 10 most recent scrolls. Now without an index on added on what is your database going to have to do? It's going to have to basically reorder the entire table just so it can return the 10 rows that correspond to the most recent. Not a happy thing to do when you've got a million rows in your database. But if you add an index on that suddenly that query becomes trivial because all the database has to do in this case we're sorting the sending so all the database has to do is go to the very end of the index and return the 10 rows at the bottom and you've got the 10 most recently added scrolls just like that. So please don't do like Basil did at the first and forget your indexes because indexes on sort column are as important as pretty much anything too. Make sure that if you're doing sorts in the database that you're indexing the column. So Fabian's like one more question and Basil's like dang it we're so close stop interrupting me. Fabian's like well but how do you know that these indexes are actually going to be used? How can you know that the queries you're going to be getting are actually going to be satisfied by these indexes? That's a head scratcher. Basil thinks about that for a minute. He's like you know let me check my spell book and so he flips through it a little bit looks at the index in the back and he stumbles upon one a special little incantation called explain and explain is an incantation you can and should learn. It's really not hard to master. Let's take a look at how it works. Here we've got a query that says give me all of the scrolls by Greek authors. It joins on the author and the nation table and finds all the nations named Greece and then returns all the matching scrolls. Not too bad, especially if you have an index. But now let's say we want to see what indexes, in fact how the database in general is going to satisfy this query. You ready for this? You just put dot explain on the end of it and Rails will automatically run the explainer. Now a little side effect of this Rails will actually execute the query first and then return the explain. Which I don't entirely understand but I do understand that there's a rationale behind it. If that's a problem like if your query is one that's going to take like three hours to run then you might want to go directly to a database console and key in the explain yourself. But this is a quick way to get familiar with how the explain works. If we were to run this explain without indexes it would spit out something like this. Now this is really arcane output. Very typical of the kind of thing Basil breathed. And this is my SQL output my SQL output as well. It's going to be different depending on the database you're using. But let's talk through this quickly. On the left, well first you see there's three rows in the output. I've kind of had to break the table in half because it was so long. So bear with me on that. The top is the left side of the table and the bottom is the right side of the table. So scrolls, the database is saying first thing I'm going to do is I'm going to look at the scrolls table. And under the type column you see that word all in all caps, that is bad news. That is my SQL saying I have no other recourse but to scan the entire table row by row. A bunch of nulls after that we'll talk about those in a minute. On the bottom there it's saying I expect we're going to look at about 43,000 rows. But that number could be much, much higher if that's the number of rows in your table. If you've got 4 million scrolls in your database that's going to mean that it's going to say we're going to look at about 4 million rows. Not good. So next the database says okay for each scroll that we see we're next going to look at the people table. And this time I have a key to look at because the primary key index is given for you for free. So it says okay I can use that one because the scrolls.author reference for foreign key and I estimate that there's going to be one matching record which you'd expect because each scroll has one author. And then it says from that then we're going to join on the nations table. Again we can use the primary key and we're going to look at the people.nation ID and compare that with the nation ID on the nations table and I estimate there's going to be one row there. So for every single row in the scrolls column the database is saying okay get the person get the nation. The nation grease? Nope. Throw it out. Let's do it again. Scroll. People. Nation. Yep. Match. Keep it over and over and again for the entire scrolls table. And that is what kills performance. So once we add indexes though suddenly the query plan changes. Notice it's not even starting with scrolls anymore. Now the database is saying if I start with the nations table this is going to be a lot more efficient. Now it's suddenly saying oh there's possible keys to use here. The primary key and I have this nations name key and I think we'll go with the nations name key because that lets me use the name column. It estimates that there will be one matching row and it's using where and using index. Using index is a lovely sound because that means that the database is going to be looking at the index to solve it and it can actually use just the index. It doesn't even have to go to the data on disk. Using where because it's actually comparing the information in the where clause to the index in order to come up with it. So it finds every nation named Greece and then for every nation it finds there it's going to use the people table and here it's using the people nation index because now it's going the other direction. It's not using the primary keys now it's looking up people by nation ID and again it says I estimate we're going to find about one. Sometimes it's just an estimate too. Using where using index. And then finally scroll and we'll look up scrolls by author ID and the result comes out pay presto I'm thinking about 107 rows we're going to have to look at to get this and that is a whole lot better than 40,000. So see how explain can help you answer these questions like why is this query going so slow. You can look at it and see oh heck it's doing a full table scan on this table used to have just a few but now has thousands and thousands of rows a useful little spell to know. So this is Basil's new schema. He's ripped out a bunch dozens of indexes that he doesn't need significantly slimming down his schema and he's ready to try again. He summons the genie makes his wish a presto there's a brand new library that gets to work. Scrolls start coming in one a minute he's filling out just a few indexes now for each one. Fabian are answering the queries a few hours go by they're biting their nails but it seems to be working they're actually staying on top of it this time the queries come in they're handling it just fine and every once in a while you get the weirdo that's like you know I need every egyptian scroll by about a century that was written between the 5th and 8th century BC and needs to be in blue ink on yellow parchment and they're like you go wait over there I don't have time to deal with you right now next and then they'll keep dealing with it and the other person then goes wait over in the corner until they have time to deal with them which is usually going to be after the library closes and the scrolls stop coming in and things are calm and they can go in and deal with these one off obscure queries the thing is there's only going to be like one scroll each day so it's totally doable sadly and who knew but alexandrians have this thing about libraries and so on general principle they torched it um basil and fabian you know basil was pretty crushed fabians like whatever I almost died again but that's perfect of course basil but basil has learned that he can do this he's found a system that's going to work and he's going to go now some place where they will appreciate his libraries and he will come down on general principle and he is going to be happy um so now it comes back to you how are your databases looking what's the health of your database how have you been treating your database what indexes are you missing what can you do to increase the speed with which you can respond to customer queries look at your foreign keys look at the belongs to has many has many through look at your sorting use explain to figure out whether or not your queries are actually using indexes it's a lot better to discover that your queries aren't using indexes when the load is still low like when your queries still seem fast that's the time to be checking to see whether or not they're using indexes because you don't want to wait until suddenly these queries are taking four or five seconds to do and the requests are backing up because what do you do to add an index you have to do a migration if you add an index does a full table scan which is not going to help your situation one little bit take my advice add the indexes as soon as you can and if that means adding them at the moment you create the migration for the table please do that is the way to go so if you've enjoyed Bazel and Fabian at all last summer I wrote a little novella online at blog.jamespark.org um I encourage you to go read it it has nothing to do with databases I'm discovering path finding algorithm um and totally unrelated to this at all but because I have the opportunity to do a shameless plug I'm also writing a book about maze algorithms for programmers mazes for programmers and if that sounds interesting to you at all you can go to praggprog.com slash book slash jb maze and it's in beta and I'd encourage all of you to go out right now and buy it but thank you