 Buenos dias. Antes de empezar, quería agradecer a toda la gente de la conferencia, Joseph, Mark, todos los marks de Cogrem y la gente de las camisetas verdes por haberme invitado aquí. Estoy encantado de estar en Barcelona y por más que nada haber organizado esta conferencia estupenda, que lo estamos pasando muy bien todo el mundo. Muchas gracias. So yeah, I was just saying thanks to everyone who works so hard on the conference. Thanks for putting this together. It's really a pleasure to be here. I speak some Spanish because actually my wife is from Spain, not from Catalonia, but she's from Oviedo, Esturias, which is a few hundred kilometers west of here on the northern coast of Spain. Anyway, 20,000 leagues under the sea. How many of you have actually read this book before? Few hands go up. I see a few hands. So yeah, for me, I hadn't read it either. It's one of those classic novels that we've all heard about. We've heard the name, Our Whole Lives, and Jules Verne is one of those famous authors that will never forget. But no one takes the time to actually read the book. So this summer, I was thinking about preparing this presentation, and I decided, I said to myself, you know what, I'm going to actually read the book. And I'm so happy I did. It was really fun. It was just an amazing read and blew my minds in a number of different ways. And so today, I'm going to tell you a little bit about the book as we go. But I'm actually here to talk about, there we go, active record. But first of all, before I get into active record, I want to talk about how active record works internally and how it actually creates SQL statements and executes them. But before I get into that stuff, why active record? Why talk about active record at a Ruby conference? It's not really an exciting new development with Ruby or Rails even. We all sort of know how to use this. The reason why, though, is for me, when I first came across Rails, it was probably in, I think, 2008, so whatever, six years ago now, this is what got me hooked. This is what got me excited. I still remember the day I first typed a line of code into the Rails console, something like this. And it gave me back, you know, converted into a select statement, gave me back an answer, and I thought to myself, wow, this is amazing. This is incredible. I can write this beautiful, elegant line of code instead of writing this crazy, horrible SQL statement. It converts it automatically for me, runs it, gives me the result back as a Ruby object. It's just amazing. And this is when I thought to myself, wow, there's something important going on here. There's something to this thing called Rails and Ruby. Before that, I don't know what I was doing. I was PHP, Java, something horrible. I blocked that part of my life out. And I started learning this. And this is why. It just seemed magical. Like, how does this actually work? But in fact, it's not magic. There's thousands of lines of code and years and years of computer science research behind this single line of code making that work. And so today, I want to go on a journey, just like Professor Aronah went with Captain Nemo on the Nautilus on an underwater adventure around the world. I want to dive into ActiveRecord on the inside, look at how it works. I'm just curious. How does that line of code work? How does it get changed into a SQL statement? And then, why stop there? And then I decided, why not keep going and let's go into the database server itself? Let's go underneath ActiveRecord and look at how a select statement works and how it returns just the information we need from a relational database server. So here's the agenda. Here's what I want to talk about today. We're going to start at the beginning with ActiveRecord relation. This is the public API that we all use every day, probably, if you're a Rails developer. So I'll talk about how that works. Then we'll get inside of ActiveRecord and look at how it generates SQL statements. And then, like I said, we'll go underneath ActiveRecord. I'm going to use Postgres as an example. We'll go inside of Postgres and look at how it works. How does it execute a SQL statement when it gets one? Really interesting stuff. And if we have time at the end, we'll look at the Beatry algorithm, which is the real magic behind how database servers actually work. So fun stuff. Let's get into it. So the example I'm going to use is that first slide. UserWare name equals Captain Nemo. So I have a user model. It's an ActiveRecord base. And what I'm going to do is run that line on the top. So I want to find all the records that have the name of Captain Nemo. So we're going to be looking for Captain Nemo all day long today. So how does this work? So the first thing that happens is I call user.Ware. It turns out the where method is defined in the ActiveRecord querying module. And that actually just redirects it or delegates it to another method called all. And what all does, you probably know, is returns a scope that gives you all the records in the table. And actually what it does, though, is it returns another new object to you, a new instance of a class called ActiveRecordRelation. And this is the class that actually implements the where method. So when I call user.Ware, it's actually entirely equivalent to saying, and actually this is what's happening, user.All.Ware name equals Captain Nemo. And then the where method is actually in that ActiveRecord Relation class. So when I call that, it gives me another copy of ActiveRecord Relation, another instance of this class. And what it does is you can see it saves some information about the query that I want to execute. So where name equals Captain Nemo. Then I call, let's say I want to call first. I just want the first object. And I want all the records that have the name Captain Nemo. I just want the first one. So same thing happens. I call first on that second object. And I get a third object back. Another instance of ActiveRecordRelation. And now you can see it has more metadata, more information about the query I want to execute. So we have, I don't know if you can all read that. We have where, name is Captain Nemo, order, ID ascending, limit, and offset. Where all those other values come from? Well, if we look inside of ActiveRecord, we can find the definition pretty quickly of first. So first is defined like this. If you pass in a limit, it does one thing. Normally, you don't pass in a limit. You just want the first record. It calls into something else called fine nth. So you can see that we pass in zero, meaning the first record. It's a zero-based index. And another thing is offset in case you wanted to start, somewhere farther along. Similar to first, there's also second, third, fourth, fifth methods. You get the idea. So you can take any query, put on dot second, dot third, and get that record back. And they all work the same way they call fine nth. They pass in a different number. And just in case you wanted the 42nd value, there's actually a method called 42. This is not a joke. This is actually in the ActiveRecord source code. So that's one thing you need to learn from today's presentation is to use that as often as possible. So back to the example. So my original example, user where first, is really equivalent to user all where fine nth, zero, zero. So again, zero is the first record, and offset zero. If we dive a little deeper, this actually just calls another internal method that we saw earlier, fine nth with limit. So in this case, the numbers have flipped. The zero is the offset, and the limit is one. Meaning I only want one record back. And it turns out that calls a couple other scopes. So as we call these different scopes, it just internally calls are the ones to build up the query that it needs. So internally, my example is actually the same as user all where named Captain Nemo, order arrow table primary key dot ascending. We'll get to arrow table in a minute, and limit one. So that's where all these values come from in my ActiveRecord relation object. So again, I haven't actually executed the query. What's happening here is this, and it's kind of neat. As you call these different methods, all where, order, limit, and lots of other methods, of course, that you guys probably all know about. ActiveRecord relation doesn't execute the query. It just caches the information that you're passing in. And each time you call one of the methods, it creates a new instance of that class and returns it back to you. This is one of the kind of beautiful things about ActiveRecord that I really like six years ago when I first came across it, and I still like it today. The idea that you can chain methods together, and you can use this technique in your own codes. We can learn about programming Ruby by studying Rails and other Ruby libraries. One of the reasons to look inside of things is to see how they work. So to do this in your own code, you just, each method, return a new instance of the same class, and then that class has all the methods that you can call again. So it's kind of cool and easy to do yourself. The other thing that's really neat about ActiveRecord relation is that it's lazy. So it's not actually executing the query. It's almost like we're using a functional programming language, something like Haskell or Lisp or something. It's smart enough to know you're not necessarily done, you're not ready to execute that query, which is expensive. We're gonna let you have the opportunity to make more of these scoping calls and specify more about the query. So it's not until you actually call 2A that you execute the query and load the data from the database. Internally, if you look at how 2A is implemented, it calls another thing, a method called load, and then returns the records. And finally, load eventually calls post-dress-sequel-adapter.select-all. That gives me all the data by running the actual query. Of course, if you're using a different database, it would be a different adapter. So this is a climactic scene from the novel, actually. So I won't tell you what happened, but this is when Captain Nemo and his crew get into this epic battle with a giant octopus. I think it's like a pulpo gigante, you would say, in Spanish, and you can see the tentacle coming right into the submarine. I won't tell you what happened in case any of you wanna actually read the book. So let's move on to take a closer look at Active Record Relation. So we still haven't figured really very much out. It's kind of cool how this class works, but how does it actually create the select statement? What goes on here? How do I get my information? And we can get a clue about that by looking at this in our Rails console or somewhere else. And you can see that those, the metadata about the query is actually formed by a series of other objects with class names like this, arrow nodes, equality, arrow nodes, attribute. What are these things? What is arrow? What are these nodes? Where do these come from? Dollar sign one, what's that? So if you go Google arrow and you look that up on the internet, you'll find there's a gem called arrow on GitHub. And it turns out that Rails pulls in an entirely different gem in order to do just what it needs to do, which is convert Ruby objects into SQL statements. So how does that work? So to figure out how Active Record works, what we really need to do is figure out how arrow works. But first of all, what does arrow mean? Where does that come from? And you can see the answer here at the top there. Arrow is a relational algebra. It doesn't, that's not very helpful. What the heck is a relational algebra? And where does that come from? The other thing I wanna point out before we get to that is another interesting sentence. Arrow is a SQL AST manager for Ruby in the middle of the read me. I don't know if you can read that in the back. Again, interesting, but what does that really mean? What's a SQL AST manager? What does AST mean? So AST stands for abstract syntax tree. So we'll get to that in a minute too. But first let's start with relational algebra. That was invented or pioneered by this man here. This is Edgar Codd. He's a famous computer scientist from the 1960s and the 70s. He worked at IBM and Oracle and other places. While he was there, he wrote a seminal paper about relational algebra called a relational model of data for large shared data banks. This was written in 1970. So 45 years ago almost. And again, this invented the theory and the mathematics behind how relational databases actually work. And one of the things I wanna communicate to you guys today is take the time to go and read some of these academic papers. It's really cool stuff, really interesting. In this case, he actually used for the first time words like projection and join and cross product and other mathematical concepts that we sort of use every day every time we use a database. And so his ideas led to the invention of SQL, the language that we all know and love to execute queries with relational databases. So it turns out SQL existed before 1970, but it wasn't until Codd came along with these ideas that it took the shape and the form that we know today. You know, with joins and filters and you can select different columns. But still, how do we get to this? How does Aril create a line of code like this, a SQL statement from all of our Ruby objects? So that's where that other idea comes in, the abstract syntax tree. So let's go take a look at what that is. And this is an abstract syntax tree for my example, okay? This is a tree of Ruby objects that represents the SQL statement that we want to generate. So what Aril does, when you call Aril, when you use the Aril gem internally, builds up this AST, this abstract syntax tree, by inserting new Ruby objects into different branches. And you can kinda guess what these different things mean here and we'll go through this in a minute. So Aril actually is a lower level, more detailed way of specifying a query. So on the top is my active record line. So I have userware, name, Captain Nemo first. That's entirely equivalent. I could actually have typed in all the code below into my Rails app if I wanted to. You don't normally ever need to use Aril unless you have a really complex, sophisticated query that you need to run. For example, if you wanna use an OR clause, you need to use Aril. What you would just say users is a new Aril table. Then you say users dot project. Project is back to the paper from 1970. Dot where, users, name, equals, Captain Nemo, order on the user ID column, ascending, take one. Take almost as if it were a list per Haskell. So it's a lower level, more granular way of specifying a query. But again, how does this actually work? So when we call these different methods in Aril, and that's what ActiveRecord does, by the way, which simply calls these other methods for us, Aril creates these Ruby objects and puts them in this tree. And here's the other really cool thing about how this works. Here's how Aril gets to the SQL statement. It uses something called the visitor pattern. So the visitor pattern is one of, I think it's probably one of those gang of four patterns from Java, but it's actually a really cool idea, something else we can learn from studying the internals. So the way the visitor pattern works, what that idea really means is that there's something called a visitor, which is some kind of function, I'm showing it as a blue arrow here. It's some kind of function or class or piece of code of some kind that visits, that walks the tree or array or whatever structure you have, and goes to each one of those nodes and does something. Now, before I get to it, the other cool thing about Aril is it creates a different visitor depending on what type of database you're using. Right, so if you're using a MySQL server or a SQLite database, you get a MySQL or SQLite visitor. I was using Postgres, so I got a Postgres visitor. So another feature of Aril is that it will create a SQL that's customized for the database that you're actually using. So the way this works is the visitor starts at the top, select statement, and so it generates a string as it walks the tree. So it starts at the top, puts select up there on the top left, and then each time it visits a node, it appends or concatenates another piece of the query onto the same string. So let's see what happens. It starts at the top, it goes down to the first branch, select core, that turns out that doesn't really do anything, it's sort of a container for other things. So we continue on and we go over to attribute. So attribute is the projection to use the mathematical term or the list of attributes or columns from the table that I want to query. And so the visitor, when it gets there, writes users.star onto my string. Then we keep going, we jump over to, let's go to the join source. So join source lists all the joins or all the tables that I want to query from. And again, in my example, it's super simple. There's no joins, just one table. So I just have a single table node under there. So it writes from users, adds that onto the string. Then I go over here to the and node. So and the and node is the root of another subtree that represents the Boolean expression that I want to evaluate in my query. So again, I have a really simple example. So I don't have a complicated Boolean expression. I just have one, if you go down equality, I have one equals expression and it's comparing some attribute with a bind parameter. So it adds on where users.name equals dollar sign one. So dollar sign one is the bind parameter. And let's keep going. So we just go over to ascending, not a surprise that adds on order by users.id and the attribute there give us the id column and it's ascending. And finally limit on the right, the far right gives me limit one. So that's it. That's how Arrel generates a select statement from a tree of Ruby objects, which originally was generated by my active record expression. So very cool. So we're done, right? We've figured out how active record works. We figured out how we convert a simple line of code to a complex SQL statement. So back to Captain Nemo. This is actually one of the really cool parts of the book that I knew nothing about until I read it. This is when Captain Nemo discovers the South Pole. And so the funny thing about the novel is it's written in 1870. So at the time, I guess they didn't know that the South Pole was covered by Antarctica. So in the novel Captain Nemo and the Nautilus, journey to the South Pole underwater and under the ice, kind of a cool thing. And of course, it's like an anachronism. It's not actually true. The other thing going on here that's interesting I didn't know either is that there's this whole political subtext or subtone to the novel. And Captain Nemo doesn't have any citizenship. He's a citizen of the world. He's completely independent and lives under the oceans. And so when he gets to the South Pole, he claims it for himself. He puts his big black flag up there with an N on it. It's a kind of interesting political statement. But anyway, let's keep going. I was excited to figure out how we generate select statements from Ruby objects, but it wasn't enough. I wanted to know where does the data come from then? How does a database server actually work? How do I get the names that I want? So let's dive in and take a look. So I use Postgres mostly because I think it is the world's most advanced open source database. Some of you may use it, others use MySQL, maybe use a proprietary database like Oracle. I think they probably at this level all works more or less the same way. So let's go through how database servers work in 15 minutes or less. So it's gonna be a lightning tour. They basically have this four step process. And I'm gonna focus on the last thing, execute today. But I first wanna touch on parse, analyze, rewrite, plan, and then we'll get to execute. So let's start with parsing. How does the database server parse the SQL statement that we give it? Cause that's the first thing it has to do. We give it this complex expression with where clauses and joins and blah, blah, blah. It has to parse that. It has to figure out what it means. So it turns out, I never knew this, but it turns out that Postgres uses exactly the same parser generator engine that Ruby does. So this is a Bison. The name of the open source framework is Bison. It's a newer version of Yak, which is the original parser generator utility from the 1980s. So what Bison and Yak do is they take a series of grammar rules which are typically in an .y file and they convert them ahead of time into a set of C code or C functions that can parse a string according to those grammar rules. And so Ruby and Postgres do exactly the same thing. I'm not gonna get into how parsers work today. I don't really have time for that. If you're curious and interested in my book, Ruby Under a Microscope, actually the first chapter is all about parsers and how they work and actually it's about Bison. It works exactly the same way in Ruby as it does in Postgres. It takes tokens, it shifts them over into a stack and it uses reduces as a shift reduce algorithm that converts them into other grammar rules. So eventually we come up with, guess what? We've seen this before, an abstract syntax tree. So I was kind of shocked to discover that all that work that we just did, an active record just did for us, all those lines of Ruby code, all those methods, they work so hard to go from an abstract syntax tree to a select statement, to a string. What happens when we give that string to Postgres? Well the first thing that it does is it reverses all that work we just did. It converts it back into an abstract syntax tree. It's like why did we bother doing all that work? It's almost as if it would be better if we could somehow conceptually tell the database. I want this column and I'm from that table and I want this filter without having to generate this complicated string in this language called SQL. It makes me think that maybe we should be using a no SQL database. So anyway, food for thought. But this is actually very different than the AST that we saw earlier. Earlier we had Ruby objects, right? These are not Ruby objects. These are C memory structures inside the database server. But if you read them, if you look at them, they express the same ideas, it's the same string. So we have a select statement at the top. There's a target list, a from clause, a where clause. It's the same select statement represented in the same way just with different memory, different structures. Okay, so that's parsing. The next step was analyze and rewrite. So I won't really go into that today. I don't have time and I don't understand it either. So there's a lot of code, a lot of algorithms inside of databases to optimize queries to run faster. And that's really important when you have a select of a select of a select with joins and they can do a lot of work to bring the joins up and put them together and just make the SQL statement simpler. So there's a lot of sophisticated things going on here that I don't understand. But for my simple example, I followed this select statement through the C code in Postgres just to sort of figure out what was going on. It really didn't do much. It created different structures with different names but looks more or less the same. So these internal structures are just more, they're closer to what Postgres needs to actually execute this for us. And then the last step before we get to execute is called plan. This is really interesting. This is when the database server comes up with a plan. Like it's so organized, it doesn't just run the query. It first comes up with a plan, it follows the plan when it runs the query. So it's kind of cool. And this is yet another tree of structures. This one's pretty simple. Again, my query is very simple. And each of these nodes or boxes, these actually represent almost like a separate state machine in each case. Okay? And the way it works is you start at the top, Postgres starts at the top, and it goes down to the bottom. And each state machine calls the one below it in order to run the query. So let's go through this example. So at the top we have limit. Limit is a very simple state machine. What it does is it first calls the one below it and says, okay, give me one record back. And if it's limit one, then it's done. It returns that back to the caller. If it was limit five, it would say, give me one, give me two, give me three, give me four. So each time it calls down, by the way, inside the Postgres code, the lower inner loop is actually called the outer query. It's kind of a weird backwards thing. But I guess in databases, everything's backwards. So as you go down, you start with the inner query, you get to the outer query, the outermost query at the bottom. So limit calls sort. Sort is kind of simple to understand. What it's gonna do is the first time you call it, it actually needs to go to its outer query at the bottom and get all of the records back. So sort is actually unusual and different because it gets all the data because it needs to sort it. So it calls sequence scan at the bottom and says give me all the data and then sequence scan runs. And when sequence scan is finished, it returns all the records back to sort, which then sorts them, uses the quick sort algorithm. So not a surprise, the fastest sorting algorithm. And then returns results back up to limit, which will take the first one. But what sequence scan, how does that work? So at the bottom, sequence scan, you can see over here has a pointer to some expression. This is actually my Boolean expression from the query, the where name equals Captain Nemo. So sequence scan is pretty simple. What it does, it just goes through the table from top to bottom. These are just wacky made up names I got from the Faker gem. I don't know if any of you use Faker, it's just great gem. It makes random names. So it just, so the way the sequence scan works it starts at the top and goes down, but they're not random. These are the names that I inserted into the table in the order I inserted them. And it goes down and says, okay, is that Captain Nemo? No, is that Captain Nemo? No, and eventually we get to Captain Nemo. We found it, right, we're done. We found Captain Nemo, we can just return it, right, we're done, but we're not done. Because if you think about it, the query that we're running doesn't just say find Captain Nemo, it says select all from users where the name is Captain Nemo. So it gets all the records that have the name Captain Nemo. Then it sorts them based on the ID column, primary key, and then it gives me the first one. So in fact, the sequence scan needs to keep running. It needs to keep going through the entire table of names. There might be hundreds or thousands or millions of names. If that's the case, that sequence scan is gonna run all the way to the bottom of the table, get all the names back, even if there's only one Captain Nemo and still has to go through the whole table and give it back to me, and then I sort it and then I finally just take the first one and return that back. So it seems like I was doing something wrong here. There's something wrong with this picture. Like it doesn't seem like a very efficient database server. Why is it doing all this extra work for me? So I wanted to try to figure that out too. So back to Captain Nemo. So this is actually my favorite scene from the novel. This was from the beginning of the novel. So this is right after Professor Arana arrives on the Nautilus and Captain Nemo turns out to be a really nice guy. He takes him and his friends on a tour of the ship and they go all around the Nautilus, they look at the different stuff and the windows and all that stuff. And then they go into the engine room and Captain Nautilus, Captain Nemo, of course Jules Verne explains how the submarine actually works, how the engine works and what the technology is that it uses and it's really neat because this was again written in 1870. There were no submarines then or there were a few primitive submarines but they weren't, they didn't work the way the submarines today works. So this is Jules Verne imagining the future from a 19th century perspective. It's a really interesting read and it's actually known as the most scientifically accurate of Jules Verne's novels. For example, Journey to the Center of the Earth was complete fantasy but this is vaguely accurate so it's super fun to read through this. Anyway, to fix our problem, duh, it's very simple. All you need to do is create an index. Okay, so another rule they learned from today is just create an index if your database is running slow. Seems really simple, write active record migration, add index on the user's table on the name column. I'm done, everything's fixed, everything works great. But I wanted to find out why? Why is an index speeding this up? What is an index? How does an index work? What does Postgres do when I create an index? I had no idea, right? So I wanted to figure that out and this is what I managed to learn. So an index, when I say add index in Ruby it runs a SQL statement which is probably something like create index and when you run that what Postgres does is it creates another new copy of the entire table. So all the data at least of the column that I'm indexing it copies it into another memory space. So here's my original table over here, all the names in random order or whatever order I inserted them in. And on the left is the index and you'll notice that the index is sorted. So an index fundamentally is just a separate copy of the data that's been sorted. And each entry in the index has a pointer back to the original table so I can find my way to get the other columns of each record if I wanted them. So pretty simple actually. And so when I run the query now it's a lot faster. And actually when I ran the, when I looked at the plan and by the way you don't need to hack the C code in Postgres to look at the plan you can just run a query I should have written a slide for this, you just say explain select star.name, blah, blah, blah. So you just use the explain SQL statement and Postgres will print this out for you right on the console as a text string or as like an indented tree you can see the same tree structure. So if you have queries that are running really slowly and you're not sure why the first thing you should do is run an explain statement. And this looks exactly the same except for this one big difference that sequence scan is gone and it's been replaced by something else called index scan. I'm like, oh, so now I have an index I can do an index scan. Okay, what does that mean? So I got out GDB, I went into the C code of the index code in Postgres and tried to figure out what it was doing and here's what it was doing. So imagine now I have a separate copy of all the data in this index. So all the names from that table, from the user's table is now in this separate array really. And it's sorted. So that's really cool because it can help Postgres or any database server find the name I'm looking for very quickly. So I'm looking for Captain Nemo. So what does it do? It uses a binary search. You probably all heard of this algorithm. It's really simple, it works like this. Instead of just going from top to bottom like the sequence scan did the index scan starts in the middle and says, it picks out the name in the middle. I don't know, the one Kendall Ledner, M.D. Probably can't read that but it picks a name that begins with K and it says, is this greater than or less than Captain Nemo? Oh, it's greater than Captain Nemo, alphabetically. So we go up, we search, fire up and now we cut the remaining names in half again. So we go to the 25% point and we say, is this name greater than or less than Captain Nemo? And it turns out in this example it's still greater than Captain Nemo. So we take the remaining names and we go cut them in half again. So we go up to the top now we're at 12.5%. And we say, again, is this name equal to Captain Nemo? Greater than or less than. So you can see that it very, very quickly, exponentially narrows down to exactly the name that we're looking for with only a few comparisons. Even if we had millions of records it would very quickly, by dividing in half over and over again with find exactly the name I'm looking for. This is basically why indexes are fast and why you should index your data if you need to search it. Because it can do a binary search instead of a sequential search. But there's a couple of big problems with this. This is an oversimplification and there's a couple of theoretical issues with how I'm showing this. One has to do with threading and concurrency. So I'll talk about that in a few minutes. The other one has to do with the way that database servers manage memory. Okay, so if you're running Facebook or Twitter or some other SoundCloud or some other super successful company that has millions of users you can't just stick all those names in a single array in memory and search it with the binary search algorithm. Database servers don't work that way. What they do is instead they manage memory with a series of pages. This is the part of Postgres where my understanding is a little fuzzy but basically divides up the available memory into pages and then swaps them out the disk and swaps them back according to some algorithm. So in fact you can't fit millions of names into a single page. It can only fit about, it turned out when I measured it, about 200 names. So if you've got one, 10, 20, 30 names it just sticks them in a single array and it works fine. But if you have 200 names something has to give. It doesn't work anymore, the same way. So imagine that we had 200 names. We started with whatever, Amina and we went down to Zora, okay. Not a lot of names, everything was fine. But now let's say I insert that 200 first. I think it was actually 206 names would fit. So I insert that 206th name Captain Nemo. Let's say I'm inserting Captain Nemo now. Where does it go? Postgres doesn't have room for that name in the same page anymore. So it has to do something else. And this is where that B-Tree algorithm comes in. And B-Tree actually stands for balanced tree algorithms. This is another classic computer science theoretical algorithm. And what it means is you divide the page into two and you put the first 100 names into that first page and you put the second 100 names from the original 200 into the second page. So now each of the new pages has lots of free space and we can stick Captain Nemo up into that first page because he begins with a C. And then there's plenty more space for other new names to be inserted. So the idea is we can create this tree structure of pages that have all the data spread out in this sort of equally distributed fashion. That's why it's called balanced tree because we balance the amount of data in each of the pages. And the reason we wanna balance it this way is so that we can quickly search it. Now what I'm gonna do is I need to find Captain Nemo. So I can say, these are actually repeated names in the middle, but I can say, okay, where is Captain Nemo? I go to that parent page. Notice the parent page has a copy of the first name in each of the child pages. So we have a null for the top one. That's where the first few names are. We have Julius in the middle and so that second group should start with Julius. And the bottom we have Monte Nicholas. So we put him down at the bottom. So by recursively searching through here, we can find any one of the names that we need to. And so this balanced tree algorithm is really neat because you can add millions and millions of names and just creates this huge tree of pages that can go on and on and have different levels of hierarchy and it works very quickly when you're searching it. So that's how Postgres solves the problem of paging and memory management and how it finds space for all these names. So the one remaining problem I wanna talk about because it's really kind of a neat trick is how it deals with threading and multi-threading and concurrency. So there's a big problem with this which is that you can't simply lock the index if you're inserting a new name, right? So what happens if I try to search the index at the same time someone else, another thread and another client is inserting a new name and splitting it up? It's not gonna work right. So before 1970 or actually before 1980 when another algorithm came out, what database systems actually did was they locked the whole tree. They just said, you know what? If you're inserting into the index, no one else can use that index now. But that's not okay. It's not good enough. Database servers need to be smart enough to allow one client to insert into an index while another one or many other clients are searching it. You can't just stop your website whenever you insert new data. So what did they do? So back to computer science theory. Again, go read these papers. They're really fascinating. This one has a lot of interesting diagrams and pictures in it. So this is from 1981. This is by Lehman and Yao. And I found this actually because the C code inside of Postgres is super easy to follow and well explained. It has a lot of great comments. So that's another resource. If you understand C code, go read the C code for Postgres. So they explained that they use this algorithm and this is a really neat trick. So let me try to explain this in just a couple of minutes and get across how it works. So let's imagine that we're inserting and searching the same index. So one thread is trying to insert a new name at the same time as another thread is coming in to find it or to find some other name. So at the moment, the problems happen at the moment when we do that split. So when we split the page into two pages, everything goes wrong. If at the moment we do that, we stop, another thread runs and it tries to search because that second page doesn't have that parent pointer. It's sort of like this orphan page at that moment for a few microseconds or whatever. But again, if you're running a database server, you probably get tons of connections, tons of activity. So what happens if another one thread does this at the same instance, another thread starts and searches for a name using the tree, but actually the name it's looking for, maybe it's down in this other second half of the names that were just split. It's not gonna find them. So Layman Yao in their paper described this pretty cool trick. They came up with a variation on that balanced tree algorithm and they describe it in their paper. And the crux of it is it's more complicated than this, but the general idea is that when you split the page, you add this special pointer at the bottom of the first child to the next child over to the sibling or brother or sister page. So in fact, and this is the algorithm that Postgres uses and you need to sort of know this before you try to read the code and figure out what it's doing. Because so the indexes in Postgres actually look more like this. So the tree has the root or parent page at the top and then you go down the tree and then each of the child pages has a special pointer over to the next child. And why? It's because of this threading problem. So now Postgres can split any one of these pages. That pointer will still point to the next child over and even if you're searching the index at the same time that you do that, it'll find the name with no problem. So super cool stuff. I thought this was great, but let's just take a quick recap. What have we learned today? We've learned that ActiveRecord has a 42 method. That's pretty cool. We've learned that always create indexes on your database tables. Okay, you didn't need to come to Barcelona to learn that. What I really want to get across to you today is dive inside the tools that you're using, try to learn how they work and try to understand them. You don't need to know all the nitty gritty details. You don't need to know all the pointers and all the structures and everything that's going on. What you should know is something about the computer science, something about the algorithms that are at work, that you're using every day when you go about your work. And the reason why is it'll make you a better engineer and better developer and you'll be able to make smarter decisions. But more than that, it's just a lot of fun. I had a blast figuring this stuff out. If you want to learn more about internals and low level stuff, I wrote this book called Ruby Under a Microscope, which is a similar sort of conceptual treatment of how Ruby works itself. What happens when you run a Ruby program on the inside? And it was super fun to write this and I hope it's fun to read too. So that's all I got today, thanks a lot and I don't know if we have any time for questions. I probably won't know the answer anyway, but thank you.