All right guys, let's get started. Again, round of applause for DJ Drop Tables. How are you? Things are okay? Yeah. Okay, awesome. All right, before we get started, I want to go through some comments and feedback we've had on the course from the most urbane of all the places we could get feedback about the course, and that's YouTube. So here are some of the comments we've gotten so far on YouTube. And it's the sort of standard stuff, right? Andy's the worst professor. I don't learn anything about databases. That's okay. This guy hopes I get cancer soon. This guy says: all my friends go to CMU, and they say this professor has the worst hygiene at a big school; he smells like old boiled eggs. So that is true. I did used to have a hygiene problem, and now I use a special shampoo, but if it's overpowering or it's still an issue, let me know. And we've got feedback about you, right? "Yo DJ Drop Tables, beats are so fresh I had to take my shirt off." I'm not sure what that means. And they want us to feature you more. So it's normally this kind of crap. But there is actually one mistake I made last class that people correctly pointed out. When we were talking about hash functions, we named SHA-256, and I think we named MD5, and I incorrectly described SHA-256 as something you could reverse. And this guy says, take down the whole video. But whatever, my point still stands, right? SHA-256 has cryptographic properties that we don't care about for our hash table, so we would never actually use it there. The xxHash or the CityHash/FarmHash stuff that we talked about before, that's the kind of hash function we do want to use. So in theory you could still use SHA-256, and no, it's not reversible, but nobody does that because it'd be too slow. OK? All right. So the other thing, again, just a reminder for everyone what's on the docket. This week, on Friday, Project One should be done; that's due at midnight. Most of you have already finished, but post on Piazza if you have questions as you go along. We've been updating the pinned post on Piazza that provides clarifications to help guide you along if you have questions about different aspects of it. And then Homework Two we released on Monday this week, and that'll be due the Monday after the first project is due. OK? So any high-level questions about Project One? Who here has not tried to get it running on their local machine or their development environment? At least try that. OK? Good. Awesome. OK. All right. So recall from last class, we started talking about the different kinds of data structures we could have inside our database system, and we spent the entire lecture talking about hash tables. We talked about how hash tables, or data structures in general but especially hash tables, can be used in a variety of places inside the database system: for internal metadata, for actually storing the underlying tables in our database, and for temporary data structures, like building a hash table to do a join. For a lot of those use cases, the hash table is going to be good enough, right? For the internals of the system, it's not very often that you need to do range queries. Most of the time you want to say: for a given key, give me the value. You're doing point lookups. So the thing we want to talk about now, though, is table indexes.
And this is where we may want to run queries that do range scans, and therefore hash tables are going to be insufficient, because they can only do single-key lookups. So everyone here should be roughly aware of what a table index is, but I want to provide a more formal definition so that we have a shared understanding for the rest of the lecture. A table index is essentially a replica of a subset of the attributes in a table, stored in a more efficient manner that allows us to do efficient lookups to find the thing we're looking for. In the worst case, if we want to find a particular key in our table, we just do a sequential scan. The idea of a table index is to have an auxiliary data structure that we can traverse or do a lookup into and find exactly what we want more quickly than a sequential scan. The key thing to point out here is that the index is a replica of part of the table. That means it has to be synchronized with the table, meaning if we modify a tuple in our table, we want that change to be reflected in our index, because we don't want any false negatives or false positives. We don't want to add something to our table, not put it in our index, and then do a lookup for that tuple in our index and come back with a negative result. So the database system is responsible for maintaining these indexes and keeping them completely synchronized with the underlying table. And this is completely transparent to you as the application programmer. When I insert, I don't have to say: insert into this table, and also update these other indexes. The database system, at least in a SQL database system, sees the insert query and knows that not only does it need to update the table, it also needs to update any indexes it has on that table. So there's a trade-off now in our system between having lots of indexes to make queries go faster and the cost of maintaining them. We'll see this as we go along today, and we saw it with hash tables last time: inserting something into an index is sometimes really fast and sometimes really expensive, depending on whether something is already sitting wherever we want to insert a given key.
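To make that maintenance contract concrete, here's a minimal sketch of what an insert has to do. All the types here (Table, Index, RecordId) are hypothetical, heavily simplified stand-ins, not any real system's API; the point is just that the tuple insert and every index insert happen together:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, heavily simplified types; not any real system's API.
struct RecordId {
  uint32_t page_id;
  uint16_t slot;
};
using Tuple = std::vector<int32_t>;  // one integer per attribute

// A toy single-attribute index: key -> record id.
struct Index {
  size_t key_attr;  // which attribute of the tuple this index is built on
  std::multimap<int32_t, RecordId> entries;
  void Insert(const Tuple &t, RecordId rid) {
    entries.insert({t[key_attr], rid});
  }
};

struct Table {
  std::vector<Tuple> heap;     // the unordered table heap (simplified)
  std::vector<Index> indexes;  // every index built on this table

  // The insert is not "done" until every index reflects the new tuple;
  // otherwise a later lookup through an index could return a false negative.
  RecordId Insert(const Tuple &t) {
    heap.push_back(t);
    RecordId rid{0, static_cast<uint16_t>(heap.size() - 1)};
    for (Index &idx : indexes) {
      idx.Insert(t, rid);  // maintained by the DBMS, invisible to the SQL
    }
    return rid;
  }
};
```

Note how the loop over `indexes` is exactly the cost the lecture is describing: one logical insert turns into one heap write plus one write per index.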
Again, when a query shows up, the database system is responsible for figuring out the most efficient access method for the system to use to answer your query. And again, this is transparent to you as the application programmer. I just write my SELECT statement; I don't normally specify (in some cases you can) exactly which index I want to use. The database system can figure that out for me. And, going back to the very first lecture, this is one of the benefits of the relational model and a declarative language like SQL: if I write a bunch of queries in my application and later on decide to add an index to my table, I don't have to go back and rewrite my SQL to use that index. The database system figures it out automatically for me. In theory. It doesn't always get it right. This particular step of taking a query and figuring out which index to use falls under the umbrella of query optimization, which is a super hard problem; we'll cover it after the midterm. It's an optimization problem: deciding the best way to execute a given query among all the different choices I have. So we'll cover that later in the semester. But for now, just assume that we know which index we want to pick when we do lookups. So of course, as always in computer science and in databases, there's a trade-off. If you have a lot of indexes, that will certainly make your queries go faster when they do lookups on them. But now you have the additional cost of having to store those indexes and actually maintain them. Indexes take up pages; we store those in our buffer pool and have to write them out to disk, so that takes up space. And then, as I said, when I do updates to my tables, I have to also update all my indexes to reflect those changes. So if my table has 1,000 indexes, and in practice people do that kind of stuff, then if I do one insert, I have to do 1,000 updates, one to each of those indexes. And my update or insert operation isn't considered done until I've modified all my indexes, because they always have to be synchronized. Again, we're not really going to discuss how you decide which indexes to build; that's another hard problem in databases. There are tools that make index recommendations for you, or you pay a lot of money for human DBAs to do this for you. So the things we're talking about today: first an overview of what a B+ tree is, then we'll spend some time discussing the implementation details we'd be concerned with when we build out our index, and then we'll finish up talking about some additional optimizations that real systems do to make this thing actually useful in practice. Okay? So the first thing we need to address is: what is a B+ tree, and how does it relate to a B-tree? This is one of the downsides in databases: a lot of times the same word is used to refer to different things, and it can be quite confusing for someone trying to get started to understand the actual difference. So first of all, there's this class of data structures called B-trees. And then within that class there is a specific data structure that is the B-tree. So oftentimes people use "B+ tree" and "B-tree" interchangeably, but if you go back to the literature from the 1970s, these were actually distinct data structures, and Wikipedia still lists them as distinct data structures today. The first B-tree came out in 1971. Then the B+ tree came out two years later, in 1973. There's no paper that describes what the B+ tree is. There's a 1979 survey paper that says: here are all the B-tree variants that are out there, and, by the way, IBM invented the B+ tree in 1973. Supposedly there's a tech report that describes it, but you can't easily find it on the internet. And then during the '70s and '80s there were a bunch of other variants on this. The B* tree is a variant of the B-tree. And the B-link tree came out in 1981, and it was actually invented here at CMU. This is the paper that describes it. It was written by Phil Lehman. That dude still works here. He's in the dean's office. So if you love this lecture, you can go talk to him. Every time I see him, I always say, oh, we discussed the B-link tree in my class. And he's like, oh, that paper is so ****, right?
So 40 years later, it's still around. The reason I show these other trees is that we're going to focus on the B+ tree, but in a modern system we're not going to use it exactly as described in the 1970s. We're actually going to borrow bits and pieces from all of these other trees that existed before, and we're just going to call the result the B+ tree. And again, a lot of times database systems will say they're using a B-tree. I can almost guarantee you, or at least I've yet to see one system where they say they're using a B-tree and it's not really actually a B+ tree. If you go look at the Postgres source code and documentation, they talk about using a B-tree, but as far as I can tell, looking at what it's actually doing, it's really a B+ tree. So again, these words are used interchangeably. I'll try to always say B+ tree, and I'll briefly mention what the original B-tree is later on. But in practice, the B+ tree is what we care about; this is what we want to use in our system. So: a B+ tree is a self-balancing tree data structure. The B in B+ tree, or B-tree, means balanced. The idea is that it keeps the data we insert into the data structure in sorted order, and that allows us to do efficient searches, sequential scans along the leaf nodes, insertions, and deletions, all in O(log n). Again, contrast this with the hash table: the hash table in the best case was O(1), and in the worst case O(n). In a B+ tree, because it's balanced, it's always O(log n), and that means the distance from the root to any key in a leaf node is always log n, no matter how many times we delete and insert and change things around. The B+ tree came out in the 1970s because people were trying to build a data structure that would make index lookups efficient on hardware where the disk was super slow and memory was limited. The B+ tree has this nice advantage, compared to a B-tree, that after you traverse to the bottom you can just scan along the leaf nodes and read everything in sequential order; you never have to go back up, in general. Again, even though this was designed in the 1970s, it's still widely used today. And actually, even on faster disks, and for in-memory databases where there is no disk, the B+ tree outperforms a lot of alternatives and is still very, very useful. This is the paper everyone cites, The Ubiquitous B-Tree from 1979, and it's in this paper that they mention: oh yeah, there's this thing called the B+ tree from IBM, and it came out in 1973. This is what numerous people cite when they want to cite a paper for the B+ tree. So what are the properties of a B+ tree? It's considered an M-way search tree, meaning every node in our tree can have up to M different paths to other nodes; up to M, not always exactly M. Again, it's perfectly balanced: the data structure maintains the balance over time as you modify the tree, and by balanced I mean that the distance from any leaf node to the root is always log N. The other thing we have to do is maintain the guarantee that each node, other than the root, is at least half full: the number of keys k in a node has to satisfy ceil(M/2) - 1 <= k <= M - 1, where M - 1 keys is a completely full node. And then the simple last property: every inner node with k keys has k + 1 non-null children, that is, k + 1 paths or pointers to children below.
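Here's a minimal sketch of those occupancy invariants in code, assuming a hypothetical order-M node layout (the names here are made up for illustration):

```cpp
#include <cstddef>

constexpr size_t M = 4;  // order of the tree: up to M children per inner node

struct Node {
  size_t num_keys;  // k, the number of keys currently stored
  bool is_root;
};

// Occupancy invariants for one node of an order-M B+ tree.
bool SatisfiesInvariants(const Node &n) {
  // Every node except the root must be at least half full:
  //   ceil(M/2) - 1 <= k <= M - 1
  size_t min_keys = (M + 1) / 2 - 1;  // ceil(M/2) - 1
  if (!n.is_root && n.num_keys < min_keys) return false;
  if (n.num_keys > M - 1) return false;
  // And an inner node with k keys must have exactly k + 1 non-null
  // children; leaf nodes have no children at all.
  return true;
}
```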
We'll see what this means when we start doing deletes: if a node drops below half full, we have to start moving data around so that it's at least half full again, and that's how we guarantee the first property, that the distance is always the same. Actually, a quick show of hands: who here has seen a B+ tree before? Very few. Good. Okay, good. Again, this is the best data structure for databases, so this is why you're here. All right, so this is a basic B+ tree. The layout is that along the bottom we have our leaf nodes, and any node that's not a leaf node is considered an inner node. This tree has a height of two, meaning it has two levels, so the inner node is also the root node. There's always exactly one node at the top, because that's how you enter the tree. And then down here in the leaf nodes, we're actually going to have sibling pointers. This is something that came from the B-link tree. Inner nodes won't have sibling pointers, but leaf nodes will, so now I can traverse to the bottom and scan along in either direction to keep finding my neighbors and getting more data. In the inner nodes, there's this combination of keys and pointers. For the inner nodes, the pointer is always to another node, or null if there's nothing there. And the key is just whatever attributes we're building our index on, whatever we're trying to store. These keys are then used to determine which path you should go down as you search for a given key. So in this case here, for this first key, five, the path to the left of it, going this direction, will be for any keys that are less than five. And the path that comes after it is implicitly for anything greater than or equal to five and less than nine. So if I'm looking for a key that's less than five, I look at this and say: well, I'm looking for key one, one is less than five, so I go down this path, and now I'm in my leaf node and I try to find the thing I'm looking for. In the leaf nodes, the key in each key/value pair is the same kind of thing as up above in the inner nodes, but the value can differ. We'll see this in a second: it could be a record ID pointing to a tuple, or it could be the actual tuple itself. It doesn't matter. The point is that inner nodes have pointers and leaf nodes have data. So again, just to repeat what I said: think of each node as an array of key/value pairs, and you use the keys to determine, if you're in a leaf node, whether this is the thing you want, or, if you're in an inner node, whether you go left or right. And in general, but not always, the keys in each node are sorted in whatever sort order, whatever collation, you want for that node.
So in my example here, we sorted in numerical order, and that's going to allow us, when we jump into a node, depending on how it's implemented, to do binary search in that node to find the thing we're looking for, rather than having to do a linear search. But sometimes linear search is good too. The contents of the values in the leaf nodes can vary depending on the database system: again, record IDs, or the actual tuples themselves, and we'll see some examples in a second. All right, so let's look at how these leaf nodes are actually implemented. Logically, you can think of it like this: you have an array where you alternate key, value, key, value, and this is typically how a lot of textbooks show what a B+ tree node looks like. The first thing to point out is that since this is a leaf node, we have pointers at the beginning and the end of our array to our siblings, and these would be node IDs, our page IDs, that allow us to go in either direction. If we're at the right edge or the left edge of the tree, the pointer is just null. But nobody actually stores it like this; no real database system stores the key/value arrays for a B+ tree leaf node with the keys and values intermixed. Typically they're stored separately. So just like in our slotted pages, we have a header that gives us metadata about what's in the page. In this case it can say what level of the tree we're at, essentially how many steps away from the root we are, how many free slots we have remaining in the node, and then the previous and next sibling pointers. And now you see that we've separated out the keys and the values. Can anyone think of why you'd want to do something like this? Yes? He said: so that for a given B+ tree node, all the keys can fit in one page and the values in another. No, the keys and values are typically always stored in the same page. Yes? Because they are not of the same size. Exactly, they're not the same size. Furthermore, if you're doing binary search on this, you want everything to fit in your CPU caches. If everything is intermixed, then in order to do binary search I'm dragging the values along even though I don't actually need them at that point, because I'm just trying to find the key that I want. If you break it up, then depending on whether the keys are fixed-length or variable-length, you can jump through the keys much more efficiently. The values are typically always fixed-length, because they're either 32-bit or 64-bit record IDs. If they're tuples, that's a little more complicated. But in general, you always want to separate them. And the way it works is that whatever offset you're at in the key array corresponds to the same offset in the value array. So if I find the key I'm looking for at offset 4, I know to jump to offset 4 in the value array, and that's the thing I want. As I already said, the values can vary depending on the system. Some systems store the record ID; this is probably the most common implementation that people use. This is what Postgres does, and what the other commercial database systems do.
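Here's a minimal sketch of that leaf layout, with hypothetical header fields and the keys and values held in separate parallel arrays so a binary search only has to touch the key array:

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

struct RecordId {
  uint32_t page_id;
  uint16_t slot;
};

// Hypothetical leaf-node layout: a small header, then the keys and the
// values kept in separate parallel arrays.
struct LeafNode {
  uint16_t level;        // how many steps away from the root we are
  uint16_t free_slots;   // how many free slots remain in this node
  int32_t prev_page_id;  // sibling pointers (-1 if at the edge of the tree)
  int32_t next_page_id;
  std::vector<int64_t> keys;     // kept sorted
  std::vector<RecordId> values;  // values[i] belongs to keys[i]

  // Binary search touches only the densely packed key array, which is
  // what makes separating keys from values cache-friendly. On a hit,
  // the same offset into the value array yields the record id.
  std::optional<RecordId> Find(int64_t key) const {
    auto it = std::lower_bound(keys.begin(), keys.end(), key);
    if (it == keys.end() || *it != key) return std::nullopt;
    return values[it - keys.begin()];
  }
};
```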
What's more complicated, and we can talk about it next class, is what it looks like when you actually store the tuples in the index. So think about this: instead of having a table heap with my tuples and then a B+ tree that stores my primary keys, where we have to keep the two in sync, they're just merged together, and the leaf nodes actually are the table, the tuples corresponding to the primary keys. So now when I want to do a traversal to find a particular key or a particular tuple: in the first case, I traverse the index, get a record ID, then do a lookup in the page table, and then go scan inside that page to find the tuple that I want. In the second case, as I do the traversal, when I land in the leaf node, the data I want is already there. MySQL and SQLite are probably the two most famous systems that do this. In the case of Oracle and SQL Server, I think by default you get the one at the top, but you can tell them to do the one at the bottom. So now that we understand the basics of the B+ tree, let's distinguish it from the original B-tree. The major difference is that in the original B-tree, the values being indexed can be stored anywhere in the tree, meaning any inner node can also hold a value, a record ID or the actual tuple itself. In the B+ tree, the values are only in the leaf nodes. So what are the implications of this? Well, one: in the B-tree case, I don't have any duplicate keys; I can guarantee that each key appears only once in my tree. In the B+ tree, because I have all those guide posts up above in the inner nodes, I'm basically duplicating keys. Furthermore, if I delete a key in a B+ tree, I remove it from the leaf node, but I may not actually remove it from the inner nodes; depending on whether I rebalance or not, I may keep it in the inner node, because that's how I figure out which path to go down when I'm looking for other keys. So a B-tree is going to be more economical in how much storage space it occupies, because it's not duplicating keys. But the downside, and this is why nobody is actually using it in a real system, is that it makes doing updates more expensive when you have multiple threads, because values could be moving up and down the tree: I modify something in an inner node, and I may need to propagate a change both below me and above me, and therefore I have to take latches in both directions, and as we'll see next class or next week, that can cause a lot of issues. In a B+ tree, I only make changes to the leaf nodes; I may have to propagate changes up above, but I only go in one direction. So the question is: can I repeat what I said about duplicates in a B+ tree? Going back to this tree here. This is a B+ tree. The keys that I'm trying to index are 1, 3, 6, 7, 9, 13. But if you look in the root node, I have a 5. Five does not appear anywhere in the leaf nodes, meaning it probably got inserted at some point and then deleted, but I didn't reshuffle or reorganize my tree, so I left it in the inner node. In a B-tree, that will never happen: each key appears only once, and if it appears in the tree, then it appears in our key set. Does that make sense? The question, or statement, is: we leave the 5 in there for searching purposes, and it's still physically stored in our node, but if I ask this tree, do you have key 5, it would say no, because I always have to go down to the leaf node, and when I try to find 5 there, I'm not going to find it.
So it'll still be there, but it's not actually a real key. Yup. Yes? How do we deal with inserts here, like when we fill up one of the leaves? Okay, so the question is how we deal with inserts when we fill up one of the leaves. We'll get to that in a second; that's the next topic. Won't there be any duplicates in the leaf nodes? The question is whether there will be duplicates in the leaf nodes. Yes and no. We'll see this in a second. What I've shown would be considered a unique index, a unique tree, unique keys. You can have keys with non-unique values; we'll have to handle that, and we'll get to it in a second as well. Okay. So the next topic is exactly what he asked: how do we actually modify this? Inserts. The way we're going to do an insert is that we traverse down and figure out which leaf node we want to insert our new key into. Again, we use those guide posts in the inner nodes to decide whether to go left or right, depending on whether the key is less than or greater than what's stored in those key arrays. As we traverse down, eventually we get to a leaf node, and the leaf node is where we want to store the key. If the leaf node has space, we just insert it, keeping the keys in sorted order, and we're done. If there's not enough space, then we have to split the node, the leaf node we just inserted into. And the way we're going to do this is we take the halfway point of our key space, put all the keys less than the halfway point in one node and all the keys above it in another node, and then update our parent node to include that middle key, with an additional pointer to the new node we just added. And this is actually recursive: if the parent doesn't have enough space when we try to insert the new key into it, we have to split the parent too and propagate the changes up above. So for one insert, we may have to reorganize the entire tree. And this is what I was saying before, just like with the hash table: if I insert and nothing's in the way, it's really fast; if I have to reorganize, it can be much more expensive. So sometimes an insert into our tree is going to be an expensive operation, because we're reorganizing the entire data structure, and other times it'll be super fast and we don't have to worry about it.
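Before we try it in the demo, here's a minimal sketch of the leaf-split step for an order-3 tree (hypothetical in-memory nodes; a real system would operate on pages and latch them): insert into the sorted array, and on overflow move the upper half to a new sibling and hand the separator key back for the parent to insert, possibly recursively.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

constexpr size_t kMaxKeys = 2;  // order-3 tree, like the demo below

struct Leaf {
  std::vector<int> keys;  // kept sorted
  Leaf *next = nullptr;   // sibling pointer
};

// Insert a key; if the leaf overflows, split it. Returns the separator
// key to push into the parent plus the new sibling, or {0, nullptr} if
// no split was needed. The parent insert may itself split, recursively.
std::pair<int, Leaf *> InsertIntoLeaf(Leaf &leaf, int key) {
  leaf.keys.insert(
      std::upper_bound(leaf.keys.begin(), leaf.keys.end(), key), key);
  if (leaf.keys.size() <= kMaxKeys) return {0, nullptr};

  // Overflow: keep the lower half here, move the upper half to a new
  // sibling, and hand the sibling's first key up as the separator.
  Leaf *sibling = new Leaf();
  size_t mid = leaf.keys.size() / 2;
  sibling->keys.assign(leaf.keys.begin() + mid, leaf.keys.end());
  leaf.keys.resize(mid);
  sibling->next = leaf.next;
  leaf.next = sibling;
  return {sibling->keys.front(), sibling};
}
```

With kMaxKeys = 2 this reproduces the demo below: inserting 4 into a leaf holding [2, 6] leaves [2] and a new sibling [4, 6], and pushes 4 up to the parent.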
All right, so let's do a demo of this. Rather than me doing animations in PowerPoint, this is from a professor at the University of San Francisco who has a nice little web-based visualization we can use. And I'll type remotely. All right, so we'll do a max degree of three. That means the max number of keys we can have in each node is two, and a node can have at most three paths going down. So we insert two. Can everyone see that? Repeat again for the degree part? So the degree says the number of paths coming out of a node. A degree of three means an inner node has at most three paths coming out of it, and therefore it can store at most two keys. Because, again, going back to what we showed at the very beginning: that example has a degree of four, and the number of paths is always the number of keys plus one. So I can store one, two, three keys, and each key has a pointer on its left and its right, with adjacent keys sharing a pointer in between, so there are four paths coming out for three keys. Okay? All right. Is there a way to make this look better? Well, let's just keep going and see how it goes. It's down over there. So I've only inserted one key so far; right now the node has one entry in it. So now I'll insert, I don't know, six. Again, there was space in that node, so it just went in. And now I insert four, and at this point it has to split, because the node can only store two keys. So it split in half: it put two over here, four and six in their own node, and then it took the middle key, four, and moved it up as the new root. And again I have pointers going down to both of them. Now I insert five. What should happen? Right, it'll say: five is greater than or equal to four, so I know I need to go down this direction. But that node can only store two keys, so it's going to have to split and rebalance. Hit enter. Right: it split the node, put four in the middle node over here, five and six over here, pushed five up because that was the middle key, and now we have pointers going to the middle node with four and to the one with five. So again, this is recursive: as I keep inserting more stuff and splitting, I keep propagating the changes up. Yes? What if we have duplicate keys? So he asks: what if we have duplicate keys? Actually, I don't know what this tool will do, so let me just insert four again. Yeah, it did that. So there are different ways to... sorry, the resolution is jacked. F11? No. How do I fix that? Sorry. There we go. Okay, sorry. So this is just a toy visualization; in a real system you could store the fours together and maintain multiple entries for all the values that share that key. Yeah, and in this case, the worst case would not be log n anymore. Your statement is: if all my keys are the same, four, four, four, four, then if I'm looking for an exact key/value pair, it's O(n), because I'm doing a sequential scan. Yes. And yeah, we can pull up Postgres, make a table with a billion rows where one column is set to one for every single row, and Postgres will let us build an index on that column. It's a stupid index to build, because, as you said, the values are all the same. So how do I say this: people will do stupid things in general. Don't be stupid, and don't build indexes on things you shouldn't. There are other types of indexes we'll see,
hash tables, things like inverted indexes, that could be better if you have a lot of duplicate values. But think of email addresses, or phone numbers, or anything that's going to be vastly diverse; then we won't really have that problem. Or a primary key: a primary key has to be unique, so that would be great for this. So again, is this clear? Okay. So let's go back. With deletes, we now have the opposite problem. Again, with inserts, if we got too full, we ran out of space and had to do that split. If we delete, then it may be the case that we end up less than half full, which would violate the guarantees we have to maintain in our B+ tree. And therefore we have to do the opposite of a split, which is a merge. To delete something, again, I just do my traversal: I go down the tree to find the key that I want to delete, and I always land in a leaf node. If my leaf node, after deleting that key, is still at least half full, then I'm done: I just remove it, maybe reorganize my key array, and that's it. But if I'm less than half full, now I have to figure out how to rebalance. So one easy trick we can do is look at our siblings in the other leaf nodes, and that's one reason we have those sibling pointers: we can look at a sibling and try to steal one of its keys to make ourselves legal again. As long as our sibling has the same parent as us, it's okay for us to steal, because that doesn't require any structural rebalancing up above. If we're not able to steal from our sibling, then we have to merge: we take one of our siblings and combine all our keys together. That may actually end up being too full as well, in which case we could split again, but basically we merge, delete a key up above, and then everything is balanced again. And just like with splits, where one insert may propagate changes everywhere, when we merge and delete keys, our parent may now become less than half full and have to merge itself, and therefore we may have to restructure the entire tree. So let's go back to our example and do our demo, and now I've got to figure out how to get to the top right corner. Do we maintain the sibling pointers in the leaf nodes only? Correct, the question is whether we maintain sibling pointers only in the leaf nodes. Yes. So let's delete four. Actually, let's start with delete five, and scroll down and hit enter so we can see it. That was insert, sorry. Delete five. Yeah, that's insert; that's delete. Again, sorry for the low resolution. All right, so let's delete five. In this case it should find both of them. I guess it only found one, so let's delete the other one. It goes down; that's fine. Again, at this point both of these nodes are still at least half full, so that's fine. So now let's delete four, and I suspect it will delete the one that's farther over on that side. Goes down, finds it, deletes it. Again, that node is now under-full; it has to have at least one key. And because it was empty, the tree merged everything and decreased its height. Yes? If only the leaf nodes have sibling pointers, how would that merge work? The question is: if only the leaf nodes have sibling pointers, how do you actually do this merge? The way it works: think of the thread going down the tree; it can maintain a stack of the nodes it visited as it goes down.
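Here's a minimal sketch of that steal-or-merge decision after a delete (hypothetical in-memory leaves with the same order-3 parameters as the demo; real code would also fix up the parent's separator keys and take latches):

```cpp
#include <cstddef>
#include <vector>

constexpr size_t kMinKeys = 1;  // "at least half full" for an order-3 tree

struct Leaf {
  std::vector<int> keys;  // kept sorted
  Leaf *next = nullptr;
};

// After a delete from `right`, restore the occupancy invariant. Returns
// true if the two leaves were merged, in which case the parent must drop
// the separator key for `right` and may itself underflow, recursively.
bool FixUnderflow(Leaf &left, Leaf &right) {
  if (right.keys.size() >= kMinKeys) return false;  // still legal, done
  if (left.keys.size() > kMinKeys) {
    // Steal: take the largest key from the sibling under the same parent.
    // The parent only updates its separator key; no restructuring needed.
    right.keys.insert(right.keys.begin(), left.keys.back());
    left.keys.pop_back();
    return false;
  }
  // Merge: pull everything into the left node and unlink the right one
  // (the caller frees it and removes the parent's key/pointer pair).
  left.keys.insert(left.keys.end(), right.keys.begin(), right.keys.end());
  left.next = right.next;
  return true;
}
```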
And we're actually going to need that stack of visited nodes when we do what's called latch crabbing, or latch coupling. As we go down, we take latches in case we need to reorganize everything, so I have to hold the latch on my parent when I go down somewhere, in case I need to reorganize what I'm doing down below, and I don't release it until I know I'm safe. So I know how I got there. Yes? There are two siblings, one to the left and one to the right; which one do you choose? It depends. Typically you choose the one that has the same parent as you. Actually, I think you have to. But if you're in the middle and both neighbors have the same parent, then you can choose either left or right; it doesn't matter. Let's see what this one does. So if we delete four, that should take it out of the middle, and then I delete two, and it's going to pick either left or right. Actually, it went empty and the tree got reorganized, but it doesn't matter; it's still correct. Also, this diagram shows the sibling pointers going in one direction; you can have them go in both directions. You have to do extra work to make that happen, but you can do that. A lot of times, again for simplicity, you just have them go in one direction, but then you can't do an ORDER BY in descending order and go the other direction if you want to do scans. Right? So this is all pretty straightforward, of course, but getting the details of the deletes and inserts right, doing those splits and merges, is actually very difficult in practice, and we'll see next week how to make sure that when we're reorganizing the tree we're thread-safe and don't have any integrity issues. All right. So in practice, the research shows that the typical fill factor for a real tree on real data is about 67 to 69 percent, meaning up to about 67 percent of the space in your nodes is actually useful data. With typical capacities for eight-kilobyte pages, at four levels you can store roughly 300 million key/value pairs. So you can index, and get to in log n time, any one of roughly 300 million keys very, very quickly. And most of the data is going to be stored in the leaf pages, as you would expect: as you add more keys, you start to fan out, and most of the data ends up in those leaf nodes.
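To see where a number like that comes from, here's the back-of-the-envelope arithmetic. The fanout of 130 is a made-up but plausible figure for eight-kilobyte pages at a roughly 67 percent fill factor, not a number from any particular system:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical fanout for an 8 KB page after a ~67% fill factor.
  const uint64_t fanout = 130;
  uint64_t capacity = 1;
  // Capacity grows as fanout^height: each extra level multiplies the
  // number of reachable leaf entries by the fanout.
  for (int height = 1; height <= 4; height++) {
    capacity *= fanout;
    std::printf("height %d: ~%llu entries\n", height,
                static_cast<unsigned long long>(capacity));
  }
  // height 4: ~285,610,000 entries, i.e. hundreds of millions of keys
  // reachable with only four page reads from the root.
  return 0;
}
```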
All right, so let's talk about some other things you can do with these indexes. There's this notion of what we call clustered indexes. I said in the beginning that the table heap for a database is unordered, meaning we can insert tuples into any page in any order; we don't have to follow the temporal order of how things got inserted. But there may be times when we actually want the data sorted in a certain way, for example by the primary key. So when you create a table, you can define what's called a clustered index, and the database system will guarantee that the physical layout of the tuples on pages matches the order they're sorted in the index.

This is useful for certain things: if I'm doing a lot of lookups within ranges of the primary key, and I know my tuples are stored in the order of that primary key, then when I traverse to the leaf nodes I can find all the data I need within a small number of pages. If I'm not sorted on the key I'm doing my lookup on, then every single record ID could point to another page, and I could be doing a bunch of different random I/Os to go read the data that I want. Not all databases support this. In some systems you get it by default: MySQL, by storing the tuples in the leaf nodes themselves, always gives you a clustered index, so the tuples on the pages on disk are guaranteed to be sorted in primary-key order. In the case of MySQL, if you don't define a primary key, they'll make one for you, a synthetic row ID or record ID that's transparent to you, and that's what they use to figure out where your tuples are actually located. In the case of Postgres, and we can do a demo next time, they have clustered indexes in the sense that you can say: cluster my table on this index. But it won't actually maintain that order, meaning it does the sorting once and stores the result on disk, but over time the data can get out of order, because Postgres won't re-cluster it for you automatically. When we talk about multi-version concurrency control, it will become very clear why this is the case for them. So let's talk about how we can do some lookups on our B+ tree. Again, because things are in sorted order, we can do a fast traversal to find the thing we're looking for. But one thing you can do with a B+ tree that you can't do with a hash table is that you don't need the exact, complete key in order to do a lookup; you can use just some part of the key. Real simple example: I have an index on attributes (a, b, c). I can do a lookup like WHERE a = 5 AND b = 3, where I have a and b but don't have c, and I can still find the things I'm looking for. You can't do that with a hash index. Think about what would happen: I'd take the 5 and the 3, try to hash them together without the c, and that jumps to some random location that's not what I'm looking for. You can also do queries where you only have, say, the middle attribute: you don't have the prefix, you don't have the suffix, you just have the middle key. Again, you can't do that in a hash table. Not all databases support this. Pretty much everyone supports the prefix case, where you have at least the leading keys in the order they're defined in the index; not everyone can do the middle-only case. Actually, I think maybe only Oracle and SQL Server can do that. So here's a more concrete example. Let's say we have an index defined on two columns, two attributes. This would be called a composite key: instead of being on one column, it's two columns combined, and the order in which we define our index determines what kinds of queries we can do on it. So again, if I'm trying to do a lookup to find key (A, B), I have both attributes that are defined in my key, so I can do a straight comparison: look at the first attribute, then the second, and determine whether to go left or right. In this case, A is less than or equal to A, so I know that the starting point for what I'm looking for has to be down in this direction. I go down this path, do whatever search I want to do inside the node, and then I can find the entry that I want.
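A composite key compares attribute by attribute, which is exactly why the leading attributes determine which lookups the index can serve directly. Here's a minimal sketch with a hypothetical two-attribute key:

```cpp
#include <tuple>

// Hypothetical composite key for an index on two attributes (a, b).
struct CompositeKey {
  int a;
  int b;
};

// Lexicographic comparison: attribute a decides first, then b. This is
// the order the B+ tree maintains, which is why a lookup on (a, b), or
// on the prefix a alone, maps directly onto one traversal plus a range
// scan along the leaves.
bool operator<(const CompositeKey &x, const CompositeKey &y) {
  return std::tie(x.a, x.b) < std::tie(y.a, y.b);
}
```

A prefix lookup for a given a is then just the key range from (a, lowest possible b) up to the first key whose a no longer matches, which is exactly the leaf scan described next.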
Now let's say I want to do a prefix search, where I only have the first element of my composite key but not the second. I go down to my leaf node and scan across the leaves, collecting all the entries I want, up until I reach a key whose first element no longer matches my prefix. In this case, as soon as I find a key that starts with B, I know my search is done, and there's not going to be anything else remaining in the leaf nodes that would satisfy my predicate. So this case is, if not easy, at least something a lot of database systems can support. For the case where you're missing the leading attribute, the way you end up implementing it is to figure out, at least in the root node, which portions of the tree could contain something that matches. In this case here, I know that no matter what I have for the first value, the second value always has to be less than C, so I don't need to look at this node over here; I only need to look at these other ones. So what you do is end up doing multiple index probes, multiple traversals, substituting different values for the attribute that you don't have. We look at the top and say: well, I have an A, I have a B, and I have a C; there's nothing for the C that I would find over here, so I can skip that. So let me now do a lookup in these other nodes, substituting the star with an A, and so on; each one of those is a separate lookup, and then you combine them all together to produce the final result. So Oracle calls this a skip scan. Yes? Yes, you're right, that's wrong; you would include that one too. But each of those is a separate traversal where you're just filling in the values. Whereas in the first case I had to do one traversal and then found the thing I was looking for, in this one you probe down multiple times and fill in the values. Thank you, I'll fix that. Okay, so let's get to the good stuff. We know what a B+ tree is; now let's talk about how you actually want to build it, what makes this thing actually useful. There's this great book, which I think is free, at least if you Google it, it shows up free, I don't know whether that's legit. It's a great book by Goetz Graefe, a famous database researcher; we'll talk about a lot of the stuff he's done for query optimization later on. He basically wrote this book as a survey of all the modern techniques, tweaks, and optimizations you can do in a B+ tree in a real system. So we're going to cover some of these things, and it's actually a really light read that covers all the really important topics in a way that's easy to digest. So we'll talk about how to handle different node sizes, how to do merging, which is what was asked about earlier, and intra-node search: how to do better searches inside a node. In general, you can think of a node in our B+ tree as being just like a page in our table, so the size of a node could be the same as our page size. In practice, though, it doesn't have to be, and depending on what kind of hardware we're storing our database on, we may actually want even larger page sizes, or smaller. It turns out the research shows that the slower the disk you're storing your index, your tree, on, the larger the node size you want.
And it should be obvious why: for every disk I/O, I can read the node sequentially, all the pages for it, and that's going to be much faster than doing random I/O to different nodes because my nodes are smaller. So for SSDs it's roughly 10 kilobytes, which roughly corresponds to the node sizes or page sizes that real database systems use; but for an in-memory database you actually may want to go as low as 512 bytes. And this is another good example of what we talked about with buffer pools: we could have one buffer pool in our system for index pages and another for data pages, and set them to different sizes. So I could set my B+ tree pages to one megabyte, whereas my data pages I'll keep at 8 kilobytes or 16 kilobytes. The optimal size can also vary depending on what kinds of operations or queries you're doing. Leaf node scans are long sequential reads, so those are typically better with larger node sizes, because I can do more sequential I/O. Whereas if I'm doing a lot of lookups, a lot of traversals, that's a lot of random I/O, and therefore I want smaller node sizes. So the next thing we can do is violate the very thing I said in the beginning about how we always have to merge any time we're less than half full. The demo I did was sort of simple; it would do exactly that. But in practice you may actually not want to do this immediately when you're less than half full, because it's just like what we saw with linear hashing in the hash table lecture: I may compact something, merge something that's less than half full, and then the very next operation inserts into that node and now I have to split all over again. The merge operation is expensive. Splits are also expensive, but splits we have to do, because we ran out of space in our node. The merge requirement we can actually relax and not merge things right away. So the tree gets slightly unbalanced over time, and then in the background, and this is oftentimes what happens, people just rebuild the entire tree from scratch, and that fixes all these issues. A lot of times you see this in high-end commercial enterprise systems: they'll shut the database down over the weekend because they're going to rebuild all their indexes, and that's essentially what they're doing, rebalancing everything, because they weren't always merging eagerly. Any time you see a bank say they're down at 3 a.m. for maintenance, there's a decent chance this is the kind of thing going on. All right, next
we're going to talk about how we actually want to handle variable-length keys. So again, everything I've shown so far assumes the key is fixed-length and the value is fixed-length, and in practice the values usually will be fixed-length. There are four different ways we can handle this. The first approach is that rather than storing the key itself in the node, we just store a pointer to the actual attribute, to the tuple, where we can do a lookup to find what the key really is. If I have an attribute that's a variable-length varchar, instead of storing that varchar in the node, I store its record ID, and then when I want to figure out whether the key I'm doing a lookup on matches the key stored in the B-tree, I follow the record ID, go get the page, and go look at what the real value actually is. So this is obviously super slow. It's nice because we're storing less data, just the pointer instead of the actual key in the node, but it's expensive to do that lookup as we're traversing. People tried this in the 1980s for in-memory databases, because memory was really expensive, but nobody actually does this anymore; everybody stores the keys whole in the node. The next thing you can do is variable-length nodes: this basically allows the size of a node to vary based on what's stored in it. But we said this is a bad idea, because we want our page sizes to always be the same so that we don't have to deal with the bin-packing problem of deciding where to find free space to put what we want to store. So nobody does this one either. The next approach is padding. Basically, we look at the attribute you're trying to index on, and we say: whatever the max size it could be, no matter what key you give us, we'll pad it out with null bytes or zeros to make it always fit exactly, aligned. Some systems actually do this. I think Postgres does, and we can look at that next time. But again, it's a trade-off: I'm wasting space in order to store things. This is also why it's super important to make sure that you define your schema correctly. Like, if I'm storing email addresses, which are maybe 32 characters or 50 characters, but I set the varchar size to 1024, then I'm padding everything out to 1024 even though most of my emails are nowhere near that. Yes? What do you do if you're keeping strings in general here? Do you always define the maximum size of strings in the schema? Say that again, sorry. Just in general, because these aren't necessarily C strings... In the schema, you always define the length. So when you call CREATE TABLE, you can define a varchar, and you define the length in it; you always want to say: this is the max size of what I can actually store. Right, and varchar is supposed to be variable-length, so even though, say, the max size is 32, if you give it a 16-character string, it could in theory store that more compactly. Different systems do different things. Some systems, if you say it's a char, it's always going to be exactly that size, always padded under the covers. And MySQL is always the worst offender: if you say the max size of a string is 16 and you give it a 32-character string, it'll store it; it just silently truncates it for you. Postgres and other systems will throw an error, and the database system should enforce that correctly. It's the same thing for an index: when we build an
index, we have to be told which attributes in our tables we're indexing, so we know their types and can pad out as needed. All right. What's probably more common is to use an indirection map, where we store pointers to our keys inside our sorted key array, but the pointers are actually just offsets within the node itself, rather than to some arbitrary page. So it would look like this: we have the sorted key map; these entries are just pointers, offsets, down to the key data below, and the map is sorted based on the values of the keys. To be very clear: the keys themselves, the actual strings we're trying to store, not the keys' corresponding values. And just like in the slotted-page layout for tuples, one side grows from the end toward the beginning and the other side grows from the beginning toward the end, and at some point the node gets too full. Actually, I think the key map has to be fixed-size, because we set the degree ahead of time. But if I run out of space for what I'm trying to store, I can have an overflow page that's chained to this node. So again, each entry is just an offset to wherever the key is; if I'm doing binary search, as I jump around this array, I follow the offset down to see what the actual key value is. In the back: is it a statement or a question? Do we store this as an array or a linked list? It's always stored as an array. Okay, so his point is: if I store this as an array, or a vector, and a vector is just a wrapper around an array, then doing an insertion or deletion takes O(n)? Yes, but again, this is just within the node itself, so the size is not that big: a fan-out of maybe 32, so I have 32 elements I need to keep sorted. I can do that in cache; that's very fast. So when you're moving down, when you have to compare with different keys, you first go to the pointed-to place and then compare? Correct. Say I'm doing binary search, or in this case let's just do linear search: I scan along, and I want to see whether the key I'm looking for matches what I have, so I have to follow this pointer. But again, it's just an offset within the same page, maybe 16 bits; I follow that offset, jump to where the key is, do my comparison, and if it doesn't match, I jump back and do the same for the next entry. And just like in slotted pages, where the tuples don't have to be physically sorted within the page as long as the slot array is sorted, this variable-length key data at the bottom can be in any order it wants; I just know how to jump to it from the key map. The question is: for non-leaf nodes, do you do the same thing? Yes. Again, this is for variable-length data in any node.
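Here's a minimal sketch of that indirection-map layout (hypothetical field names; real systems pack this into a fixed-size page): a sorted array of small in-node offsets, with the variable-length key bytes stored elsewhere in the same node.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical node layout for variable-length keys: a sorted map of
// small in-node offsets, with the actual key bytes stored elsewhere in
// the same node, like the slot array in a slotted page.
struct VarKeyNode {
  std::vector<uint16_t> key_map;  // sorted by key value; offsets into data
  std::vector<uint8_t> data;      // length-prefixed key bytes, any order

  // One way to delimit keys: a 1-byte length prefix (an assumption here;
  // real systems may use wider lengths or the next slot's offset).
  std::string KeyAt(size_t slot) const {
    uint16_t off = key_map[slot];
    uint8_t len = data[off];
    return std::string(data.begin() + off + 1, data.begin() + off + 1 + len);
  }

  // Binary search over the key map: each probe follows one small in-node
  // offset to fetch the key bytes it compares against.
  int FindSlot(const std::string &key) const {
    int lo = 0, hi = static_cast<int>(key_map.size()) - 1;
    while (lo <= hi) {
      int mid = lo + (hi - lo) / 2;
      std::string k = KeyAt(mid);
      if (k == key) return mid;
      if (k < key) lo = mid + 1;
      else hi = mid - 1;
    }
    return -1;  // not present
  }
};
```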
Now, this is sort of a micro-optimization. Going to disk is always the most expensive thing, but a really simple thing we can do is recognize that since these offsets are only 16 bits, in general I have a lot of spare space up in the key map. So maybe I take the first character of every string and embed it up there next to the offset. Now, when I'm scanning along trying to find the thing I'm looking for, if my key doesn't match that first character, I know I don't need to follow the offset down at all. Again, this is all going to be in memory; this is about avoiding cache misses and making the binary search, the search within the node, faster. It's a micro-optimization. Avoiding disk is always the major thing we care about in this course, but this is a really simple trick to speed things up. Yes? What if there are two persons whose names start with the same letter? Again, the question is: what if there are two people whose names start with the same letter? Then it depends on what you're looking for. If you want to find exactly one, you find the first one and you're done. If you need to find everybody, you have to go to both of them. Same as over here: everyone's different here, but if there were a Paul and a Prashant, he's my PhD student, I would scan down here, find Prashant, then actually go to the next one to make sure it doesn't have the same prefix as well. Sir? But this works only if your sorting is based on the names; you can also define sort functions that map the strings to certain numbers, and those are used. Okay, so he's talking about collations. I'm showing lexicographic, alphabetical ordering for this, but in high-end database systems you can actually define arbitrary sort orders, and everything still works the same. Then what will you do with this trick? And you're not talking about dictionary codes here; you can have different sorting based on whether it's Unicode or what language you're using. For that case, the database system knows what the sort order is, so it knows what prefix, if any, it can store up here. Again, the high-level idea is still the same. Yes? So if you have K keys, you have at most K+1 pointers to other nodes, and you examine all K keys to find the path you want to go down? Not necessarily, but for simplicity, yes: you just scan along the keys and do linear search. Then the time complexity should be K times log n instead of log n? His statement is that really the complexity should be K log n. Yes, but the K is a constant we can throw out, because the log n counts the page I/Os I have to do to traverse, and a page I/O is always orders of magnitude slower than the cache-line lookups within a node. Remember, I said in the very beginning: here's the storage hierarchy; anything above memory we basically don't care about. The disk I/Os are the real cost we've got to avoid. We'll get to that in a second. Okay. So now to get to the other thing that people asked about: how do we handle non-unique indexes? Well, this is the same thing we talked about with hash tables; there are two basic approaches. Either you duplicate the keys, being mindful, like in our example here, that a duplicated key can split over into another node, so we have to make sure we read everything we need to read. Or we store a value list, where we store the key once and keep a separate space in our node for all the values with that given key. So it looks like this: if I duplicate the key, I just have the key multiple times, and again, just like before, each offset points down to wherever its value is. If I insert a new K1, I insert it here and move everything over, and it still works. With a value list, I store the key once, but then I also have a pointer, an offset, to somewhere else in the node where all the values that correspond to that given key live.
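Here's a minimal sketch of the two layouts side by side (hypothetical, simplified structures):

```cpp
#include <cstdint>
#include <vector>

struct RecordId {
  uint32_t page_id;
  uint16_t slot;
};

// Approach 1 (more common): duplicate the key, one entry per value.
// keys:   [k1, k1, k2, ...]    values: [v1a, v1b, v2, ...]
struct DuplicateKeyLeaf {
  std::vector<int64_t> keys;     // sorted; the same key may repeat
  std::vector<RecordId> values;  // values[i] belongs to keys[i]
};

// Approach 2: store each key once, with a list of all of its values.
struct ValueListLeaf {
  std::vector<int64_t> keys;                       // sorted, unique
  std::vector<std::vector<RecordId>> value_lists;  // parallel to keys
};
```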
The first approach is more common; I don't know who actually does the second one. Yes? Her question is: can I assume that duplicate keys will always be in the same node? No. In the example I showed from the demo, the duplicate actually moved over into a sibling node; that's one way to do it. Other systems use an overflow chain, where a leaf node says: by the way, here are some other pages or nodes below you that hold more entries for the keys stored up above. And the question is: if I'm searching for a given key, how do I know which pointer to follow? Going back to that example, if I'm looking for greater-than-or-equal-to 4, I go down that side, find the first entry for 4, then keep scanning along the leaf nodes until I hit something that's not equal to 4, and then I know I've seen everything. Again, the database system knows whether the keys are unique or not, so it knows whether it has to do that. If it knows this is a primary key or a unique index, the thing I'm looking for can only appear once, so I just go to the one leaf node that has what I want. If it's not unique, I have to account for that: if duplicates are spread across leaf nodes, I scan along siblings; if it's an overflow chain, I find the first leaf node and scan down its chain.

Yes? The question is: the size of the key is not always the same, so is the size of the value always the same? I'll show this next class, but if the value is just a record ID for a tuple, it's always the same size, either 32 or 64 bits depending on the system. If it's the actual tuple itself, like in MySQL, then you have to deal with overflows, and that's more complicated; we'll discuss it next class.

Alright, we've already discussed this briefly, but there are different ways to do searches within a node. As I'm traversing the tree, at each node I have to search the key array to find the thing I'm looking for, to decide whether there's a match or whether I need to go left or right. The most basic way is linear search: I'm trying to find key 8, so I start at the beginning of my sorted key array and scan along until I find what I want. Worst case, I have to look at all the keys. With binary search, since the array is sorted, I jump to the middle point and figure out whether that key is less than or greater than the one I'm looking for, and that tells me whether to go left or right. In this case, looking for 8, I land on 7; 8 is greater than 7, so I jump right; I take the halfway point of that and get 9, go left, land on 8, and I've found what I want.

One thing that's kind of cool, if you know what the values of your keys actually look like, is an interpolation technique, where you approximate the location of the key with some simple math to pick a starting point, rather than starting from the beginning and going all the way to the end like linear search. If I know my keys are integers and I know something about their distribution, I can do some really simple math: I have 7 keys in my array, the max key is 10, and I'm looking for 8. I take 10 minus 8 and get 2, then 7 keys minus 2 and get 5, so I can jump straight to the fifth position, and that's at least a starting point for what I'm looking for.
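To pin down the three strategies, here's a small C++ sketch of linear, binary, and interpolation search over a node's sorted key array. The interpolation estimate mirrors the back-of-the-envelope arithmetic above; it's only a guess, so it still has to scan locally to confirm the match:

```cpp
#include <vector>

// Worst case looks at every key, but needs no sort-order maintenance.
int LinearSearch(const std::vector<int> &keys, int target) {
  for (size_t i = 0; i < keys.size(); i++)
    if (keys[i] == target) return static_cast<int>(i);
  return -1;
}

// Standard binary search; requires the keys to stay sorted.
int BinarySearch(const std::vector<int> &keys, int target) {
  int lo = 0, hi = static_cast<int>(keys.size()) - 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (keys[mid] == target) return mid;
    if (keys[mid] < target) lo = mid + 1; else hi = mid - 1;
  }
  return -1;
}

// Interpolation: estimate the slot from the key's value relative to the
// max key, then walk locally. With 7 keys, max 10, target 8, the estimate
// is 7 - (10 - 8) = 5, matching the lecture arithmetic.
int InterpolationSearch(const std::vector<int> &keys, int target) {
  if (keys.empty()) return -1;
  int n = static_cast<int>(keys.size());
  int est = n - (keys.back() - target);        // simplistic estimate
  if (est < 0) est = 0;
  if (est > n - 1) est = n - 1;
  while (est > 0 && keys[est] > target) est--; // walk left if we overshot
  while (est < n && keys[est] < target) est++; // walk right otherwise
  return (est < n && keys[est] == target) ? est : -1;
}
```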
This obviously works here because the keys are integers, always increasing in monotonic order. If they're floats, this is hard to do; if they're strings, I don't think you can do it at all. But it's another technique to make the search go faster. I don't know how common this one is; binary search is, I think, what everyone does. And again there's a trade-off: to use binary search, I have to keep my keys in sorted order. With linear search I don't, so as I update the node I don't pay that penalty of maintaining the sort order.

Alright, let's finish up real quick with some optimizations, the kind of things a real database system would actually do to make B+Trees go faster. The first two are different ways to compress the data. The first compression scheme is called prefix compression, and it's based on the observation that because we keep the keys in sorted order, the keys stored in a single node are, in a lot of data sets, going to be very similar to each other; that's what sorting does. In this case I have a node with three keys: rob, robbing, and robot. All three share the prefix "rob", so rather than storing that redundant "rob" over and over again for every single key, I extract it out, store the prefix "rob" once, and for the keys I store only the remaining parts that actually differ. This is very, very common; it's sometimes called prefix compression, or prefix trees, and it's used in a lot of high-end, large database systems because there's so much duplicate data. Facebook uses this for all their internal MySQL stuff, and it makes a big difference for them; they save a lot of space.

There are other optimizations along these lines. If I have a clustered index, where the tuples are laid out on disk pages in the same order they're sorted in my index, then it's very likely that tuples referenced in the same node have record IDs with the same page ID, because they're all going to land on the same page. So rather than storing that page ID over and over for every single tuple in a node, I store the page ID once and then store each tuple's offset or slot separately.
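Here's a quick sketch of what prefix compression over one node's keys could look like; `Compress` and `CompressedNode` are invented names, and a real system would operate on the page bytes directly. Because the keys are sorted, the common prefix of the first and last key is the common prefix of the whole node:

```cpp
#include <string>
#include <vector>

// Store the longest common prefix once; keep only each key's suffix.
struct CompressedNode {
  std::string prefix;
  std::vector<std::string> suffixes;
};

CompressedNode Compress(const std::vector<std::string> &sorted_keys) {
  CompressedNode node;
  if (sorted_keys.empty()) return node;
  const std::string &first = sorted_keys.front();
  const std::string &last = sorted_keys.back();
  size_t len = 0;
  while (len < first.size() && len < last.size() && first[len] == last[len]) {
    len++;  // sorted input: comparing the extremes suffices
  }
  node.prefix = first.substr(0, len);
  for (const auto &k : sorted_keys) node.suffixes.push_back(k.substr(len));
  return node;
}
// e.g. {"rob", "robbing", "robot"} -> prefix "rob", suffixes {"", "bing", "ot"}
```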
Yes? The question is: if you're doing something like this, how does the system choose what prefix to use? In my example it's three keys, so it's kind of obvious, but how do we actually decide? One option is that every single time I insert, I figure out what the common prefix is, and that's what I store. Another is that any time I do a compaction or a re-optimization, I decide what's best for my keys right then and there. In practice, think of it this way: in a lot of database systems the newer keys get inserted on one side of the tree, because the key is always increasing in value, and so a large portion of the tree on the other side is going to be static, mostly read-only. At that point I can make a firm decision about how I want to do the compression and compaction. There are different trade-offs; you can do it online or offline. Yes? The question is: what happens if someone inserts the word "sad" and it ends up in this node? Then yeah, you have to account for that and maintain the prefix on the fly. Or you can store a little extra metadata that says this prefix applies only to the first three keys, not the others. There are a bunch of tricks you can do.

The opposite of prefix compression is suffix truncation, and the basic idea here is that we maybe don't need to store the entire key in our inner nodes just to figure out whether to go left or right. In this case we have a key that runs A, B, C, D, up to K, and another that runs L, M, N, O, up to V. But if I'm just deciding whether to go left or right, I can probably get by looking at the first character. So instead of storing the entire key in the inner node, I store just a uniquely distinguishing prefix of it and throw away the remaining suffix. Here I could have stored just "A" and "L" and that would have been enough, but I'm showing "ABC" and "LMN". Down below, in the leaf nodes, I still have to store the entire key, because I need to be able to return it, but my inner guideposts don't need the full key. And of course you have to maintain this: if somebody inserts something that would violate these separators, you have to reorganize. But in practice, if the data isn't changing a lot, this can be another big win. As far as I know, prefix compression is more common than suffix truncation.

Alright, the last few things I want to talk about start with how to handle bulk inserts. In all the examples I've shown so far, we assumed we're building the index incrementally, inserting keys one by one. But in a lot of cases you have all the keys ahead of time. A very common pattern is bulk loading a new data set: I collected data from some other source and want to put it into my database. A lot of times people turn off all the indexes, bulk load the data into the table, and then go back and add the indexes, so that while inserting the new data you're not paying to maintain the index, which is expensive. If you have all the keys ahead of time, a really simple optimization is to build the index not top-down, like we've done so far, but from the bottom up. Say these are the keys I want to insert. The very first thing I do is sort them; we'll see in a few weeks that there's an efficient algorithm that can sort the data in a way that maximizes the amount of sequential I/O we have to do. Then I just lay the sorted keys out along the leaf nodes, with everything filled out correctly, and, going from the bottom to the top, fill in the inner nodes and generate the pointers.
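Here's a simplified sketch of that bottom-up bulk build: sort the (key, value) pairs, pack them into full leaves, and then repeatedly build a parent level from the first key of each child until one root remains. The `Node` structure and names are invented for illustration; this version assumes everything fits in memory (a real build would use the external sort covered later) and would also leave slack in each leaf for future inserts:

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Simplified node: leaves hold (key, value) pairs; inner nodes hold the
// first key of each child as a guidepost.
struct Node {
  bool leaf = false;
  std::vector<int64_t> keys;                    // guideposts (inner) or keys (leaf)
  std::vector<uint64_t> values;                 // record ids (leaf only)
  std::vector<std::unique_ptr<Node>> children;  // inner only
};

std::unique_ptr<Node> BulkLoad(std::vector<std::pair<int64_t, uint64_t>> pairs,
                               size_t fanout) {
  std::sort(pairs.begin(), pairs.end());  // step 1: sort all keys up front

  // Step 2: pack the sorted pairs into leaves, left to right.
  std::vector<std::unique_ptr<Node>> level;
  for (size_t i = 0; i < pairs.size(); i += fanout) {
    auto leaf = std::make_unique<Node>();
    leaf->leaf = true;
    for (size_t j = i; j < std::min(i + fanout, pairs.size()); j++) {
      leaf->keys.push_back(pairs[j].first);
      leaf->values.push_back(pairs[j].second);
    }
    level.push_back(std::move(leaf));
  }

  // Step 3: build inner levels bottom-up until a single root remains.
  while (level.size() > 1) {
    std::vector<std::unique_ptr<Node>> parents;
    for (size_t i = 0; i < level.size(); i += fanout) {
      auto inner = std::make_unique<Node>();
      for (size_t j = i; j < std::min(i + fanout, level.size()); j++) {
        inner->keys.push_back(level[j]->keys.front());  // guidepost key
        inner->children.push_back(std::move(level[j]));
      }
      parents.push_back(std::move(inner));
    }
    level = std::move(parents);
  }
  return level.empty() ? nullptr : std::move(level.front());
}
```

Every node comes out full and no splits ever happen during the build, which is where the win over one-by-one insertion comes from.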
This can be way more efficient than building the index one key at a time, and it's a pretty standard technique that any major database system supports when you call CREATE INDEX on a large data set that already exists. Once it's built, I can maintain it or make any changes I want just like before; there's no real difference. The system doesn't know whether you did a bulk build or an incremental build; everything's the same. In the back? Yes, the question is: what happens if you want to merge a small B+Tree into a large B+Tree? That's more complex. I'd say that, in general, making bulk inserts into B+Trees fast is a very, very hard problem, and at least in academia it's underappreciated, because this is very, very common; having a database system do it as fast as possible is super important. Let's talk about that afterwards.

The last thing we'll talk about is called pointer swizzling. I talked about how we traverse the index by following pointers from one node to the next, but in actuality what we're storing are not raw memory pointers; they're page IDs. Whenever we do a traversal, say to find keys greater than 3, we start at the root and want to go down to some node. How do we actually get there? The root node stores a page ID, so I have to go down to the buffer pool and say: hey, I want page 2; if it's in memory, give me back a pointer to it, and if it's not, go fetch it first. Then I get my pointer and continue the traversal. Same thing as I'm scanning along the leaves: my sibling is page 3, because that's what's stored in my node, so I go back to the buffer pool and say, give me the pointer for page 3. As I traverse, I keep going back to the buffer pool manager to do this conversion from page ID to pointer, and that's really expensive, because the buffer pool's hash table has to be protected with latches, so I'm doing a bunch of extra steps just to get a pointer.

With pointer swizzling, the idea is that if I know a page is pinned in memory, meaning it's not going to be evicted, then instead of storing the page ID, I replace it with the raw page pointer, because a pinned page is never going to move to a different address. Now when I traverse, instead of doing that buffer pool lookup, I have exactly the pointer I want and can go straight to the data without asking the buffer pool at all. Of course, I have to make sure that if I evict the page, I write the page ID out to disk, not the page pointer, because when the page comes back in, its address is going to be completely different. So you don't blow away the page ID entirely; you just keep a little extra metadata. When would we actually pin these pages in memory? Maybe not for the leaf nodes, but at least for the upper levels: the root, and maybe the second level, are going to be super hot, because every traversal has to go through them to get down to the leaves, and they're relatively small compared to the size of the entire tree. So maybe it's not that big of a deal to pin those pages, and then I can use this optimization, because I know my pointers will always be valid. This one is actually very common, pointer swizzling.
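Here's a rough sketch of the mechanism: a child reference holds either a page ID or, once the page is pinned, the raw pointer, and unswizzling on eviction restores the page ID. The `BufferPool` interface here is hypothetical, a stand-in for whatever the real system exposes; the point is just that a swizzled traversal skips the page-table lookup and its latches:

```cpp
#include <cstdint>

struct Page {};  // whatever the buffer pool hands back

// Hypothetical buffer pool interface (stubbed out for the sketch).
struct BufferPool {
  // Real version: latch the page table, look up the frame, maybe read
  // from disk, pin the page, return the frame pointer.
  Page *FetchPage(uint32_t /*page_id*/) { return nullptr; }
  void UnpinPage(uint32_t /*page_id*/) {}
};

// A child reference inside an inner node: either a page ID or a
// swizzled raw pointer, plus a flag saying which it currently is.
struct ChildRef {
  bool swizzled = false;
  uint32_t page_id = 0;  // kept around so we can unswizzle on eviction
  Page *ptr = nullptr;

  // Resolve to a pointer. If swizzled, no buffer pool call at all.
  Page *Resolve(BufferPool &bp) {
    return swizzled ? ptr : bp.FetchPage(page_id);
  }

  // Swizzle once the page is pinned and can't move or be evicted.
  void Swizzle(Page *pinned) { ptr = pinned; swizzled = true; }

  // Before the page can be evicted, fall back to the page ID: the raw
  // pointer would be garbage the next time the page is brought in.
  void Unswizzle() { ptr = nullptr; swizzled = false; }
};
```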
Okay, alright, so to finish up: the B+Tree is awesome. Hopefully I've convinced you that it's a good idea to use one if you're building a database system. Next class we'll see some additional optimizations and maybe do some demos with Postgres and MySQL, but then we'll also talk about two other types of tree-based indexes we may want to use: tries, or radix trees, which look like B+Trees but are slightly different, and then inverted indexes, which let us do keyword searches. Any questions? Hit it!