As a reminder of where we're at in the course: we're going from the bottom, at the underlying disk, and building up layers on top of that inside of our database management system. And we're doing this because providing an abstraction between the different layers allows us to be more flexible in how we decide to build our different components. So right now we're smack dab in the middle, talking about access methods: how the database system is going to access our underlying tuples. The data structures we're talking about here can be used to speed up this access, trying to avoid having to do a complete sequential scan. So again, as a reminder, the hash tables and the tree data structures today can be used in different scenarios inside of our database system. For example, they can be used to keep track of internal metadata, the information the database system uses to know where pages are on disk, what transactions are running, or who holds what locks. Again, the first project for you guys is building an extendible hash table, which is then used as the page table inside of your buffer pool manager. Then there's core data storage, which we'll touch on a little bit today and maybe more next class; this is the overarching data structure the system uses to actually store our tuples. Then there are temporary data structures that the database builds on the fly as it runs queries. And then there are table indexes, which it uses to quickly identify tuples based on the predicates in your queries. So for today, these tree data structures are super useful and very common when used for table indexes. You don't see them that often used for internal metadata, and the reason why is because hash tables only support equality predicates: looking up things when you have the exact key.
The order-preserving indexes, or trees, that we're talking about today can be used for range queries. We can scan along and find multiple entries within some range, and depending on the data structure you're using, you can actually go in both directions, which is kind of nice. Okay? So we didn't really define this before, but we should define it now: what I mean by a table index. A table index is essentially a copy or replica of the data in a table, and it's going to contain just the subset of the columns that you specify you want to use for your index. And so this is going to allow you to quickly search and find the thing that you're looking for, right? We said before that our heaps are unordered collections of pages of tuples. So that means if we want to find something by a particular attribute, we have to scan every single page and look at every single record to see whether we have the thing we're looking for. But if we build a table index, this auxiliary data structure can allow us to quickly find just the exact thing we're looking for. And again, the metaphor people always use is your textbook in this class. If you want to find all the places where the term B plus tree appears in the textbook, you could literally just scan every single page and try to find it. Of course, that would take you a long time. But instead, you have that glossary in the back, which is in alphabetical order, and it lets you find just the pages that have the keyword B plus tree, and then you can jump to those pages. That's essentially what a table index is going to do for us. And what data structure the database system actually uses on the inside for your index depends on the implementation, and in some cases it depends on what hints you give it, right?
If you tell it you want a B plus tree or you tell it you want a hash index, it'll choose that. But the really nice thing about indexes, which was sometimes overlooked in some of the NoSQL systems, at least in the early days, is that it's the database system's job to ensure that the table index you build on your table is always in sync with the actual data table. So it's not like you're building a separate table with pointers to the tuples you want, where it's up to the application to maintain those things. That's actually a design pattern you see show up a lot when people first build simple applications: they end up trying to build their own glossary index manually. And you don't need to do that, because the database system will do this for you. You can have an arbitrary number of indexes, with an arbitrary number of columns in them, and the database system knows how to keep all of them consistent with the table. So the thing we'll talk about more next week is how the database system decides which index to use, if any, when it processes your query. Again, in your application, you just call CREATE INDEX. You say, for this table, create this index on these columns. And then when you write your SQL queries, you generally don't specify any information about what physical index you want to use (with some SQL dialects you can). You just say, here's my SELECT statement. And then the database system has this thing called the query planner, or optimizer, that looks at your table, looks at your query, and says, well, what indexes do I have for this table, and which one would actually be the best one to use, and then picks that one. And for some types of queries, you actually want to pick multiple indexes, scan or probe them in parallel, and then combine their results.
But again, you as the application programmer don't have to specify any of this. And I think in MySQL, you can give hints in your SQL query to say, I really want to use this index, and the database system will do that, regardless of whether it's a good idea or not. Oftentimes, the commercial database systems have tools that allow you to extract the query plan for a query, and then the DBA can manually pick what indexes to use and what join ordering to use. You typically do this if you have a really complex query where the query optimizer just can't figure out a good plan. But in general, we don't want to do this, because regardless of what indexes we have, we don't want to have to change our application. So another rookie mistake that people often make when they start managing their database at the application level is they start adding whatever indexes they think they need. They have a bunch of queries, and they pick the best index for each one of those queries. This is almost always a bad idea, because there's a trade-off in the database management system with the storage overhead of all these different indexes. Those index pages are now occupying frames in your buffer pool, and that means you're storing less data from your tables. So there's a storage cost to these things, but there's also a maintenance cost. Because, as I said, the database system is going to keep the index in sync with the underlying table data. That means every single time you do an insert, update, or delete on a table, the database system has to go look and say, well, what was this change, what columns did they modify, and what indexes are affected, and it has to go and update all of them. And if you have a lot of indexes, that becomes really, really expensive.
So there's a bunch of research, which we won't talk about in this class, done maybe 15 or 20 years ago at Microsoft and a bunch of other companies on algorithms to automatically select the best indexes. Think of these as advisory tools: the DBA gives them some information, and the tool spits out, hey, we think you need these indexes, or we think you should drop these indexes. There's a bunch of research we're doing here at CMU on sort of the same thing. But in general, this is what you pay a DBA a lot of money to do: pick the right set of indexes you actually need. And actually, we can talk about this later on when we cover OLAP systems, but there are some database systems that don't allow you to pick any indexes at all. Vertica is one of these systems; it's designed for doing analytics. It has no notion of B plus trees, at least the last time I checked, and you can't actually call CREATE INDEX. They pre-sort all their columns, and that way it's essentially the same thing as an index. So for today's agenda, we're going to focus mostly on the B plus tree, because again, that's the most common data structure you'll ever see inside of a database system for table indexes. But then I want to spend time talking about two alternative tree-like data structures that are less common but have some interesting properties that I think are worth discussing in this class. We're mostly going to focus on the B plus tree, though, because this is what you'll be implementing in the second project, and this is what you see most often in real database management systems. So the first thing I need to address, which is often confusing for people, is what a B tree is versus a B plus tree versus all these other trees.
So there is a specific data structure called a B tree, but the problem is the term is also generally used for a whole class of data structures called B trees. So there is an instance of a B tree, which we'll cover a little bit in this lecture, but the B plus tree is a type of B tree. And then there are all these other ones: the B-link tree, which was developed here at CMU in the 1970s, the B star tree, and a bunch of others. What's sort of confusing is that I can almost guarantee you, if a database system says it's using a B tree, it's not; it's using a B plus tree. To the best of my knowledge, when I look at the manual for Postgres and I look at the actual source code, what they're describing is a B plus tree, but what they reference for how they do concurrency control is a paper that talks about a B tree. This is all stuff, again, from the 1970s. So again, if a database system says it's using a B tree, it's probably using a B plus tree, and I'll explain what that looks like as we go along. The other confusing thing about the B plus tree is that the modern version everyone uses borrows bits and pieces from these other ones. It uses sibling pointers, and that comes from the B-link tree. The way it does splits and merges comes from the B tree. So I'll try to use "B plus tree" throughout this lecture, but just be mindful when someone says they use a B tree. Again, Postgres says this, but to the best of my knowledge, they're actually using a B plus tree. So a B plus tree is a self-balancing tree data structure. The B in B plus tree stands for balanced. And this balanced tree is going to allow for efficient searches, sequential access, insertions, and deletions, and all these operations will be done in O(log n) time.
It's different from the binary trees you maybe learned about in intro CS classes in that, in a binary tree, every node can only have two children; in a B plus tree, you can have an arbitrary number of children at each node. And the reason people actually use B plus trees is that they're awesome for doing sequential access on the database. Remember, I said in the beginning that all these databases designed in the 1970s and 1980s had to deal with an environment where memory was expensive and very small, so your database was primarily stored on disk. And spinning-disk hard drives are much faster if you do sequential access versus random access, because there's that arm that has to move around, jumping along the platter. A B plus tree can be organized on disk very nicely to support sequential page access, and that's why people actually choose to use it. Now, what's actually surprising, and what current research shows, is that even if you now have an in-memory database system, the B plus tree is still very, very good; it's still almost the best one. I'll show some numbers at the end that we've run. So the B plus tree was originally designed in an era where really efficient access on disk mattered, but even now it's still really important. The B plus tree is a venerable index that is not going anywhere. Now, part of the confusion around the B plus tree is that there is actually no paper that defines it. What people normally cite is a 1979 survey paper titled "The Ubiquitous B-Tree," but what it describes is essentially a B plus tree. And what's crazy to think about is that in 1979 they were already talking about this index being widespread and widely used, even though it was probably invented in the early 1970s, like '71, '72, '73.
So even six or seven years after it was invented, they're saying this thing is so good it's used everywhere. In this paper from 1979, the only citation they have for the origin of the B plus tree is some IBM tech report where they talk about using a B plus tree to organize their database. As I said, there is no standard B plus tree paper that you can go read as the original; everyone usually cites this survey. So the properties that we're going to have in our B plus tree are the following. We'll define it as an M-way search tree, where M defines the number of branches you can have at a particular node. So a binary tree would be a two-way tree, but in the case of a B plus tree, again, it's arbitrary. And it's going to have the following properties. First, we're going to say it's perfectly balanced. That means that every leaf node in our index is always the same distance from the root of the tree. So looking up a single key, no matter what key it is, always has the exact same cost. This makes it very deterministic and very stable, and the database system can understand, if I'm going to use this index to go find a single key, what the cost will be. Then we have another requirement that says every inner node other than the root always has to be at least half full: the number of keys in a node always has to be at least half of M. We'll relax that later on, but you don't want to go less than half, and if a node would go less than half, you merge it with another node to combine them together. So the tree is always balanced. Again, we'll relax this later on.
In some actual commercial systems, they don't always follow this hard-and-fast rule, because sometimes it's actually better to delay doing a merge until some later point. But I think for you guys in the second project, you should just follow the rule that nodes always have to be at least half full. And then every inner node with K keys will always have K+1 non-null children. So if you have K keys, you have K+1 pointers to children. So, to give a high-level architectural overview of what this looks like: this is an abstract representation of a B plus tree. This is not actually how you want to store it on disk or in memory, but for our purposes it'll serve as an illustration. The terminology we want to use is that any node that is not at the very bottom of the tree is considered an inner node, we designate the one at the very top as the root node, and the ones at the bottom are the leaf nodes. Now, in the leaf nodes, you'll see that we have these pointers that allow one node to point to its sibling, and there's a pointer to go in the other direction as well. This actually comes from the B-link tree; the original B tree does not have this feature. And what this will allow us to do, when we talk about query execution later on, is that, say I want to do a range scan to find all the keys greater than 3, I can traverse down to get to the first leaf node on the left, and then just scan across the leaf nodes, following the pointers, to find all the elements I'm looking for. That's another example of having really efficient sequential access. Now, the way to think about these pointers going from an inner node down to its children is that the boundary of what's stored on one particular side of the index is specified by the keys in the parent node. So in the parent node here, I have key 5 and key 9.
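To make the node structure concrete, here is a minimal sketch of the two node types and the K keys / K+1 children invariant just described. This is an assumed in-memory representation with made-up class and field names, not how a real system lays nodes out on fixed-size disk pages.

```python
# A minimal sketch of the two B+Tree node types, assuming plain
# in-memory Python objects; names here are illustrative only.

class InnerNode:
    """Inner node: K sorted guidepost keys and K+1 child pointers."""
    def __init__(self):
        self.keys = []       # sorted keys acting as guideposts
        self.children = []   # always len(keys) + 1 child pointers

class LeafNode:
    """Leaf node: sorted keys plus values, linked to its siblings."""
    def __init__(self):
        self.keys = []       # sorted keys
        self.values = []     # record IDs or tuples, parallel to keys
        self.prev = None     # left sibling pointer (the B-link tree idea)
        self.next = None     # right sibling pointer, used for range scans

# The K keys / K+1 children invariant from the lecture:
root = InnerNode()
root.keys = [5, 9]
root.children = [LeafNode(), LeafNode(), LeafNode()]
assert len(root.children) == len(root.keys) + 1
```

The sibling pointers are what make the range scan described above cheap: once you reach the leftmost qualifying leaf, you just follow `next` pointers instead of re-traversing from the root.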
So this path down to this leaf node will only contain entries where the key value is less than 5. And this makes search really efficient, because if I need to go find key 3, I first look at 5, compare my key, and say, well, I know 3 is less than 5, so I want to follow that pointer down to get to the entry I'm looking for. The way to think about how I'm representing the nodes here is that the first little slot represents the value (which I'll define in the next slide or so), and then the thing to the right of it is its corresponding key. So if I'm looking for key 1, I would again start at 5; 1 is less than 5, so I'd follow the pointer down. Then I'm at the leaf node, I find 1, I have my match, and then I look to the left of it, and that's the corresponding value that I want. Again, this is not exactly how you would really store this in memory or on disk, but this is how people normally represent it in textbooks and things like that. So the nodes in a B plus tree are essentially going to contain key/value pairs, and the keys are just the columns that we specified we want to build our index on. If we have an index on emails, then the keys correspond to the values of those emails. But the contents of the values in the nodes differ based on whether it's an inner node or a leaf node. If it's an inner node, then a value is just a pointer to another inner node or to a leaf node. But if it's a leaf node, there's nothing below you; you can't point to other nodes. So instead, the value will be either the record ID or the actual tuple contents, and I'll define that in the next slide. But the key thing to point out is that the key arrays at every node always have to be sorted. Because what happens is, when you land at a node, you just do binary search on those keys to see whether you have the thing you're looking for; otherwise, you'd have to do a linear scan and look at all the elements.
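The top-down traversal just described can be sketched in a few lines. This is an illustrative sketch, assuming each node is a plain dict with sorted keys, and using the convention that keys greater than or equal to a guidepost go to the right; the example tree mirrors the lecture's root with guideposts 5 and 9.

```python
import bisect

# A sketch of B+Tree search: binary search on each node's sorted key
# array, following child pointers until we reach a leaf.
def bplus_search(node, key):
    while "children" in node:                      # still at an inner node
        # keys < keys[i] go down child i; keys >= keys[i] go right of it
        i = bisect.bisect_right(node["keys"], key)
        node = node["children"][i]
    i = bisect.bisect_left(node["keys"], key)      # binary search in leaf
    if i < len(node["keys"]) and node["keys"][i] == key:
        return node["values"][i]
    return None                                    # key not in the index

# Mirrors the lecture's example: root guideposts [5, 9], three leaves.
leaf1 = {"keys": [1, 3], "values": ["v1", "v3"]}
leaf2 = {"keys": [5, 7], "values": ["v5", "v7"]}
leaf3 = {"keys": [9, 13], "values": ["v9", "v13"]}
root = {"keys": [5, 9], "children": [leaf1, leaf2, leaf3]}

assert bplus_search(root, 3) == "v3"   # 3 < 5, so follow the left pointer
assert bplus_search(root, 4) is None   # lands in leaf2 but finds no match
```

Note how the sorted key arrays are what let `bisect` do binary search at every level instead of a linear scan.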
So there is a slight performance cost for keeping the arrays always in sorted order, but it's usually not a major overhead, because going to disk is always the most expensive thing, and it speeds up searches because now you can do binary search instead of a linear scan. It's sort of a one-time cost to keep them sorted, but then you get faster lookups. So, as I said before, the values can be two different things. In the first approach, the values are essentially just record IDs pointing to tuples. Again, if you have an index on emails, the key would be one particular tuple's email, and its value would be the record ID of the tuple that has that particular key. And we said before that the record ID is typically a combination of the page ID and the slot number, or the offset within that page. The other approach is to use what are called index-organized tables, where you actually store the entire tuple itself in the leaf node as the value. In these systems, there's no heap at all; you just have the leaf node pages where you're actually storing the tuples. So when you do your search in your index and you land at the leaf node, that leaf node has all the data you need for that particular tuple. The tricky thing, though, is you can't do this for multiple indexes, because otherwise you'd have multiple copies of the tuple. So typically, this second approach is used for the primary key index, and then in the secondary indexes you just store the record ID, or whatever the primary key is. So in that case, say you do a secondary index lookup: you do the search in that B plus tree, you land at the leaf node, and you get the record ID (or primary key) of the tuple.
But then you need to probe the primary key index using that key to find the entry that has the actual tuple. So the first approach is used in Postgres and DB2, as well as SQL Server and Oracle. The second approach is used in SQLite and MySQL, but it's also available in SQL Server and Oracle. This is a good example of how the commercial systems provide a bunch of knobs, things you can modify or tune in your database to do whatever you think is best for your application. In Oracle and SQL Server, you can specify that you want tuples stored in unordered heaps, or you can specify that you want them index-organized, stored in the actual indexes themselves. Postgres always uses record IDs; DB2 always uses record IDs; and SQLite and MySQL always use the tuple-data approach. And I think I talked about this before: in the case of MySQL, for example, if you don't specify a primary key, it generates an internal synthetic one as the row ID. So let's look now, in the case of storing record IDs, at what these leaf nodes actually look like. The way most people represent it in textbooks is that you just have a single array inside your page, and on the far right and the far left you have pointers to your siblings. And again, we're not actually storing raw memory pointers; instead we store page IDs. This fits in with what you guys are doing in your first project, because you use these page IDs to go to the buffer pool manager and say, give me this page, and it gives you back the chunk of memory that has it, and then you can use that to find the thing you're looking for. And this is another good example of how B plus trees are designed to be able to have things swapped out to disk if need be; you don't have to keep the entire index in memory. Yes? Do you need the offset for that as well?
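To make the two value choices concrete, here is a tiny sketch. The `RecordId` shape mirrors the (page ID, slot) pair described in the lecture; the email and table contents are made up for illustration, and dicts stand in for the B plus tree leaves.

```python
from collections import namedtuple

# Approach 1: the leaf value is a record ID pointing into the heap.
RecordId = namedtuple("RecordId", ["page_id", "slot"])
email_index = {"andy@cs.cmu.edu": RecordId(page_id=42, slot=3)}

# Approach 2 (index-organized table): the primary key index stores the
# whole tuple in its leaf, so there is no separate heap at all.
primary_index = {1001: ("andy@cs.cmu.edu", "Andy", "Pittsburgh")}

# Approach 1 lookup: the index hands back a location, and the heap
# page at page_id=42, slot 3 would hold the actual tuple.
rid = email_index["andy@cs.cmu.edu"]
assert (rid.page_id, rid.slot) == (42, 3)

# Approach 2 lookup: the leaf itself already holds the tuple.
assert primary_index[1001][0] == "andy@cs.cmu.edu"
```

In the index-organized case, a secondary index would store the primary key (here, `1001`) as its value, so a secondary lookup costs a second traversal through `primary_index`.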
Say it again. So his question is: do you need the offset for these page IDs? Why would you need the offset? That's correct, yes. So the statement is that these pointers here are page IDs, and typically what happens is that the size of a single page corresponds to a single node. So if you have 4KB pages, then your B plus tree nodes are always going to be 4KB. The page ID just jumps you to some page, and then the system knows, oh, I'm dealing with a B plus tree node, whether it's a leaf node or an inner node, and it knows how to interpret the bytes to find the thing you're looking for. So in this inner array here, we have alternating keys and values: you have key one, and then immediately after that you have its corresponding value. You can store it like this; again, the values could be either pointers or the actual tuple contents. But oftentimes you see things organized differently. There's some extra metadata we have to store, like what level we're at and how many free slots we have left in our node, and then we store the previous and next page IDs for our siblings. And then we have one array for all the keys and another array for the values, where the offset in the key array corresponds to the offset of its value in the value array. You can store it in the alternating way; I don't think it makes a big difference, unless you have variable-length values, in which case interleaving makes it more tricky. Because if the keys are always fixed size, you always know how to jump to exactly where you need to go; but if they're variable size, that's harder. So I think at this point, it's worth me bringing up the difference between the B tree and the B plus tree. We'll go through the actual algorithms for how you do insertions, deletions, and searches in a B plus tree, but at this point, you understand the basics of a B plus tree.
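The header-plus-two-arrays layout just described can be sketched as raw bytes. This is a hypothetical on-disk format with made-up field widths (2-byte level and slot count, 4-byte sibling page IDs, 8-byte fixed-size keys and values), just to show why fixed-size keys let you jump straight to any slot.

```python
import struct

# Header: level, number of slots, prev/next sibling page IDs.
HEADER = struct.Struct("<HHii")

def pack_leaf(level, prev_pid, next_pid, keys, values):
    """Serialize a leaf node: header, then sorted key array, then the
    parallel value array (slot i of each array belongs together)."""
    assert keys == sorted(keys) and len(keys) == len(values)
    body = struct.pack(f"<{len(keys)}q", *keys)        # 8-byte keys
    body += struct.pack(f"<{len(values)}q", *values)   # 8-byte values
    return HEADER.pack(level, len(keys), prev_pid, next_pid) + body

page = pack_leaf(0, -1, 7, [1, 3, 5], [100, 101, 102])
level, n, prev_pid, next_pid = HEADER.unpack_from(page, 0)
assert (level, n, prev_pid, next_pid) == (0, 3, -1, 7)

# Fixed-size keys mean slot i lives at a computable offset:
(key2,) = struct.unpack_from("<q", page, HEADER.size + 2 * 8)
assert key2 == 5
```

With variable-length keys you lose that computable offset, which is exactly why interleaving keys and values gets trickier in that case.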
So the main difference between the B tree and the B plus tree is that in a B tree, keys and values are actually stored in the inner nodes as well, and if a key exists in an inner node, it will not exist in a leaf node. In the B plus tree, the real keys and the real values are only stored in the leaf nodes. That means that in the inner nodes, if a key gets deleted from a leaf, it may actually still exist in an inner node, because it's serving as a guidepost to direct your traversal. That won't happen in a B tree: in a B tree, if a key gets deleted, it's always removed from the tree entirely. So the B tree is more space efficient, because keys only appear once, whereas in a B plus tree, keys can appear multiple times. But it's actually harder to support concurrent access when updating a B tree, because when things get deleted, you have to worry about modifications propagating in both directions of the tree, whereas in the B plus tree, you know that everyone is always going from the top down to make changes to the underlying leaf nodes. This will come up more when we talk about concurrency control for indexes. But in general, the B plus tree is easier to implement. So this is what I was saying before: people may say they're using a B tree, but in actuality, they're using a B plus tree. When I look at Postgres, to me, it looks like a B plus tree. And the inner nodes in the B plus tree, again, are the guideposts that allow us to find the thing we're looking for. All right, so now we understand the high-level architecture of this data structure; now we want to talk about how we actually modify it. Search is pretty straightforward: you just look at the keys at whatever level you're at in the tree.
You compare them with the key you're trying to find, and that tells you whether you go left or right or keep searching in the array. So now we want to talk about how we actually modify the tree. To do an insert, you have a key, and you do a search to find the leaf node where that key should exist. Then you put it into that leaf node in sorted order. If you have enough free slots in that leaf node, you're done; you don't have to do anything extra. But if you run out of slots, then you have to split the node into two separate nodes, and you move the middle key, the one that corresponds to where you're doing your split, up into the parent. And this is a recursive operation, because what can happen is that when you split a leaf node and push a new key up into its parent node, that parent node might get too big too, and now you have to split that, which pushes something else up, and so on. This is the self-balancing part: by ensuring that you're not overflowing your nodes, you're automatically reorganizing the data structure. So rather than making a bunch of PowerPoint slides that try to visualize what's going on, I instead want to use this visualization tool that somebody developed at the University of San Francisco, because I think it'll make this more clear. So if you go to that URL I showed (I can't see this; there we go), that's the same link as this, just a shortener. So what we're going to do is insert a bunch of keys and show how the tree restructures itself, and it'll appear here at the bottom. Maybe I'll move that down. So we'll insert the key 3. It makes a single node: a root node that has one entry. Then we insert key 5.
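The insert-then-split step can be sketched as follows. This is a toy sketch, assuming a node is just a sorted list with a capacity of two keys (like the demo tool's smallest setting); a real implementation would also fix up sibling pointers and recurse when the parent overflows.

```python
import bisect

MAX_KEYS = 2  # assumed toy capacity, matching the demo below

def insert_into_leaf(leaf, key):
    """Insert `key` in sorted order; split and report the pushed-up
    middle key if the leaf overflows."""
    bisect.insort(leaf, key)             # keep the key array sorted
    if len(leaf) <= MAX_KEYS:
        return leaf, None, None          # it fits: nothing else to do
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    return left, right, right[0]         # middle key goes up to the parent

# Mirrors the demo: inserting 4 into [3, 5] overflows the leaf,
# so it splits into [3] and [4, 5] and pushes 4 up as the guidepost.
left, right, pushed_up = insert_into_leaf([3, 5], 4)
assert (left, right, pushed_up) == ([3], [4, 5], 4)
```

Note that in a B plus tree split the middle key is copied up but also stays in the right leaf, which is why 4 appears both in the parent and in `right`.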
It finds the spot it's looking for, then it does the insert. And here, too, if you check these radio buttons, you can specify the max degree, the maximum number of keys you can have; if you exceed it, it'll do a split. So now let's say I want to insert 4. What's going to happen is, again, it does the binary search inside of this node, figures out where 4 should be, inserts it, and then recognizes that it doesn't have any more space, and now it needs to do a split. And as we said, when you do the split, it's going to create two new nodes; in this case, since we're at the root, it creates two new leaf nodes, and then it chooses the middle key to move up to now be the guidepost key in the parent node. So we insert 4, it lands in the middle, but that's too many keys, so now we split, and 4 goes up to the top, and now we have 3 on one side and 4, 5 on the other. Let's say I want to insert 2. What should happen if I insert 2? You go to the left, because 2 is less than 4. Then you land on 3, you do binary search, you recognize that 2 is less than 3, so it should end up on the left of it. Is this clear? So now if I insert 1, again, what happens is I do my binary search, I land on the left node here, and I recognize that I don't have any more space, so now I have to do a split. In this case, what should be elevated as the guidepost above it? 2, correct. And then what happens on the right-hand side, where 4 and 5 are in the leaf node? Nothing, because at the root node I still have an extra slot. So what it did there, again: it picked 2 to split on, put 1 in a separate page, moved 4 over in the root node, and made 2 the first entry on the other side. So let's see; I don't think it'll let me do a decimal, but let me start over.
We'll do 1, 3, 5, 7, and then I'll put 6 in. So let's say now I want to... oh, I should have put a decimal in. That's all right. Let's say I insert 8. You do the search on the root node, you recognize that 8 is larger than 7, so you go down the right-hand side, and then you insert 8 between 7 and 9. And now you need to split that bottom node, because, again, it's run out of slots. So now you'll see the recursive nature of this data structure: when I split this leaf node here with 7 and 9, I create two additional leaf nodes, but now I need four pointers in my root node, and I can't have that, because I already have three keys, and three is the max I can have. So now you have to start recursively doing splits and reorganizing everything, right? See? So is this clear? For insertion, if you run out of space, you just split, and you may have to do this recursively, going up the tree and splitting again. Yes? So your question is: when a node has to be half full, is that the floor or the ceiling? I think you usually do the floor, yes. OK. So now, to do deletes: again, you do your lookup starting at the root, and you find the entry you want to delete. If it's not there, you're done; you don't have to do anything. Otherwise, you go ahead and delete it. If the node is still at least half full, you're done, because you know you don't need to rebalance. But if deleting that entry puts you below the half-full threshold, you have two choices. The first choice is that you can try to steal an entry from a sibling and move it over to you so that you remain balanced; you may have to change the parent node to now have the correct boundaries for going left or right. But if you can't do that, then you have to pick a sibling and merge with it.
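The two underflow fixes just described, steal from a sibling or merge with it, can be sketched like this. This is an illustrative sketch, assuming leaves are plain sorted lists, an assumed half-full threshold of two keys, and that `sibling` is the right sibling; a real system would also fix up the parent's guidepost keys.

```python
MIN_KEYS = 2  # assumed half-full threshold for this toy example

def delete_from_leaf(leaf, sibling, key):
    """Delete `key`; if the leaf underflows, steal from the sibling
    when it can spare an entry, otherwise merge the two leaves."""
    leaf.remove(key)
    if len(leaf) >= MIN_KEYS:
        return leaf, sibling, "ok"                 # still half full
    if len(sibling) > MIN_KEYS:                    # choice 1: steal
        leaf.append(sibling.pop(0))                # borrow smallest entry
        return leaf, sibling, "stole"              # (parent key needs fixing)
    return sorted(leaf + sibling), None, "merged"  # choice 2: merge

# Sibling can spare an entry, so we steal instead of merging:
leaf, sib, action = delete_from_leaf([6, 7], [8, 9, 10], 6)
assert action == "stole" and leaf == [7, 8] and sib == [9, 10]

# Sibling is exactly half full, so the only option is a merge:
leaf, sib, action = delete_from_leaf([6, 7], [8, 9], 6)
assert action == "merged" and leaf == [7, 8, 9]
```

The merge case is the one that propagates upward: the parent loses a child pointer, and if that puts the parent below half full, the same choice repeats one level up.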
Now, there are performance trade-offs for different instances of a B+tree where choosing the left sibling versus the right one can have different outcomes. Typically, though, in a textbook or in a lot of systems, you just pick one convention: always choose left, or always choose right. You don't try to be more intelligent about it. So what happens is, if you do a merge, you end up deleting an entry in your parent, because you're now pointing to one less leaf node below you. Let's go back to our little tool here. Say I want to delete 6. Again, I start at the root, go down to the leaf, and delete 6. This is okay because we're still half full; we have two entries, right at the halfway point. Now let's say I delete this 8 here. It goes down, finds 8, and deletes it, and we're still half full, so that's okay. This was meant to illustrate the difference between a B+tree and a B-tree — I deleted key 6 and key 8, but deleted keys can still exist as guidepost keys in the inner nodes. Although here the tool elevated things, so that's a bad example, sorry. So let's try this: I delete 9, and the tool updates the guidepost too. It doesn't have to do that, because it's still correct either way: if 9 were still there as a guidepost, 10 would still be greater than 9, so you'd still correctly go over there. I don't know why the tool does that, but that's fine. Let me see if I can insert some things and have it show you where it actually does the coalescing. Well, I'll just delete 10. So you go down, delete 10, it looks at its sibling with 7 and recognizes it can't borrow anything from there, so it has to restructure everything. Right? You guys are easily excited, okay?
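The steal-versus-merge decision after a delete can be sketched like this (a minimal sketch with made-up names; it only handles a right sibling and leaves the parent's guidepost update as a comment):

```python
def handle_underflow(node, sibling, min_keys=1):
    """Rebalance a leaf `node` that may have fallen below min_keys
    after a delete, using its right `sibling` (both sorted lists).

    Tries to steal the sibling's smallest key first; if the sibling
    can't spare one, merges the two leaves. Returns a (what, node,
    sibling) tuple. Parent guidepost maintenance is not shown.
    """
    if len(node) >= min_keys:
        return "ok", node, sibling
    if len(sibling) > min_keys:           # sibling can spare a key
        node.append(sibling.pop(0))       # parent separator must be updated too
        return "steal", node, sibling
    # merge: the parent loses one separator / child pointer
    return "merge", node + sibling, []
```

Choosing the left sibling instead (or the "smarter" variants mentioned above) would just change which end you pop from.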
So I think we set a max degree of four. Let me speed up the animation too. Insert 1, 3, 5, 7. All right, here we'll delete 7, then delete 3. What should happen is the index recognizes that this node is less than half full and its sibling is more than half full, so we can steal 5 and move it over, but that means we have to update the parent's guidepost. This should work — always test your demos before you do them. 6? Okay, there's nothing to delete; I'll add 6, then we have to do a split and reorganize. Maybe try to delete 5 — that's fine — and delete 6. So it recognized that the middle node was empty, and rather than doing another merge, it just moved an entry over so it didn't have to do major restructuring. Okay, the link for this tool will be in the slides if you want to play around with it while you're building your project. So any questions about inserts or deletes? Any questions about merging or splitting? Yes — so in practice, people have done measurements on this. The typical fill factor — meaning how full the nodes in your index actually are, since you can have empty slots — is about 67% to 69%, and the fan-out is how many children each node has. In terms of capacity, if you have a height of four, meaning four levels in your index, you can hold roughly 300 million entries, and with a height of three, about 2.4 million entries. And the way to think about this in the context of memory is that you almost always want your indexes to be stored entirely in main memory, because — as we'll see later — for some queries you can actually do all the processing based on the indexes alone, without ever touching the real data.
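The back-of-the-envelope capacity math is just fanout raised to the height. As a sketch, assuming a fanout of 133 (a plausible figure for small keys on 8 KB pages — the exact fanout is an assumption here, but it reproduces the rough numbers above):

```python
# B+tree capacity estimate: a tree of height h with fanout f can
# reference about f ** h entries at the leaf level.
fanout = 133  # assumed fanout for small fixed-size keys

for height in (3, 4):
    print(height, fanout ** height)
# height 3 gives ~2.4 million entries, height 4 ~313 million,
# matching the rough figures quoted in the lecture.
```

This is also why keeping just the top couple of levels in memory is cheap: the root plus one level is only 1 + fanout pages.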
If I want to count, say, all the entries where the key is greater than five, I don't need to look at the actual tuples to do that count; I just look at my index, count the number of things I find, and I'm done. So typically the database management system will prefer to keep index pages in memory over tuple pages, and this is part of the more complex buffer pool replacement policies they use — recognizing that some pages are more important than others. At the very least, if you can't keep the entire index in memory, you always want to keep the root node in memory, right? And with an 8-kilobyte page size — or 16 kilobytes in MySQL, or 4 kilobytes in other systems — that's trivial to do. And often, as in my example where I kept inserting entries in sequential order — you see this all the time if you use an auto-increment key, because you insert 1, 2, 3, 4, 5, 6 in that order — all your inserts are basically hitting the right side of the tree. So you want to at least keep the right side of the tree in memory, and who cares if the left side gets swapped out, because you're probably not going back to read those older entries, right? So in addition to things like whether you borrow from the left sibling or the right, there are a bunch of other design decisions you have to make in your database system when you build your B+tree that I want to cover real quickly. The first is: what should the threshold be for deciding that you need to merge? Splits pretty much have to happen because you're out of space — you could use overflow pages, and some systems do — but usually what systems end up tweaking is the merge threshold. In this visualization tool, it does the merging as soon as you dip below the halfway point.
But it may be the case that you don't want to do that right away. Say your application is inserting and deleting the same key back and forth, over and over: then every delete triggers a merge and every insert triggers a split, and the index is just churning, doing all this extra work. So a lot of times systems will delay the actual merging until some downtime, or until the administrator says, you know, rebuild my indexes. Typically — we won't talk about this too much — DBAs will often rebuild their indexes, which basically means building a whole new index that's nicely packed together. And if you have a big database, that takes a long time. So when you see websites, especially banks, saying they're down at 3 a.m. on a Sunday, they're often rebuilding their indexes and optimizing them so that the next insert or delete doesn't immediately cause a split or merge. So the basic way to think about this is that you maybe let a node go a bit below half full before merging, because you don't want to reorganize immediately — something else may get inserted afterwards that fixes it. The tree won't be perfectly balanced, but in practice it's good enough. The next thing we have to deal with is: how do we handle non-unique indexes? We talked about this for hash tables, and it's essentially the same idea in a B+tree. Your two choices are to either interleave the duplicate keys inside the page multiple times, or to have each key point to a list of its values. The first approach looks like this in your leaf node: we have key 1 repeated multiple times, and each entry points to the corresponding value.
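As a sketch, the interleaved layout just described and the value-list alternative (described next) look like this with plain Python containers standing in for on-page arrays (record names are made up):

```python
# (a) Interleaved duplicate keys: sorted (key, record_id) pairs,
#     with the same key appearing multiple times.
interleaved = [(1, "rec_a"), (1, "rec_b"), (2, "rec_c")]

def lookup_interleaved(key):
    """Scan the run of entries sharing `key`."""
    return [rid for k, rid in interleaved if k == key]

# (b) Value lists: each key appears once and points to an array of
#     record ids (plus, on a real page, a length field per list).
value_lists = {1: ["rec_a", "rec_b"], 2: ["rec_c"]}

def lookup_value_list(key):
    return value_lists.get(key, [])
```

Both return the same record ids; the difference is purely the on-page layout and the extra length metadata the value-list form needs.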
Or you can do it like this, where each key points to a special array in the page that holds all the values corresponding to that key. That means there's some extra metadata you need, to know how many entries are in each value list, so you know how to jump to the right offset inside the page and find the thing you're looking for. I would say the interleaved duplicate-key approach is probably the most common, followed by this one. All right, the next thing we have to deal with is variable-length keys. The first way to do this is to store a pointer inside the leaf node to the actual value of the key: instead of packing the key into the leaf page, you just have a pointer out to the tuple. That means in order to read the key, you have to jump to the tuple, then jump to the offset where that column's entry is, and only there do you find the actual value. This particular approach is actually very rare — I can't think of any disk-based system that does this. People tried it for in-memory databases in the 1980s and 1990s, but typically nobody does this anymore; you always want to keep the actual key contents inside the index page itself. Another option is variable-length nodes, but as we talked about before, that's really tricky for your buffer pool manager, because now you may have fragmentation or holes, since different nodes or pages will be different sizes, and it makes it hard to reuse pages. Probably the most common thing is to embed an array of pointers inside the node itself for these variable-length values — sort of like the slotted-page approach, where the pointers lead you to the actual contents of the keys. The next thing to talk about, if you have really large keys, is what's called prefix compression.
The way to think about this is that, again, the B+tree doesn't actually have to store real keys — keys that actually exist in your table — in the inner nodes. If a key gets deleted, it may still exist in the upper nodes of the tree. That means that in the inner nodes we don't necessarily need to store each key in its entirety; it may be enough to just store a prefix of it, and that's enough to direct traffic — go to the left, go to the right. So say I have this example here: two really long strings as my keys. You can see right away that the first one starts with ABC and the second starts with LMNO. They're already distinct enough that I don't need to look at all the characters to do my comparison. So instead I just store the prefixes, and again, that's enough to figure out whether to go left or right. This is actually very common in database systems — it's called prefix compression — and it's something the database system does for you automatically if it supports it. And when you have to restructure your index because of a split on an insert, you may have to expand the number of characters you're storing in the prefix. Yes — your question is, what if two keys share the same prefix? Right, so say I'm looking up something that starts with A, B, C, D, E, and the prefix matches. What happens? You compare against the first prefix, you're less than or equal to it, so you know you go this way: the first few characters of the key I'm looking up will definitely be less than L, M, N, so I know I always go to that side. Now, when I want to check for an exact match, in the leaf nodes I have to store the full key, right?
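One way to sketch the guidepost-shortening idea: given the largest key going left and the smallest key going right, find the shortest prefix that still separates them (a simplified sketch — real systems also handle re-expanding prefixes after splits, as mentioned above):

```python
def separator_prefix(left_max, right_min):
    """Shortest prefix p of right_min with left_max < p <= right_min,
    so p can replace the full key as an inner-node guidepost.
    """
    for n in range(1, len(right_min) + 1):
        prefix = right_min[:n]
        if left_max < prefix:
            return prefix
    return right_min  # keys too similar: keep the whole separator
```

For the two long strings in the example, a single leading character is already enough to direct traffic left or right.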
This is only for the inner nodes, and we get away with it in a B+tree because the keys in the inner nodes don't have to be exactly as they exist in the actual table. So you always store the full key in the leaf nodes, but you don't have to store the full key in the inner nodes. Yes — the question is, does this only make sense when you have large keys, as opposed to, say, 64-bit integers? Yes: if you're trying to keep things word-aligned at 32 or 64 bits, you're going to pad the key out anyway to keep a nice aligned layout, so this only matters when you have large variable-length keys. All right, the next thing to talk about with B+trees is bulk inserts. It's very common to see scenarios where someone comes along with an existing dataset — like my emails that I shared last class — and wants to just bulk load it into a table. You could follow the normal insert protocol we defined earlier, building the index from the top down, but what you actually want to do is build the index from the bottom up. You pre-sort the keys you want to insert, demarcate the leaf nodes along the sorted keys, and then build the inner nodes on top. Say I want to insert all these keys into a single index. The first step: I sort them, then I create my leaf nodes. In this example I'm always leaving an extra slot open, but in practice you could keep them all tightly packed — I could put 6 here, and then have 7, 9, and 13 together. This is just for visualization. Then, once you know your boundaries, you build the rest of the inner nodes on top. So when you call rebuild index or optimize index in certain database systems, this is essentially what they're doing.
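The bottom-up build can be sketched in a few lines (a simplified sketch that builds only one inner level and packs leaves tightly; a real build repeats the same step upward until one root remains):

```python
def bulk_load(keys, leaf_size=2):
    """Bottom-up B+tree build: sort the keys, chunk them into leaves,
    then derive one guidepost per leaf boundary for the level above.
    """
    keys = sorted(keys)
    leaves = [keys[i:i + leaf_size] for i in range(0, len(keys), leaf_size)]
    # Every leaf after the first contributes its smallest key as the
    # separator (guidepost) in the parent level.
    separators = [leaf[0] for leaf in leaves[1:]]
    return separators, leaves
```

Because the keys arrive pre-sorted, there is never a split: each leaf is written once, already packed, which is exactly why rebuild/optimize-index operations work this way.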
They're basically just reading along the leaf nodes, because everything's already in sorted order, and building the index that way. Okay, so that's the basics you need to know for B+trees. The textbook will cover these operations again, and for the homework due next week, you'll do some basic things with B+trees. Then in the project, you actually implement your own B+tree — you don't have to do the splitting and merging. All right, in the remaining time I want to talk about skip lists and radix trees. So a quick observation to start with: the easiest way to build what we call a dynamic index — meaning one that allows inserts and deletes while preserving sorted order — is just a linked list, because it's really easy to insert something: I jump to one location and only have to update one pointer, assuming we're running in a single direction. The problem with this, though, is that lookups require a linear search — you can't do a binary search over a linked list. And what a binary search is essentially doing is letting you jump over different parts of the index. So rather than scanning from here, to there, to there to find all my entries, if I had a little extra pointer that let me jump over the next entry, I wouldn't even have to look at this key here. And you can extend this, going up and up with larger and larger jumps. This is essentially what the B+tree is doing, but this is another way to think about it. And this is exactly what a skip list does.
A skip list is a multi-level linked list with extra entries that let you jump over intermediate nodes. It was invented in 1990 by a professor at the University of Maryland, and it actually shows up in a couple of different database systems. The most famous skip-list database system is probably MemSQL — MemSQL doesn't use a B+tree; it uses a skip list as its main index data structure. It's also used in WiredTiger, the database engine that MongoDB bought, and in Facebook's RocksDB, where again it's used for internal metadata about the things being stored. But most database systems don't use skip lists for table indexes — MemSQL is the only one I know of that does. So the way to think about it is as multiple levels of linked lists. At the lowest level, everything is sorted and linked in a single direction; then the second level skips every other key, the third level every fourth key, and so forth. But when you do an insert, instead of deterministically deciding whether to add an entry at the upper levels, you just flip a coin and randomly decide whether to include the entry or not. So skip lists are called probabilistic or randomized data structures, because unlike a B+tree — where you're guaranteed, if you follow the protocol, that every lookup will be O(log n) — in a skip list you're not guaranteed that, but in expectation, and in practice, it holds. So let's look at an example. First we have our level guideposts, and the P here stands for the probability that a given key will have an entry at that level. At the very bottom, the probability is one, because every key has to have an entry there. Above that it's 50%, and above that, 25%.
And again, you see that the bottom level has everything sorted, the second level has every other entry, and each entry has a pointer to either the next entry at its level or the entry below it. This is sort of like a B+tree in that you can't have a key at an upper level if it doesn't exist at the lowest level — when you do a delete, you have to knock out what's called the tower, the entire column of entries for that key. And at the top here, nothing we've inserted has ever reached the top level, so it just has a null pointer to say there's no entry to find. So let's see how we'd insert key 5. Again, we flip coins to decide how many levels to insert it at; say we end up at the third level. So we need to put entries at all these locations. We write the bottom one first, then the second, then the last, and we wire up the pointers all the way down. I'm not describing how to do this concurrently — just take my word that there's a way to do it without anybody seeing a key that shouldn't exist yet. Yes — the question is how the coin flips decide the levels. So you always land here at the bottom level, right? Then for the next level you flip a coin: if it's heads, you add an entry there; if tails, you stop. If you got heads, you go up to the next level and flip another coin, and so on. And yes, in theory you could get nonstop heads forever — if you have those odds, go to Vegas. Probabilistically, you're not going to get that. So that's fine, right?
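The coin-flipping just described can be sketched as a tiny level generator (the `rng` parameter is only there so the behavior can be pinned down; a real skip list would use an ordinary random source):

```python
import random

def random_level(max_level=8, p=0.5, rng=random.random):
    """Every key gets level 1; each additional level is granted with
    probability p. So ~1/2 of keys reach level 2, ~1/4 level 3, and
    so on, which is where the 50% / 25% guideposts come from.
    """
    level = 1
    while level < max_level and rng() < p:  # heads: go up one level
        level += 1
    return level
```

Capping at `max_level` is the practical answer to "what if I flip heads forever": you simply stop after a fixed number of levels.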
Okay, so now we've set all our pointers, and if anybody does a lookup, they'll find key 5 — key 5 is fully established in our index. So now let's say I want to do a lookup on key 3. You always start at the very top level, and you peek at what the level pointer is pointing at: it points to key 5, but key 3 is less than key 5, so you don't want to traverse along this level — you drop down to the next one. Now you see key 3 is greater than key 2, so you jump over to there and continue along that level. You keep going until the next key at your current level is greater than the value you're looking for: in this case, key 3 is less than key 4, so you can't go that way — you go down instead, and you move along until you find key 3, the thing you were looking for. Yes? The question is, why does this have to be a probabilistic data structure — couldn't you just construct it deterministically over the whole data set up front? So there is a variant called deterministic skip lists that does exactly what you describe. Yeah — and that would make sense for something like a bulk insert, yes. There's a bunch of stuff I'm glossing over here, but the basic idea is that you have these levels and you randomly — in quotes — decide whether to add an entry. In the back: what is the advantage of the randomness? The advantage is that you end up having to store less data than in a B+tree, because in a B+tree you always have to fill in all the guidepost keys you need at all the inner nodes. Here, a key is allowed to have no entry at a given upper level.
That can make scans slower, but in practice it turns out okay. The way to think about it — and I should have shown a picture — is that a skip list is essentially a rotated B+tree that isn't perfectly balanced. Correct — and the other benefit is that I don't have to do any splits or merges. If I insert an entry, all I have to do is update these pointers; I never have to rebalance anything. All right, now let's do a delete. Again at a high level: you just set a flag in the node to say the entry has been deleted. If anybody scans along and sees that entry, it knows it shouldn't be visible and ignores it. At some later point, once we know no thread is pointing at it, we come by and do garbage collection to reclaim the space. So say I delete key 5. All the entries have this delete flag, initially set to false; we do our lookup to find key 5 and set the flag to true. Now anybody who scans along and finds key 5 knows it shouldn't be there and ignores it. Then we can come along and reroute all the pointers around key 5, and once we know no thread is looking at it, it's safe to physically delete it. Okay, so — related to the earlier question — the main advantage of a skip list over what I'd call a typical B+tree is that you don't have to store every single key at the upper levels. In this particular skip list you also don't have pointers in the reverse direction, which makes reverse scans hard — they require extra work — but for our purposes we don't care about that. And again, inserts and deletes never require rebalancing; they're always straightforward operations.
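The two-phase delete — mark first, physically unlink later — can be sketched like this (a minimal sketch with hypothetical names; the "no thread can still see it" check that gates the second phase is only a comment here):

```python
class Node:
    """One skip-list entry; readers skip nodes whose flag is set."""
    def __init__(self, key):
        self.key = key
        self.deleted = False

def logical_delete(node):
    # Phase 1: mark the entry so scans start ignoring it immediately.
    node.deleted = True
    # Phase 2 happens later: once no thread can still be reading the
    # node, unlink its pointers and garbage-collect the memory.

def visible_keys(nodes):
    return [n.key for n in nodes if not n.deleted]
```

The point of the split is that phase 1 is a single cheap write, while the expensive pointer surgery is deferred until it's provably safe.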
Now, the downside of skip lists, and why you don't see them that often, is that they're data structures optimized for being in memory. There is a paged version of a skip list where you pack multiple entries into a page, but then it really starts to smell like a B+tree, so you don't get that much benefit. The random number generator can be expensive, depending on how you implement it. And since you don't have pointers in the reverse direction, scanning in the reverse direction is expensive. This is about all you really need to know about skip lists for the homework. In the advanced class, we'll talk about how to build a high-performance skip list using compare-and-swap instructions, where you don't have to hold latches on nodes in order to update those pointers. That's partly why the MemSQL guys chose it — people think latch-free or lock-free data structures are in general much better than standard latching data structures, though in practice research shows that's not always true either. Okay, so the last tree data structure I want to show you is completely different from all the other ones. The main difference is that in the skip list and the B+tree, when I do a key comparison while traversing the index, I'm comparing whole keys. Yes, I talked about prefix compression, where you only have to compare part of the key, but in general, when you get down to the bottom, you're doing a full comparison of the key. If the keys are integers, that's not a big deal, because the CPU can do it in a single instruction. If they're strings, it's a more expensive operation that requires multiple instructions. A radix tree differs in that rather than storing the entire key at every single level of the tree, you only store one digit of it.
And by digit, I just mean a single bit, or if it's a string, a single character — one atomic element of the key. What's really interesting about this is that the height of the tree no longer depends on the number of entries you have; it depends on the length and contents of the keys. And like a skip list, it doesn't require any rebalancing, and the way you reconstruct a key is defined by the path you take as you search down into the index. So who here knows what a trie is? One, two, a few — okay, good. Who here knows what a Patricia tree is? Nobody. Who here knows what a radix tree is? No one, okay. Radix tree is another name for a Patricia tree. A trie is this: you take the keys, split them up into their elements, and that's what you store at each level of nodes. Say you take the key "hello": I have H, E, L, L, O all the way down, and then my pointer to either the record or the actual tuple data itself. So if I want to see whether I have the key "hello", I first ask: do I have H? Yes, go down. Then I compare the next character, E, go down, then L, L, O, going all the way down. That's all a trie is. The main difference in a radix tree is that instead of storing a separate node for every single element of your key, you pack elements together when you know there's no overlap with other keys in your index. In the case of "hello", there's no other key here that continues with E, L, L, O, so rather than storing individual nodes, I store a single node with E-L-L-O. Now, to check whether the key exists, I don't have to traverse node by node; I just do the comparison against the remaining suffix in that node.
In the case of "hat" and "have", they overlap on H and A, but they differ on T versus V-E, so I have separate nodes for those. Yes? The question is, does this mean the radix tree is always a binary tree? No. We have the H-A node here with multiple children, and you'd get a third child if you added another key that, say, started with H-E. Question in the back — do you have to do splits on radix trees? Yes. Would it be tedious, though? I'll show you on the next slide: it's slightly more expensive, but not that big of a deal, and it's in memory. Another question? Correct — to generalize, the node packing is an optimization, yes. And what is it called, a radix tree or a Patricia tree? Honestly, I don't know; the names get used pretty much interchangeably. All right. So there's one particular implementation of a radix tree called ART, from the HyPer data management system, and it looks like this: at each level, you pack together in a single node all the key elements that correspond to that level. Say you want to insert "hair": you don't have to create a new node; there's a little extra space where you can pack I-R in over there. If I want to delete "hat" and "have", I find the entries to delete and recognize that the I-R node is basically hanging out by itself — there's nothing else there — so I can move I-R up and pack it into the node above. So yes, you have to do splitting and merging, but "tedious" is not the word I'd use to describe it — it's not like you have to brute-force anything to figure it out. Yes, say that again?
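The plain-trie traversal described above is tiny to sketch with nested dicts (this is the uncompressed trie; a radix tree would additionally collapse single-child chains like E-L-L-O into one node, and ART's fixed node layouts are not modeled here):

```python
END = object()  # sentinel marking "a key ends at this node"

def trie_insert(root, key, value):
    """One node (dict) per character; the value hangs off END."""
    node = root
    for ch in key:
        node = node.setdefault(ch, {})
    node[END] = value

def trie_lookup(root, key):
    """Walk the key one character at a time; None if absent."""
    node = root
    for ch in key:
        if ch not in node:
            return None
        node = node[ch]
    return node.get(END)
```

Note the height depends only on key length, not on how many keys are stored, and the `END` sentinel is exactly the "demarcation" needed when one key (like "hell") is a prefix of another ("hello").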
Her question is, does this work well for range queries? Yes, because the keys are kept in sorted order — actually, in this example they're not drawn that way, but they should be. Next question: for "hello", what if you insert a new key that's a prefix of it, say "hell", H-E-L-L? Then you would split the node: E-L-L would be in one node, with one pointer to the value of the record for "hell", and the O would be in another node below it, because there's now a distinct key that stops before the O. So you have to have a demarcation in the index to say that a key ends there, okay? Yes? You would have that in the same node, right — I'm not really showing the memory layout here too much. The comment is that there may be a simpler way where you just add a special end-of-key character and say, okay, the key ends there. Let's take that offline; we can look at what they do in HyPer. Okay, so the tricky thing about radix trees is how you actually do the comparison, right? Because I showed simple examples here: a bunch of characters in a string, which is straightforward to read from left to right, as we read English, and map down into the index. But for other types we want to store in the index, we have a problem: if we're running on Intel, it's little-endian, and the bytes are stored in reverse order. And if you have signed integers, there's the sign bit at the top telling you whether the value is negative or not. So you have to do some extra work, which I'm describing here, to be able to store arbitrary values in a radix tree — it's not like a B+tree or a skip list, where you just take two keys and compare them directly.
Now you need to make sure that you're walking through the digits in the correct order. So let's look at an example. Say we want to store this integer key. Represented as a hexadecimal number, it has these digits, and depending on whether the machine is big-endian or little-endian, the bytes may end up in different orders. In this case, if I store things little-endian and then do my comparison, I may not get the correct answer; I want to reverse it so that I'm always starting at the most significant digit — the one, then the six, then the eight — because going in the other direction can give an incorrect result. So in practice, database systems that use radix trees have to do this extra transformation step when keys go in and when you do probes and searches, to make sure the digits are always read in the correct order, okay? So just to finish up real quickly: this is from a paper we published in SIGMOD last year, led by a PhD student here at CMU, where we did a comparison of a state-of-the-art B+tree, a skip list, and the radix tree from the HyPer database system. We also threw in a Masstree and a Bw-tree — we'll cover those in the advanced class; just think of them as highly tuned in-memory indexes. None of this is on disk; all of it is in memory. The main thing I want to show you is that the radix tree is crushing everything, and that's because it's really, really fast to do comparisons when you're looking at those single digits. If you're not doing any inserts or updates — no splits and merges — it's super fast to do the digit comparison to see whether the key you're looking for is there.
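The key transformation described above — store signed integers big-endian with the sign bit flipped, so that byte-by-byte comparison matches numeric order — can be sketched as (a sketch for 32-bit signed ints; real systems apply analogous rules per type):

```python
import struct

def radix_key(n):
    """Encode a signed 32-bit int as 4 bytes such that unsigned,
    byte-wise (digit-by-digit) comparison agrees with numeric order:
    pack big-endian and flip the sign bit by adding 2**31.
    """
    assert -2**31 <= n < 2**31
    return struct.pack(">I", (n + 2**31) & 0xFFFFFFFF)
```

Without the big-endian packing, the low-order byte would be compared first; without the sign-bit flip, negative numbers would sort above positive ones.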
But lo and behold, the B plus tree from the 1970s still holds its own, and in some cases actually outperforms the others. It's not on this slide, but in practice we have other experiments that show the B plus tree can actually beat the skip list. Yes? His question is, why are these all in the same graph, when some of these are optimized for disk and some are optimized for memory. So, no, I would say all of these are designed to be in-memory indexes. The simplest way to think about this in the case of a B plus tree is that instead of having a 4K page for a node, because that's the page size on disk, you have something like 512 bytes, because that fits in a cache line, right? That's the main difference. His comment is that the graph will look however you want it to, and yes, he's right: it depends on the workload, depends on how we organize things, right? In practice, if the skip list were going to disk and you're not packing things into pages, you'd get a single disk read for every single node that you read along the bottom, and that would be really, really slow. So, yeah, there's a lot to discuss. Again, we'll cover this in the advanced class. All right, I have two minutes, so real quick, I want to show you some extra things you can do with indexes that the book doesn't cover but that I think are actually useful in practice. The first thing is you can create what are called partial indexes, where when you create your index you can specify a WHERE clause to filter elements before they go into the index. So in this case here, say I have a table foo and I'm building an index on A and B, but I add a WHERE clause to say only include the values of A and B for tuples that match my predicate, where C equals 'Wu-Tang'.
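The effect of a partial index can be sketched with a plain dictionary. This is a hypothetical illustration, not any real system's implementation: only rows that satisfy the index's predicate ever get entries, so the index stays small.

```python
# Toy table: each row is a dict of column values.
table = [
    {"a": 1, "b": 10, "c": "Wu-Tang"},
    {"a": 2, "b": 20, "c": "other"},
    {"a": 3, "b": 30, "c": "Wu-Tang"},
]

# Partial index on (a, b): apply the WHERE predicate (c = 'Wu-Tang')
# before inserting, so non-matching rows are never indexed.
partial_index = {
    (row["a"], row["b"]): row
    for row in table
    if row["c"] == "Wu-Tang"
}

# Only 2 of the 3 rows made it into the index.
assert len(partial_index) == 2
```

Because the filtered index holds fewer entries than a full index would, it is more likely to fit entirely in memory, which is exactly the benefit described above.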
So now if I come along and I have a query like this, right, where I have a lookup on A and C equals 'Wu-Tang', then the database system knows it can just use that index I defined above to find the things I'm looking for. And the reason why you do this is because now our index has less data in it, so it can probably fit entirely in main memory, and it'll be really, really fast. Most commercial database systems and Postgres can do this; I don't know if MySQL and SQLite can. Then you can have what are called covering indexes. A covering index is not something you actually specify when you create the index; it's something you get for free if the database system supports this optimization. Basically, a covering index occurs when all of the attributes needed for the query can be found in the index. So in this case here, I'm selecting B from foo with a lookup on A, and I have both A and B in my index. So I can do my probe into the index, find the data that I need, and that's all I need to compute the answer. You never have to go fetch the actual tuples to see what the values of A and B are, because they're all packed in the index. So this is great if you can do it, but it may be the case that there are other attributes you want in your query that you don't want to index on. For these you can have what are called include columns. What happens now is that I still have my index on A and B, but I'm telling the database system to also pack into my leaf nodes the value of attribute C for every single tuple that I'm indexing. So now when I do my lookup, I get A and B from the index, but then if I want to see whether the contents of C equal 'Wu-Tang', I just look at the include column that's packed into the leaf node itself, and again I never have to go look at the actual underlying tuple data. Everything I need is stored together. So this particular trick is not that common.
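The include-column idea can be sketched the same way. This is a hypothetical illustration (the function names are made up): the index is keyed on A, and each "leaf entry" packs in both the indexed attribute B and the included attribute C, so a query touching only A, B, and C never fetches the underlying tuple.

```python
# Toy covering index: key a -> (b, c) packed in the leaf entry.
# b is an indexed attribute; c is an INCLUDE column stored alongside it.
index = {}

def index_insert(a, b, c):
    index[a] = (b, c)

def lookup_b_where_c(a, c_pred):
    # Everything the query needs lives in the index entry itself;
    # we never touch the base table.
    b, c = index[a]
    return b if c == c_pred else None

index_insert(1, 10, "Wu-Tang")
index_insert(2, 20, "other")
```

A probe like `lookup_b_where_c(1, "Wu-Tang")` answers the query entirely from the index, checking the predicate on C against the packed include column.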
I know SQL Server supports this, and maybe DB2 and Oracle, but I know Postgres and MySQL can't do this, right? Again, the idea here is that we want to maximize the amount of data we can get from an index without having to go to disk and look at the actual tuples themselves. Okay, so to finish up: the B plus tree is always going to be a good choice for the index in your database management system. People come along with various different data structures and things like that, but the B plus tree is always very, very good. The skip list and radix trees have some interesting properties, but they have some downsides that make them maybe not useful in practice as the primary index for a disk-based database management system; for in-memory systems, though, you may want to choose them. And then we'll cover how to build lock-free data structures, and talk more about concurrency control inside of indexes, in the advanced class in the spring. All right, so next class on Monday we'll start talking about how to use all the things we've talked about so far to actually start processing queries. We can organize things in indexes, we can organize things on disk. Now we're going to start going up the stack further and say, all right, a SQL query comes in and we generate a query plan; how do we actually start crunching on the data and computing answers, okay? Any questions, concerns, or problems? All right guys, see you on Monday.