Today's lecture is going to be about tree indexes, specifically B+ trees, which we'll get to in a moment. Just really quickly, I want to remind everyone about some upcoming deadlines. Project number one is due this coming Sunday, September 26th, as usual at 11:59 p.m. And homework number two will be due Sunday, October 3rd, at 11:59 p.m. So those are the deadlines, and now let's jump straight into the material. Last class we talked about hash tables and how data structures can be leveraged in all these different ways in our DBMS. You can use them for internal metadata storage, for core data storage at the level of a table, as a temporary data structure, and, as we also mentioned, as a table index. What we're going to focus on today is table indexes specifically: we'll discuss why you may not want to use a hash table for that purpose, and an alternative data structure called the B+ tree that is more widely used for that use case. So, just so we're on the same page, a table index is basically a replica of a subset of a table's attributes that is organized or sorted for efficient access based on those attributes. This allows the DBMS to quickly find the data we're looking for. For example, if it's a primary key, we can quickly track down the tuple we want, whereas the alternative, if we didn't have a table index, would be to perform a sequential scan of the whole table. We'd have to read all the pages of the table to find the tuples we're looking for. If you have an OLTP or transactional workload, you will almost always need some form of table index because, as we discussed in previous lectures, those workloads rely on low-latency access. You need to be able to get to the data items you want very quickly in order to read or update them.
So you need some kind of index to get that latency; otherwise you're stuck performing a full scan. With a table index, the DBMS is responsible for ensuring that the contents of the table and the contents of the index match, meaning they have to be logically synchronized. We can't have updates, insertions, or deletions applied to just one and not the other. If we look something up in the index that's been removed from the table but not yet removed from the index, the index will point us at some garbage value. If it's the other way around, and we look up something that's been inserted into the table but the index hasn't yet been updated to reflect that, the index will tell us it isn't there. So the DBMS needs to keep the contents of these two synchronized, and we'll talk about how in later lectures when we focus on transactional properties. During runtime, it's also the DBMS's job to figure out the best index or indexes to use to execute each query. The process of picking the correct indexes is part of what's called query optimization. Again, we're going to cover that in later lectures, but the key idea is that there are potentially many different ways to execute a particular query, and the DBMS has to pick the best or fastest one. In terms of building indexes, there's a trade-off that we have to keep in the back of our minds regarding the number of indexes we create per database or per table. On one side, we get faster lookups: if we have indexes built, we can use them to speed up our queries. On the other side, we have to pay for them in storage overhead, since we have to store the extra metadata and the data structure itself, and we also pay overhead in terms of maintenance.
That is the process of keeping the index up to date and synchronized with the data stored in the base tables. So in the back of our minds, we're always trying to balance these two things: speeding up our queries versus the overheads we need to pay. Just a high-level overview of today's agenda: first we're going to talk at a high level about what a B+ tree is, then we're going to dive down into how we can actually use them in a DBMS and some of the different design choices we need to consider when designing the actual data structure. And finally, we'll cover a few optimizations that we can apply to improve a B+ tree implementation. These aren't all of the possible optimizations out there, just a few examples. So the B+ tree is by far the most commonly used type of index in a DBMS. The first thing we have to talk about is the confusing naming scheme. There is a class, or family, of data structures called B-trees, and there is also a specific data structure called the B-tree. In this course we're specifically going to talk about the B+ tree, but people broadly use the term B-tree to refer to any of these balanced tree data structures. The B-link tree, for example, which I think came out of CMU, introduced the idea of leaf nodes having pointers that link between them, which we'll see in a few slides, and the modern B+ tree borrows ideas from all of these variants: from the B-link tree, from the others. So in this class we're going to be referring specifically to the B+ tree, and I'll try not to slip up and use the term B-tree. If you're wondering what the B in B-tree or B+ tree stands for, I couldn't tell you, and neither can anyone else.
The authors of the paper never explicitly stated what the B stands for. People have speculated over the years that maybe it stands for balanced, broad, or bushy. One of the authors was named Bayer, so it could stand for his name, or for Boeing, since the authors worked at Boeing Research Labs. The B could have been any one of those; I unfortunately can't give you a definitive answer. As I said, the B+ tree is the most widely used data structure for indexing in DBMSs. Pretty much whenever you call CREATE INDEX, in almost every DBMS you're going to get a B+ tree, or something very close to it, behind the scenes. So, to summarize, the B+ tree is a self-balancing tree data structure that keeps the data sorted. It allows us to perform searches over the sorted data, including lower-bound and upper-bound searches, so we can find ranges in the tree, which we weren't able to do with the hash tables we discussed. We can also perform insertions and deletions, and all of this in O(log n) time complexity. You can think of it as a generalization of a binary search tree with multiple child nodes: whereas a binary search tree has just two children per node, you can have more than two in a B+ tree. And specifically, it's optimized for systems that read and write large blocks of data. We've been talking a lot about accessing data at the level of disk pages, and that is the particular use case this data structure excels at. There's no original B+ tree paper; most people cite a survey paper from 1979. It mentions an IBM technical report that actually described the B+ tree, in 1973 I think it was, but I can't find it and no one I know can find it, so it's kind of lost to history. The survey is the paper everyone references when they're talking about the B+ tree.
And even though, as we've seen, most of the original publications on this data structure are from the 1970s, it's still widely used today. The theoretical or formal properties that we care about with the B+ tree: it's an M-way search tree, where M is the number of branches emanating out from a node. Sometimes this is also referred to as the degree, the branching factor, or the order of the tree; it's the number of branches or child pointers coming out of a node. The properties we care about are that it's perfectly balanced, so every leaf node is at the same depth in the tree; every node other than the root is at least half full; and every inner node with K keys has K+1 non-null children. So if we have K keys in a node, there's going to be one extra child, and we'll see why in a second. Here's a visual example of a B+ tree with some integers inserted into it. It has a degree, or order, of four because there are four child pointers emanating out of the root node. The nodes along the bottom we refer to as leaf nodes, and the node up top is an inner node; since the tree only has two levels, the only inner node is also the root node in this case. At the leaf node level, we're also going to have sibling pointers that essentially turn the leaf nodes into a linked list, or a doubly linked list, so we can traverse along the leaf nodes of the tree, and we'll see ways that can benefit us later. The keys are stored in the nodes as pairs: the really thin double line is a node pointer that refers to a child node, and the key is stored next to it. As I said, if there are three keys, there are going to be four child pointers. So you can think of the keys stored in the inner nodes as being the dividing points between the child nodes.
For example, if we look at this tree, we see that all of the keys stored in the child node to the left of the first key, five, are going to be less than five. All of the keys stored between five and nine are going to be greater than or equal to five and less than nine. And all of the keys stored to the right of nine are going to be greater than or equal to nine. At the leaf nodes, we're storing the actual keys along with their values; we'll talk about exactly how the values are stored later. So in the inner nodes and the root node, we store keys and node pointers, and at the leaf nodes, we store keys along with their associated values. Are there any questions at a high level about this right now? Then we'll go through some of the lower-level details in a minute. Yes. So the question is whether anything is stored on the right side, or whether entries always go to the left. The space to the right is used for inserts, and then as a node fills up, you may need to split it. But yes, everything gets inserted left-aligned, in sorted order. So every B+ tree node is comprised of an array of key/value pairs. The keys are derived from the attributes that the index is based on: that could be a primary key, or it could be a combination of multiple attributes; we'll see an example of that in a second. The values differ based on whether the node is classified as an inner node or a leaf node: the inner nodes store child pointers, and the leaf nodes store the actual values, or references to those values. The arrays inside each node are usually kept in sorted order on the keys so you can search them quickly using binary search or something similar.
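To make the separator-key rule concrete, here's a minimal Python sketch (my own illustration, not code from the lecture) of how an inner node picks which child pointer to follow; the node is represented simply as a sorted list of separator keys:

```python
import bisect

def find_child_index(keys, search_key):
    """Given the sorted separator keys of an inner node, return the index
    of the child pointer to descend into. With K keys there are K+1
    children: roughly, child i holds keys >= keys[i-1] and < keys[i]."""
    # bisect_right returns the first slot whose separator is strictly
    # greater than search_key, which is exactly the child to follow.
    return bisect.bisect_right(keys, search_key)

# Inner node with separator keys 5 and 9, i.e. three children:
separators = [5, 9]
assert find_child_index(separators, 3) == 0   # < 5: leftmost child
assert find_child_index(separators, 5) == 1   # >= 5 and < 9: middle child
assert find_child_index(separators, 12) == 2  # >= 9: rightmost child
```

Note how a key equal to a separator goes to the right of it, matching the greater-than-or-equal convention described above.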
We'll talk more about different search algorithms within nodes later, but there's no rule that says you have to maintain sorted order inside one of these nodes; it just depends on what your search algorithm is going to be. Looking specifically at the leaf nodes, if we zoom in on what one looks like, conceptually we've been visualizing it in the previous slides this way: at the beginning and at the end there's a pointer to the previous node and a pointer to the next node, and we're also storing the keys, K1 through Kn, and their associated values. Usually these sibling pointers aren't actual pointers or references to memory locations; they're just references to the page ID that represents the next node in the tree. As for how the keys and values are stored, there are different ways to do this, and we'll go through some options. One way, as shown here, is to physically store the value inside the node, contiguous with its key. Another option is to instead replace the value with a pointer or reference, like a record ID, that tells us where to find the value associated with the key somewhere else, in the table storage. There's yet another way we can physically lay this out, and this is how the storage is typically done: you store a header with some metadata at the top, for example the level in the tree, the number of slots in use, and the previous and next pointers, and then you store the sorted keys in a contiguous array, separately from the value pointers. This gives you better sequential access when you're searching through the sorted keys: you don't have to worry about whether the next thing is a value or a key; you just have all of the keys grouped together and all of the values grouped separately.
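As a rough illustration of the layout just described, here's a toy in-memory sketch in Python. The field names (`level`, `prev_page_id`, `keys`, `values`) are my own stand-ins, and a real implementation would pack all of this into a fixed-size disk page rather than Python lists:

```python
import bisect
from typing import Any, List, Optional

class LeafNode:
    """Toy leaf node: a small header, the sorted keys in one contiguous
    array, and the value pointers in a separate parallel array."""

    def __init__(self) -> None:
        self.level: int = 0                      # header: 0 means leaf level
        self.prev_page_id: Optional[int] = None  # sibling links stored as
        self.next_page_id: Optional[int] = None  # page IDs, not raw pointers
        self.keys: List[Any] = []                # sorted, contiguous keys
        self.values: List[Any] = []              # parallel array of pointers

    def lookup(self, key):
        # Binary search over the key array alone; on a hit, the matching
        # value pointer sits at the same slot in the parallel value array.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

leaf = LeafNode()
leaf.keys = [1, 3, 6, 7]
leaf.values = ["rid-1", "rid-3", "rid-6", "rid-7"]
assert leaf.lookup(6) == "rid-6"
assert leaf.lookup(5) is None
```

Keeping the keys in their own array is what enables the tight sequential scan mentioned above: the search never has to step over interleaved values.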
Again, with this layout, if we know that key four is at a given offset, we can look at the corresponding position in the value-pointer array with very little overhead; we don't have to search for it again. As I said, there are different ways we can store the values in the leaf nodes. Approach number one is to just store the record ID, which you can think of as a pointer to the location of the tuple that the index entry refers to. The other option is to store the full tuple data right in the leaf node with the key. In that case, there's no other data structure, like a heap table, maintaining the data; the tuple actually lives in the index. The problem is that this only works for the primary key index. If you have a primary key and your table is organized in sorted order on it, then the tuples can live inside the index; but for a secondary index, you have to store the record ID. You can't have multiple sort orders over the same tuples, because otherwise you're duplicating data and you need to keep it consistent across all of those places. Various systems do things in different ways, and some let you do both, but the high-level takeaway is that the safest thing to do is to store the record ID pointer; the second approach, storing the tuples in the index itself, is called a clustered index. So I made this big deal at the beginning... sorry, there's a question, yes. The question is: if the record IDs are pointers to the tuples, and the tuples are on a remote machine or something, how would that work?
We're not up to distributed databases yet, but basically the way you can think about this is that some combination of a page and an offset within the page represents the tuple's record ID, so you can go find it. If it's local, all you need to do is look up the page in the page directory, fetch the page, and then find the tuple directly at that offset inside the page. If you had a distributed database, you would need some kind of global mapping of where pages reside across the different machines in your distributed setting, so that based on the page ID you could track down where the page is, request it from the remote machine, and get it back; the offset within the page stays the same. Does that make sense? So, I made this big deal at the beginning that there's a concrete difference between a B-tree and a B+ tree, and that we're specifically talking about B+ trees. In the original B-tree paper, keys and values could be stored in all nodes of the tree. As we saw in the previous examples, in our inner nodes we just had keys and pointers to child nodes, and only when you get to the leaf nodes are the actual values stored. In the original B-tree, you could store keys and values in any node in the tree: inner nodes, the root node, wherever. The trade-off is that it's more space efficient, since each key appears in only one spot in the tree, but the difficulty is that it's harder to support concurrent access. If you potentially have keys and values at all these different spots in the tree, then you have to concurrently manage updates to nodes anywhere in the tree, rather than just at the leaf level. And we'll see why, especially for insertions and deletions, that becomes trickier as we go a few more slides in.
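The local record-ID resolution just described can be sketched roughly like this, with a plain dict standing in for the page directory (an assumption for illustration; a real DBMS would go through the buffer pool and fixed-size slotted pages):

```python
def resolve_record_id(page_directory, record_id):
    """Resolve a record ID of the form (page_id, slot): look the page up
    in the page directory, then index into it by slot. In a distributed
    setting, the page fetch could be a remote request instead, but the
    slot offset within the page stays the same."""
    page_id, slot = record_id
    page = page_directory[page_id]  # locally: page directory / buffer pool
    return page[slot]               # jump to the tuple's slot in the page

# Toy page directory: page 42 holds two tuples in slots 0 and 1.
pages = {42: {0: "tuple-a", 1: "tuple-b"}}
assert resolve_record_id(pages, (42, 1)) == "tuple-b"
```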
The other problem, if you have key/value pairs scattered throughout the whole tree, is that it can hurt your access patterns. We've been talking a lot in this course about how, particularly for disk-based systems, we prefer sequential access patterns over random access patterns. If you have values at various places in the tree, retrieving them can produce random access patterns, whereas if the keys and values are only stored at the leaf level, you can get a sequential scan across the bottom of the tree. That's the main takeaway and the important difference: the B+ tree only stores values in the leaf nodes, and the keys stored in the inner nodes are only used to guide the search process. They serve as split points, so that when we're searching for a key in the tree we can ask whether the key is greater than or less than a particular split point. The DBMS can use a B+ tree in a few different ways if the query references any of the attributes of the indexed key. For example, if we have an index built on attributes A, B, and C, then we can support lookups of the form A = 5 AND B = 3, and we can support lookups on just A = 5 or just B = 3. And as I mentioned earlier, the B+ tree can also handle comparisons beyond equality: you can do greater than, greater than or equal to, less than, and less than or equal to, unlike a hash table, which can only do exact-match lookups.
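As a sketch of how a composite-key index can serve a lookup where only the first component is fixed and the rest is a wildcard, here the leaf level is modeled as a list of sorted leaves, a stand-in for following the sibling pointers; this is my simplification, not a full B+ tree:

```python
def prefix_scan(leaf_chain, prefix):
    """Collect all composite keys whose first component equals `prefix`.
    We walk the leaves left to right (as the sibling pointers would let
    us do) and stop as soon as the first component exceeds the prefix,
    since the leaf level is globally sorted."""
    results = []
    for leaf in leaf_chain:            # stand-in for sibling-pointer hops
        for key in leaf:
            if key[0] < prefix:
                continue               # not yet in the matching range
            if key[0] > prefix:
                return results         # past the range: stop the scan
            results.append(key)
    return results

# Three leaves linked left to right, keys are (first, second) tuples:
leaf_chain = [[("a", 1), ("a", 2)], [("a", 3), ("b", 1)], [("c", 4)]]
assert prefix_scan(leaf_chain, "a") == [("a", 1), ("a", 2), ("a", 3)]
```

The early stop is the key point: sorted order plus the leaf links mean the scan never has to visit leaves past the matching range.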
Not all DBMSs are going to support all of these types of selection conditions; it becomes tricky, if you have multiple indexes, or indexes built in different ways, to figure out the best combination to use. But the key idea is that we can leverage the indexes during query processing, in the query optimization phase, to figure out the most efficient way to execute the query. Just as an example of how this would look, let's say we want to find the key (A, B); those are the two values we have set, and we have the keys stored in our B+ tree here. We go to the root node and ask: is the first component less than or equal to A? That's true. Is the second component less than or equal to C? That's true. So we know we only have to traverse down here and look in this node. This is a full lookup for the key we're looking for: we get down to the leaf node and then search within it for the exact key we want. The second type of lookup I mentioned is more like a prefix lookup: we're searching for anything that matches a pattern where the first key component is A and the second can be anything, i.e. all keys that match A followed by a wildcard. Again we go to the root node and ask: is A less than or equal to A? Yes, so we traverse down this path to the child node, and then we just scan forward to find all of the keys matching the A-wildcard pattern. Since we have these sibling pointers linking the leaf nodes, we can scan straight across and stop as soon as we find a key larger than A*. Are there any questions about this? Sorry, can you repeat the question? Okay, so the question is whether we can also support a search on just B equals something. I'm not showing it here; the
way it would work is that you would need to search all of the nodes that could possibly contain that value. In this case, I don't know if you can do it with this particular tree construction, but imagine a more complex one where, as you traverse down, you could rule out certain sub-ranges where you know the B value can't appear. Does that make sense? It's tough to show without an example here, but if you imagine a more complex tree with multiple layers or different split points, you may be able to exclude certain sub-ranges of the tree where you know the B value can't fall between two other values. Does that make sense? The follow-up question is: if the primary sort is on A, how can you exclude ranges for the secondary attribute? If you can get into a particular sub-range, and I don't think it works in this example tree, but if you can figure out that there's no way B can be included in a particular sub-range, because you know the min and max values for that sub-range, then you can exclude it from the search. Yes? The comment was that in the leaf node all the way on the right, once you hit (C, C), anything greater than that can't have a B in the second position. I think you actually could in this case, because imagine you have (D, B) at the end. So I don't think it works in this particular example, but if you have a sub-range where you know there's no way you could have a B in the second position, and maybe we can take this offline after class and I can show an example, but if you can narrow down a particular sub-range where you know there can't be a B in the second position, you can exclude it from
your search. This was just with two key components; you can generalize to more, like A, B, C, and maybe that makes it a little easier to think about: if you know you're looking for (A, B, something), you don't have to search sub-ranges that can only have a C in the second position. Does that make sense? Okay. So for performing an insert, the basic algorithm is pretty straightforward. We find the correct leaf node L that the key belongs in, and we put the data entry into L in sorted order: that's the key plus either a pointer to the value or the value itself. If L has enough space, we're done, great. Otherwise we need to split L into two nodes: we take the keys in L and split them, some staying in the original node L and some going into a new node L2, redistributing the entries evenly between them. Then we take the middle key, the split point, and copy it up into the parent, so we get this new split point inserted in the parent, and we insert an index entry pointing from the parent down to the newly created L2. To split an inner node that grew too big, it works similarly: we redistribute the keys evenly, but we push up the middle key rather than copying it. Each time we split, for example if the inner node we're splitting is the root, that's going to increase the tree height. Deletion is sort of the inverse of this. We start at the root and find the leaf L where the entry belongs, then remove the entry from the leaf. If the leaf is still at least half full, great, we're done. Otherwise we need to redistribute the entries somehow, either by borrowing from a sibling or by merging the node with a sibling, so that we maintain this property about the fullness of the
node. If a merge occurred, we then go up to the parent and delete the pointer to the merged-away node, rather than adding one as in the insertion case. Again, this merge can propagate up to the root, which will reduce the tree height, decreasing it by one level. One of the problems we can run into, and we talked a little bit about this with hash tables, is dealing with duplicate keys. The first approach is really simple: we just append the record ID to the key that we're indexing. We take the unique record ID, let's say the page number and the offset within the page, and stick it on the end of the key. Now the key is unique, because even if the key part is the same as another tuple's, only one tuple can have that record ID. So we've made the key unique, we can put it into the tree, and we do a partial, prefix-style search to find the key we're looking for. The other alternative is an overflow approach. We've seen a few other cases, in hash tables for example, where if we have duplicate keys we can store them in an overflow bucket; this works similarly, and I'll show you in a couple of slides what it looks like, but basically we allow leaf nodes to spill over into an overflow node that contains the duplicate keys. Typically, systems take the first approach, as the second one is a little harder to maintain and deal with; you could imagine that if you have a lot of duplicates, a leaf node could grow into an unbounded chain of overflow nodes. So here is the first approach we talked about, appending the record ID. Again, we have a B+ tree similar to what we had earlier: less than 5, less than 9, and greater than or equal to 9. And what we're actually storing here, instead of just the key 1, is the key
plus the record ID. It's not shown here, but imagine that there's the key, so 1, 3, 6, whatever, as well as the associated record ID that uniquely identifies the tuple we added. Now if we want to insert 6 again, and there's already a 6 in there, what we're really going to be inserting is 6 plus this record ID, which is just the page and the slot of the tuple. So we've solved the problem: we figure out which leaf the entry belongs in, we split that leaf because it's now too full, splitting it into two separate nodes, we update the pointers and move the middle key, 7, up to the parent node, and now we can insert the new 6, not the existing 6, with its page/slot record ID appended to it. So we get both of these duplicate keys stored in the B+ tree, and they're still uniquely identifiable. Does anyone have any questions about this? Yes. Sorry, the question is why I didn't draw pointers at the leaf nodes, in this picture here? Yeah, I excluded them; there should be pointers drawn between all of these leaf nodes. It just got a little messy with the figure, but there should be sibling pointers between all of those leaf nodes. Yes. The question is whether you still need to store the value pointers in the leaf node, given that you already have the record ID stored inside the key. You don't have to store the value pointers; that's one
possible optimization you could play, because you can find the value just by looking at the second part of the composite key we've created. I guess this doesn't necessarily help if you want to store the values embedded in the leaf node for ease of access, as in a clustered index, but yes, that's one possible optimization. Yes. So the question is whether you also store the key plus record ID in the inner nodes. I'm not sure you need to store the record ID in the inner nodes, because I think you should be able to get to the sub-range that you need without it, unless you fill up an inner node, in which case keys can collide there. So if you fill up the inner nodes with duplicates, then you store it there as well. I think the question is whether the record IDs get stored in the inner nodes, and I think the answer is yes, because imagine that you have more duplicates than fit in a page: then the tree needs something to split on, otherwise you wind up with orphaned pages somewhere. Okay, so the other alternative I mentioned was these overflow leaf nodes. Again, imagine we want to insert 6 here, and the node it belongs in is full. We just create an overflow node, with a pointer on the end of the full leaf pointing to it, and then we can fill it up with additional values. Say we insert three values, 6, 7, 6, and they just go into this overflow node. In this example, the leaf node's entries are no longer sorted. That isn't technically wrong, but we need to know that we can't do binary search within the leaf node, since its entries are out of order now; you need to do a sequential search instead. Physically, these are stored as separate nodes: the original node and then the overflow node, or bucket. Logically, it just looks like one really big leaf node. And it's easier to leave the sibling pointer pointing at the exact neighbor, so when we add the overflow node, we don't change the pointers from the original leaf node to its sibling; they keep pointing to the original one. So I
mentioned clustered indexes: the basic idea is that the table is stored in the sort order specified by the primary key. You have a primary key on your table that uniquely identifies its tuples, and a clustered index uses that primary key as the sorting key. This is one of the different ways we can use the B+ tree in the DBMS. Some DBMSs always use a clustered index: if a table doesn't have an explicit primary key that you as the user have specified, the DBMS will automatically make a hidden primary key. Other DBMSs don't use clustered indexes at all; they just keep the table stored as unordered pages, without this index-organized table structure. What this looks like during query processing: if we want to perform a scan, we traverse to the left-most leaf page, and that gives us the tuples from all of the leaf pages in order. Think of it as this index structure up top, which directs our search, the data entries at the bottom of the index, and below those the data records, stored in pages. This gives us a sorted order for the table, and again, if the index is clustered, our pages are organized and sorted based on the primary key. This is better than some external sorting algorithm, because the data is already sorted in the index. You can imagine the pointers, as I said, going in increasing order across these pages, and our scan can proceed sequentially across all of the values, because they're already maintained in sorted order. On the other side, if we have a non-clustered index, meaning the table is not sorted on the index key, maybe because this is a secondary index, then we're going to be retrieving tuples from the index lookup in the order that they appear in
This can be very inefficient, because it might lead to a lot of random accesses to the data. As an example, say we want to perform a scan from the low end of a range to the high end. The keys are in sorted order, but the pointers point to pages in essentially random order, so we may end up fetching the same pages multiple times, which is not very efficient for scanning the range. The access pattern might look like this: as we scan across the index and access each tuple as soon as we find it, we get repeated page reads, where we read a page, read some other pages, and then come back to a page we already looked at. One way to get around this is for the DBMS, rather than accessing tuples immediately, to first scan the index and accumulate all of the pages it is going to need in advance, and then sort them by page ID so that each page is accessed only once. So we scan across the index, collect all of the pages we need (101, 102, 103, 104), sort them, and then access each page exactly one time, instead of jumping all around the file because the entries are out of order. Are there any questions about this part? Okay. There is a wide variety of design choices we need to make when physically implementing the B+ tree. So far we've been talking about it abstractly, but there are concrete decisions to settle.
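The sort-the-record-IDs trick can be sketched like this. It's a simplified illustration (the names `fetch_page`, `scan_naive`, and `scan_sorted` are made up): record IDs are (page ID, slot) pairs, and the point is that batching and grouping them by page means each page is fetched exactly once.

```python
# Sketch (not from the lecture): avoiding repeated page fetches during a
# non-clustered index scan by batching and sorting the record IDs first.
# fetch_page stands in for the buffer pool; rids are (page_id, slot).

def scan_naive(rids, fetch_page):
    # Access each tuple as soon as the index produces it: pages may be
    # fetched many times, in effectively random order.
    return [fetch_page(page_id)[slot] for page_id, slot in rids]

def scan_sorted(rids, fetch_page):
    # First collect all record IDs, then group them by page so that each
    # page is fetched exactly once, in page-ID order.
    by_page = {}
    for page_id, slot in rids:
        by_page.setdefault(page_id, []).append(slot)
    results = {}
    for page_id in sorted(by_page):
        page = fetch_page(page_id)            # one fetch per page
        for slot in by_page[page_id]:
            results[(page_id, slot)] = page[slot]
    # Return tuples in the original (key) order the index gave us.
    return [results[rid] for rid in rids]
```

Both functions return the same tuples; the difference is only how many times each page crosses the buffer pool.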
We need to decide what the size of the nodes should be, what our merge threshold should be, all that kind of stuff. There's a great book, Modern B-Tree Techniques, that covers a lot of these things plus others we'll talk about; if you're interested, I encourage you to take a look at it. Here we'll just go through a few design choices that are pretty important. The first is node size. There's a rule of thumb that the slower the storage device, the larger the optimal node size for a B+ tree. On a traditional HDD, the optimal node size is around one megabyte; on something faster like an SSD, it's on the order of ten kilobytes; and in memory it can be as small as 512 bytes. On slower storage devices, which have relatively better sequential access performance, the larger node size maximizes sequential access: reading one megabyte from disk at a time is a lot better than reading many much smaller nodes. The optimal size can also vary with the workload: performing a lot of leaf-node scans is different from performing a lot of root-to-leaf traversals, so we need to consider both the hardware we're running on and the workload. The second piece is the merge threshold. We talked about the theoretical rule of merging nodes when they fall below half full, but not all DBMSs do that. Sometimes they delay merge operations in order to reduce the amount of reorganization they need to perform. The idea is that if you're doing a bunch of insertions and deletions, maybe insertions will come along later to fill up the less-than-half-full node.
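As a rough sketch of the deferred-merge idea (a toy, leaf-only structure with made-up names, not how any real DBMS implements it): deletes only flag underfull nodes, and a periodic maintenance pass cleans them up.

```python
# Sketch (simplified, not from the lecture): deferring merges instead of
# rebalancing on every delete. Underfull leaves are only recorded; a
# periodic maintenance pass cleans them up later.

MERGE_THRESHOLD = 2   # hypothetical minimum number of keys per leaf

class Tree:
    def __init__(self):
        self.leaves = []          # list of sorted key lists (toy "leaves")
        self.underfull = set()    # indexes of leaves flagged for cleanup

    def delete(self, key):
        for i, leaf in enumerate(self.leaves):
            if key in leaf:
                leaf.remove(key)
                if len(leaf) < MERGE_THRESHOLD:
                    self.underfull.add(i)   # flag it; don't merge now
                return True
        return False

    def maintenance(self):
        # Periodic pass: here we just drop empty leaves; a fuller version
        # would also merge adjacent underfull leaves.
        self.leaves = [leaf for leaf in self.leaves if leaf]
        self.underfull.clear()
```

The delete path stays cheap, and the reorganization cost is paid in batches on the maintenance pass.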
By delaying or deferring these reorganizations, they can be performed periodically to clean up the tree, rather than on every insert or delete that would normally trigger a merge. Variable-length keys are another important thing to think about, and there are several approaches. The first and most obvious is to store pointers: the keys are just pointers to the tuples' attributes. We don't store the variable-length key itself; we store a pointer to the tuple's attribute and follow it to wherever that value lives in a page to find out what it is. Approach number two is variable-length nodes: imagine each node in the tree can have a different size, varying with the keys stored in it, instead of a fixed node size. The downside is that this requires pretty careful memory management, and I don't know of any systems that do it this way, because figuring out the size of individual pages gets complicated. Approach number three is padding: you always allocate the maximum possible space. Imagine you have string keys and the longest string is 100 characters; even for keys shorter than that, you always allocate 100 bytes. If most of your keys are about the same length, this might be okay, but if you have one really long key to accommodate and the rest are short, it ends up being wasteful. The final approach is key-map indirection, which works similarly to the slotted pages we discussed for disk pages: you embed an array of pointers at the beginning of the node, and they map to variable-length keys stored at the end of the node.
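The key-map indirection layout might look like the following sketch, with a slot array at the front and a key heap growing from the back (all names and the 64-byte node size are hypothetical, and real systems store the slot array inside the page bytes too):

```python
# Sketch (hypothetical layout, not any real system): key-map indirection
# inside a node. A slot array stores (offset, length) pairs in key order;
# the variable-length keys themselves live in a heap that grows from the
# end of the node, like a slotted page.

NODE_SIZE = 64

class Node:
    def __init__(self):
        self.data = bytearray(NODE_SIZE)
        self.slots = []            # (offset, length), kept in key order
        self.heap_end = NODE_SIZE  # keys grow downward from the end

    def key_at(self, i):
        off, length = self.slots[i]
        return bytes(self.data[off:off + length])

    def insert(self, key: bytes):
        self.heap_end -= len(key)                  # claim space at the end
        self.data[self.heap_end:self.heap_end + len(key)] = key
        # Keep the slot array sorted by key so binary search still works,
        # even though the key bytes sit wherever the heap put them.
        pos = 0
        while pos < len(self.slots) and self.key_at(pos) < key:
            pos += 1
        self.slots.insert(pos, (self.heap_end, len(key)))
```

Only the small fixed-size slot entries move when keys are inserted; the variable-length key bytes never have to be shuffled.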
The two regions grow toward each other in the middle, just like slotted pages. The final piece, which we've touched on a few times in earlier slides, is how we actually find keys once we're inside a node. Here are three examples of ways to search for keys within a node; there are many others, but these are pretty straightforward. The first approach is a simple sequential or linear search: you scan the node's keys from beginning to end. Imagine, for example, you want to find key 8 in this array of keys: we start at the very beginning and loop forward one key at a time until we reach key 8, and then we're done. As I mentioned earlier, if the keys are not stored in sorted order inside the node, you're stuck with linear search; there's no option other than scanning sequentially through the keys. Yes? The question is whether this applies just to the leaf nodes, or whether inner nodes can also have their keys stored in unsorted order. I suppose you could store the inner nodes' keys unsorted too, but then you would need to look at every single key to find out which two keys you belong between, because without looking at all of them, there could be another key that further subdivides the range. So I think it's possible, but not practical or efficient, because you'd have to examine every key. Okay, so that's linear search. I'm sure everyone knows binary search: if we know the keys are stored in sorted order, we can jump to the middle key and pivot left or right depending on the comparison. In this case, 8 is greater than the middle key 7, so we move over and search the right subrange.
Since 8 is less than 9, we move again, and now we've found 8. This may not look like a huge difference in the number of comparisons for an array of size 7, but as you scale up to many keys, the difference can be substantial. The third approach I'll mention is called interpolation search: you approximate the location of the desired key based on the known distribution of the keys. Again we have this array, and we estimate the offset from the distribution: take the key we're searching for minus the start key, which is 8 − 4 = 4, times the length of the array (7), divided by the total key range, and with integer division that gives us offset position 4, so we can jump right to the position in the array where key 8 should be. This is a linear interpolation; if the distribution isn't quite linear, you may have to perform the interpolation several times, or, if the initial estimate gets you close to the key, fall back to a linear or binary search from there. The idea is simply to use interpolation to narrow down the range you're searching. Okay, the final piece I want to talk about is a few optimizations we can apply to the B+ tree: prefix compression, deduplication, and bulk inserts. As I said, there are many more; if you're interested, check out that book, and there are plenty of papers describing optimizations of varying complexity. These three, though, are pretty common and pretty impactful, so we'll go through them quickly. First, prefix compression.
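The three intra-node search strategies can be sketched over a plain sorted list standing in for a node's key array (illustrative code, not from the lecture):

```python
# Sketch (not from the lecture): three ways to find a key inside a node,
# with a sorted Python list standing in for the node's key array.

def linear_search(keys, target):
    for i, k in enumerate(keys):          # works even if keys are unsorted
        if k == target:
            return i
    return -1

def binary_search(keys, target):          # requires sorted keys
    lo, hi = 0, len(keys) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if keys[mid] == target:
            return mid
        elif keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def interpolation_search(keys, target):   # assumes roughly uniform keys
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:
            pos = lo
        else:
            # Estimate the offset from the key distribution.
            pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        elif keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```

On the array 4..10, searching for 8 lands the interpolation estimate directly on the right slot in one step, whereas binary search needs a couple of pivots and linear search walks five entries.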
If we have sorted keys in the same leaf node, they're likely to share the same prefix. The example I have here is three words: robbed, robbing, and robot. Instead of storing the entire key each time, we can extract the common prefix they share and store only the unique suffix for each key. There are many variations on this scheme, but that's the basic premise: we store the prefix "rob" once, and then for each key we store its unique suffix below it. Depending on how similar the keys are, we can save a lot of space if there's a lot of repetition in the prefixes. The second optimization is deduplication. With a non-unique index, we can end up storing multiple copies of the same key in the leaf nodes. In this example, say key 1 is repeated three times, and each time we store both the key and its associated value. Instead, we can store the key just once and maintain a list of the tuples with that key, similar to what we discussed when implementing hash tables: if there are many duplicate keys, store the key once and then a list of the values associated with it. So we can transform the original array into something more compact: the key once, all of the values associated with it, and then the next key. The final optimization is related to actually building the index, that is, performing a bulk insert. Imagine you have an existing table, some existing data, that you want to build a new index on top of, so you're not starting from scratch. The fastest way to build a new B+ tree index for this existing data is to first sort the keys, and then build the index on top of the sorted data from the leaf nodes up.
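Both leaf-node optimizations can be sketched in a few lines (illustrative only; `compress` and `deduplicate` are hypothetical helpers, not a real system's API):

```python
# Sketch (not from the lecture): two leaf-node space optimizations.
import os

# Prefix compression: store the shared prefix once, plus unique suffixes.
def compress(keys):
    prefix = os.path.commonprefix(keys)
    return prefix, [k[len(prefix):] for k in keys]

# Deduplication: store each key once with the list of its values, instead
# of one (key, value) entry per duplicate.
def deduplicate(entries):
    out = {}
    for key, value in entries:           # entries are key-ordered in a leaf
        out.setdefault(key, []).append(value)
    return out
```

For the lecture's example, `compress(["robbed", "robbing", "robot"])` yields the prefix `"rob"` plus the suffixes `["bed", "bing", "ot"]`.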
The way we've been inserting into the tree so far is to start at the root node and work down until we find the leaf node the key belongs in, then perform the insert from there. In this case, we instead start from the leaf nodes: we take the data, sort it, and build the leaf nodes efficiently, followed by the subsequent layers of the tree. So imagine we have this pre-existing set of keys. The first step is to sort them. Then it's much more efficient to just scan along the sorted data, insert the keys into leaf nodes, build the sibling pointers between those nodes, and then build the upper layers based on those values. Performing a bulk insert, or bulk load, this way is much more efficient than going through all the logic of traversing the tree and performing splits on every single insert. Are there any questions about these optimizations? Yes? The question is whether there are efficient ways to merge B+ trees. I'm not sure, but I think it probably ends up being similar to a sort-merge: think about it at the level of the leaf nodes. You have one sorted array here and another one there, so you can look at the first element of each, see which one is smaller, take that, and keep taking from each of them in order to build up the tree. Does that make sense? Sorry, the follow-up question is: for the inner layers, you end up having to rebalance or split or merge things anyway, so doesn't it end up being the same? Yeah, I think it does. Starting from the leaf layer, the keys from each of the two trees (say we're merging two trees for simplicity) should already be in sorted order.
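The bottom-up bulk load can be sketched as follows. This is a simplification (the `FANOUT` constant is hypothetical, and the inner nodes here just hold the first key of each child rather than implementing real separator-key rules):

```python
# Sketch (simplified, not from the lecture): bulk-loading a B+ tree by
# sorting the keys first, building the leaf level as sorted runs, and
# then building each upper level bottom-up.

FANOUT = 4   # hypothetical maximum entries per node

def chunk(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]

def bulk_load(keys):
    level = chunk(sorted(keys), FANOUT)       # leaf level: sorted runs
    levels = [level]
    while len(level) > 1:
        # Each inner node holds the first key of each child as a stand-in
        # for real separator keys.
        level = chunk([child[0] for child in level], FANOUT)
        levels.append(level)
    return levels        # levels[0] = leaves, levels[-1] = the root level
```

A single sort plus one sequential pass per level replaces the per-key traverse-and-split logic of ordinary inserts.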
So I think you can start at the leaf layer, fill out the leaves by taking from each of the two trees, and then build up from there. There may be, and I'm sure there is, work somewhere in the database literature on efficiently merging two B trees or B+ trees at the inner-node level. I can try to track that down for next class, but I don't know of anything off the top of my head. Just thinking about it for a second, though, you could certainly start from the leaf layer, since you already have two sorted key sets to pull from for the merge, but that would still require building the inner nodes on top. Yes? The question is whether a bulk insert means inserting a new table. A bulk insert means a bulk insert into the B+ tree. Imagine you have some pre-existing data: it could be stored in a table, or it could just be a big array of integers, and you want to build a B+ tree on top of it. Rather than starting from the root and doing inserts one at a time, where every single insert requires traversing the tree, finding where the key belongs, and potentially splitting to make room in a new node, you take the data and sort it, then efficiently build the leaf-node layer (each leaf node holds so many values) and then build the inner-node layers on top of that until you reach the root. So it's like a bulk load into an index. Does that make sense? Great. Are there any other questions? Okay, so just to wrap up: the B+ tree is almost always a good choice for your DBMS.
It is frequently used as both a clustered index and a non-clustered secondary index in a wide variety of systems, and as I mentioned, even though it dates from the 1970s, it's still around and widely used today. If you issue a CREATE INDEX command in most systems, you will usually get a B+ tree behind the scenes. With that, next class is going to be about index concurrency control: if you have concurrent inserts or deletes on indexes, how do we make sure those operations are thread-safe? If there are any other questions, we can talk now, or I have office hours right after this. I'll see you next week.