So, today what I want to talk about picks up on the stuff we didn't discuss last time. I think the mistake I made this semester was trying to cram what was three lectures into two lectures. Normally we would discuss B plus trees and latches separately, but now I'm trying to include them in the discussion of the tries here. So, first we'll discuss how to actually implement latches, or what kind of latches we could use in our data system. Then we'll talk about how to do latching in a B plus tree, and of course this will be contrasted with the BW tree, which is meant to be latch-free, whereas in a B plus tree you have to have latches. And then we'll focus on what you guys read about in today's assigned reading: the Judy array, the ART index, and the Masstree, all variants of tries. Okay?

All right, so again, recall from the introduction class when we discussed the difference between locks and latches: locks were this logical concept in a database system meant to protect entities within the database itself, like we can take locks on tables, take locks on tuples, take locks on blocks of tuples, or on the entire database. And then latches are going to be these low-level primitives you would use to protect the critical sections of any kind of concurrent data structure, or anything that needs to have multiple threads accessing or modifying it at the same time. And so this can get confusing if you're coming from the OS world, where there's no notion of a latch (there is now in C++, but it means something completely different). In the OS world, they refer to these things as locks, and so for today's lecture, we're mostly only talking about latches, and it's sort of confusing because the way you would implement a latch is a spin lock, so anytime I say lock in this lecture, I really mean latch. But when you go out and read things that aren't in the database literature or community, they'll refer to things as locks, but they really mean latches when you're talking about databases, which is kind of confusing, sorry.

So if we want to have a latch in our system, again, we are the database system developers, we are the people who are actually building the system, so we need to make a decision about what kind of latch we want to use. I would say we're not in the business of writing latches ourselves, and I do not encourage you to do that, although you can build the most simple one, which I'll show in a few more slides. The general conventional wisdom is do not write your own latching library. There's a ton of them already out there; just pick one of those. So we want to talk about what we want to have in a latch that we could apply to our database system so that it's as efficient as possible, both in terms of how much space the latch occupies in memory and in terms of the latching mechanism itself. So I would say we want a small memory footprint, right? Think about it: in your B plus tree or in your radix tree that we'll talk about today, every node is going to have a latch. So if your latch is like a couple of hundred kilobytes, that's stupid, that's wrong. But even if it's a couple of bytes, that can start to add up, because you're going to have one for every single data structure or every single node. So the pthread mutex is going to be 64 bytes, which is actually a lot.
And the reason why it's so big is because, for historical reasons, it has to have backwards compatibility with some older CPU architectures that required the latches to be 64 bytes. But there are other implementations that we could use that will be much smaller. We obviously want, in the best case scenario, when there's no contention on whatever it is we're trying to access, to be able to acquire the latch very efficiently. So this is right, we want a fast execution path when there's no contention. We just go ahead and acquire the latch, and it doesn't take a lot of work, a lot of instructions, for us to do that. And then if we can't acquire the latch, we want to be able to deschedule our thread with the operating system so that we're not just spinning there burning cycles in a spin latch, which I'll show in the next slide. Because that's just burning cycles and wasting the CPU, and the OS is going to have trouble scheduling you, because it's going to think you're actually doing work and keep trying to schedule you to execute when you're actually not doing anything useful. Now, if we have to maintain any metadata about what threads are waiting to acquire a latch, we don't want each of our latch instances to allocate and maintain its own queue data structure to keep track of what threads are waiting. If we use the OS blocking mutex, the OS does that for us. But if we're using something in user space, above the kernel, then we don't want every single latch to maintain its own queue, because that's going to blow up the size of the thing.

So this is your question, yes? When you say deschedule the thread, that mainly means going to the OS and blocking? Going to the OS and saying, I can't run anymore, don't schedule me, and here's the OS lock that I'm waiting to get notified on so that I can run, right? So, yeah, sorry. Does that mean that every single latch implementation has to, at some level, wrap the OS lock? We'll get there. His question is, does that mean every single latch implementation has to wrap the OS lock? No. But your question is actually very much on point, because this is actually in the news, if you want to call it that, on Hacker News and on the Linux mailing list. Some guy who was working at Google on their new gaming console, the Stadia thing, the streaming gaming console, had this long blog post about how the OS kernel locks are a mistake and you really should just be using spin locks. So then, of course, Linus didn't like that, and he had a long post about how this is wrong and why user-land spin locks are a bad idea. I'm obviously not going to read the whole thing. The only thing I'll point out, though, is he has this one comment here that I'll highlight, where he says: do not use spin locks (in our world, spin latches) in user space unless you actually know what you're doing, and the likelihood that you know what you're doing is very, very low. So, prior to this, every semester when I taught latches, I'd always say, oh yeah, never use an OS latch, a futex, a pthread mutex, because going down into the kernel was always really expensive, right? To tell the OS, hey, I can't run anymore, that's going into the OS. You have to update the data structures that the scheduler is maintaining to keep track of what threads are running and what they're waiting for.
And I always thought, yeah, you're better off doing everything in user space. But he's claiming that, in his mind, the OS can always do a better job than anything else possibly could, because it at least has a global view of everything that's going on, and it can make decisions about whether to schedule or not, right? We can't control that in our database system entirely. If we do green threads and manage our own threading stuff up above, we could do that, but that's usually overkill for most systems. So again, the point I'm trying to make here is that prior to this blog post, I would always say use a test-and-set spin latch, and I think the current understanding in the state of the art is that you don't want to do this.

All right, so let's talk about the different kinds of latches we can implement. The test-and-set spin lock, or spin latch, is the most basic one, and this is one you can implement yourself without necessarily going back and yielding to the OS. Then there's the blocking OS mutex, which is what Linus is referring to. And then I think the right thing to choose actually is this one, the adaptive spin lock, which is a combination of these two, and it's a little bit smarter than what the OS blocking one can do. And then we'll look at more sophisticated implementations, the queue-based one and the reader-writer locks, but you can build these on top of the other ones here. Those are the basic primitives, and then you can do more sophisticated things on top of them.

So the simplest way to implement a latch is the test-and-set spin lock, or the test-and-set spin latch. And all it is is that we have some chunk of memory, like a single byte or 64 bits, that we're going to test to see whether it's set to zero. And if it is, then in a single compare-and-swap instruction, we'll set it to one, meaning we've acquired the latch. And you can get this in C++. In the standard library, they provide you with this std::atomic template type, and then you can put whatever primitive type you want in there, booleans, integers, and things like that. So the way you basically use it is like this. I define my latch, and this is just syntactic sugar to declare an atomic boolean. And then I have this while loop where I try to set the latch. If it's zero, and my single compare-and-swap instruction sets it to one, I hold the latch. If not, then I fall down into this while loop, make some decision about what to do, and then come back and try again. So I'm spinning and burning cycles trying to get this thing over and over again. The tricky part is obviously in the middle here, and this is what Linus is saying: the decision of what to do here is non-trivial, and you may end up doing the wrong thing that causes the OS to make bad scheduling decisions. Like, you could just yield immediately, but then that's now going down to the OS, and more syscalls, and it's getting slower. In a database system, in the case of the BW tree, he was asking, isn't the mapping table compare-and-swap operation the same thing as a latch? Sort of, yes. Because if I try to update the location in the mapping table and I don't get it, then I'll abort the operation and restart it over again, or I can retry and go over again. So again, this is the simplest way to do this. In our system now, we actually use a spin latch that does essentially the same thing, written by Intel, from the Threading Building Blocks (TBB) library. So if you ever see a spin latch in our system now, it's essentially doing this.
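To make that concrete, here is a minimal sketch of what that test-and-set loop looks like with std::atomic in C++. This is just an illustration of the idea described above, not the exact code on the slide; the open question is what to put in the body of the retry loop.

```cpp
#include <atomic>

// Minimal test-and-set spin latch sketch (illustrative only).
class SpinLatch {
 public:
  void lock() {
    bool expected = false;
    // Spin until our compare-and-swap flips the flag from false to true.
    while (!flag_.compare_exchange_weak(expected, true,
                                        std::memory_order_acquire)) {
      expected = false;  // compare_exchange overwrites 'expected' on failure
      // The hard part: what to do here (pause? back off? yield to the OS?)
    }
  }
  void unlock() { flag_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> flag_{false};
};
```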
But one of the downsides of this is that it is not very cache friendly. What I mean by that is, say I have two threads running on separate sockets, and there's some latch they both want to acquire. Say this thing is held by another thread, and these guys are just going to spin and keep calling test-and-set over and over again trying to acquire it. For this guy, the memory is local to him. We'll talk about NUMA architectures later on. But this DIMM is closer to this socket, and it can access it more quickly than the other guy over here, so he can do this efficiently. But this guy over here has to go over the interconnect and ask, over and over again, is this thing set, is this thing set? And then whenever somebody acquires the latch, we have to send a cache invalidation message, because we know this guy is trying to read it. So the simplest spin latch is not efficient from the CPU standpoint, it's not scalable, and the OS isn't going to know anything about what we're doing.

So what he was referring to with the OS lock, this is sometimes called the basic OS mutex. And the basic idea here is that I declare a standard mutex, and then it's just like the other code I had before, but instead of having this explicit while loop where I'm spinning trying to acquire the lock, I just say, hey, try to lock this. And in this case here, my thread will get blocked if I can't get it; otherwise I fall into the critical section and do whatever I need to do. Does anybody know what you actually get when you declare a standard template mutex? What's that? Yeah, but before you get to the futex, there's an alias in front of that too: the pthread mutex. And the pthread mutex is just a futex. Does anybody know what futex stands for? Fast userspace mutex. So what it is, it's basically a spin lock plus an OS lock combined together. So what will happen is, again, there's the user space latch, the spin lock, and then the OS latch down in the kernel. So if I have two threads come along and they both want to acquire this lock, and say this whole thing represents a single lock: the first guy gets it, the second guy doesn't. So then he goes down and waits on the kernel lock, the kernel latch, and now he gets descheduled. The scheduler knows that he's waiting for this thing to get released, and once that happens, then he can start running again. Is this clear? So this part is cheap, because it's just a test-and-set, and again, when there's no contention, this is super fast because this is just hanging out in user space memory. I can test to see whether I can acquire it without having to make a syscall. If I can't get it, now I've got to go down and block on this thing, and that's expensive because that's a syscall. Some measurements put it at roughly 95 nanoseconds, which is a lot. Compare that to the test-and-set case, where it's a single instruction which, depending on whether the location is in cache or not, could be one cycle to acquire that latch. In this case here, because we're going down to the kernel, and the kernel has its own protection primitives, it gets expensive.
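Before getting into the adaptive lock, here is roughly what that blocking std::mutex path looks like in C++. This is a sketch only; on Linux, std::mutex wraps a pthread mutex, which is futex-based.

```cpp
#include <mutex>

std::mutex latch;  // on Linux: a pthread_mutex_t, i.e. futex-based

void do_work() {
  std::lock_guard<std::mutex> guard(latch);  // descheduled by the OS if held
  // ... critical section on the shared data structure ...
}  // latch released automatically when guard goes out of scope
```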
So a better approach, one that sort of combines the two of these and still looks like a futex because there is still that fast user space part, is called an adaptive spin lock. And the idea here is that unlike in a spin lock, where we spin in user space over and over again trying to acquire it, and unlike in the fast userspace mutex, where if we can't get the latch we immediately fall down into the OS kernel, what you can actually do is allow a thread to spin on the user space lock for a little bit. And then at some point you give up, and then you fall back into the OS kernel and block on that thing. And what they're going to have now in user space is this parking lot, where you keep track, in user space, of all the threads that are waiting for this. They're sort of replicating what the OS is doing, but because it's up in user space and not a syscall, it can potentially be much faster in some ways. And then what will happen is, if a new thread comes along and tries to acquire a latch, and we see that the user space latch is already held and we see people in our parking lot that are blocked, then we immediately go park ourselves and say, we're waiting on this as well, and we don't have to go down and block on the OS kernel. Within the parking lot there is a lock that will get us rescheduled. So Apple has this thing called the WTF parking lot lock, or WTF::Lock; it stands for the WebKit Template Framework or something like that. But this seems to be, if you're not going to use a plain spin lock, and you're not going to use the OS blocking mutex, this might be the better way. Supposedly it's better to use, although Linus disagrees. This is actually what the HyPer guys are using for Umbra in Germany. And somebody else is using this, but the name escapes me now; there's another data system that's looked into using this as well, and they claim that it's much better than the OS blocking mutex.

Yes? Do you know how long it spins before it gives up? That usually seems like the problem. I think it's supposed to adapt itself. That's the idea. OK. So, all right. Actually, going back to this quickly, the other thing to point out, too, is that with the OS blocking mutex, when you allocate the mutex, it allocates the user space part and the OS lock. In this case here, you don't actually allocate the OS lock until you actually need to block on it, until you get put into the parking lot. And in Apple's world, for WebKit, they wanted this because I think they're acquiring a latch for every single JavaScript object that gets instantiated in the JavaScript runtime, so in their world, they have a lot of latches. In a database system, we have a lot of nodes in our B plus tree potentially, but not at the level they're facing.
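Here is a rough sketch of that spin-then-block idea. A real implementation parks waiters in a user-space parking lot; falling back to a std::mutex here is just an assumption to keep the sketch small, and the spin limit is a fixed constant rather than something that actually adapts.

```cpp
#include <mutex>
#include <thread>

// Sketch of "spin briefly in user space, then block in the kernel".
class AdaptiveLatch {
 public:
  void lock() {
    for (int i = 0; i < kSpinLimit; ++i) {
      if (mutex_.try_lock()) return;   // fast path: acquired while spinning
      std::this_thread::yield();
    }
    mutex_.lock();                     // slow path: let the OS deschedule us
  }
  void unlock() { mutex_.unlock(); }

 private:
  static constexpr int kSpinLimit = 64;  // arbitrary; real ones tune this
  std::mutex mutex_;
};
```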
All right. So now let's talk about some better implementations that we can build on top of our basic latch primitives. In the example I showed with the spin locks, there's this cache coherency issue where, if two sockets are trying to access the same latch, we have traffic going over the interconnect, everyone is trying to spin on this one location, and we have to send invalidation messages to every single thread. You can do a queue-based spin lock where, instead of having every thread blocking on the same memory location, you sort of daisy chain them together so that one thread blocks on one location and the next thread blocks on the next location. And then when you start releasing them, it propagates through the queue, and when you release one of them, it's only one cache invalidation message, because only one thread is actually blocked on any one location at a time.

So let me show you what I mean by this. This is sometimes called MCS. The Linux kernel actually uses this internally for other parts of the system. Mellor-Crummey and Scott are the names of the guys who invented it, so if you Google MCS spin lock, you'll find information about it. So we have this base latch. Again, this could be the OS blocking mutex, the parking lot one, it doesn't matter. And then our CPU comes along, and when it wants to acquire this latch and the thing is already being held by somebody else, we're going to go ahead and update our pointer to keep track of who is next in line for the release. So here, this guy wants to acquire this latch. It's not being held. So he's going to instantiate this new latch placeholder in the queue, and he goes ahead and acquires the latch by setting the pointer to this new location here. And once that's done, again with a compare-and-swap, we hold this latch. Now, when another thread comes along, they're going to try to acquire this latch, see that this pointer is set, and realize they can't acquire it. So instead of spinning on this and doing test-and-set to see whether they can acquire it, they are going to follow the pointer here, see that the next slot is not yet set, and update its pointer, which is a compare-and-swap. So now they've claimed that position in the queue, and then they spin on this. Same thing for the next one, and so forth. So then what will happen is, if this guy ends up releasing the latch, this one gets notified that the thing got released, and then this guy can take the latch next. So this is nice because, again, this could be one region of memory on one socket, this could be another region, and they're just potentially spinning locally, not going over to some remote memory location.

Yes? So if CPU one's latch is released and a CPU four shows up, does it go to the base latch and traverse the chain again? So if I release this and four shows up, where do they follow? Yes, so the lines in the diagram are kind of bad. This thing is actually spinning on this, because it's waiting for this thing to get released, for this pointer to get set back to zero. So everyone always has to start here and traverse the whole chain. The tricky thing now, too, is that in my example here, everyone is just waiting to acquire the latch, because you always want to do it. But if I abort my operation, say if I try to spin too many times and this guy goes away, now I need to sort of reconnect them, because I have this hole, and that starts to get tricky. If you always know that you're going to block forever, and in some parts of the data system you will want to block forever until you actually get to do something, then this is fine. But if you need to pull things out at different positions, then it gets harder.

Yes? For this, is it reasonable to context switch yourself out, or somehow have a signal wake you up whenever the latch becomes available, even though you're in a queue? So what am I actually doing here? I'm waiting. What am I actually doing? Yeah, are you context switching yourself out? Is CPU two actually spinning on this, or is it swapped out until the latch comes around to it? It could spin, or it could be swapped out. Do you do signals for this? I suppose you could. You could. It depends on the implementation. I actually don't know. You could just increase the amount of time you're waiting because it takes longer. You could do a signal. Yeah, I think it depends on the implementation. I don't know what people do, and I don't know what the kernel does there.
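Here is a sketch of the MCS queue latch just walked through, where each waiter spins on its own queue node instead of on one shared word. The caller supplies a per-acquisition QNode; the memory orderings are the standard ones for MCS, but treat this as a sketch, not a production latch.

```cpp
#include <atomic>

struct QNode {
  std::atomic<QNode*> next{nullptr};
  std::atomic<bool> locked{false};
};

class MCSLatch {
 public:
  void lock(QNode* me) {
    me->next.store(nullptr, std::memory_order_relaxed);
    me->locked.store(true, std::memory_order_relaxed);
    QNode* prev = tail_.exchange(me, std::memory_order_acq_rel);
    if (prev != nullptr) {
      prev->next.store(me, std::memory_order_release);  // link into the queue
      while (me->locked.load(std::memory_order_acquire)) {
        // spin locally on our own node, not on a shared location
      }
    }
  }

  void unlock(QNode* me) {
    QNode* succ = me->next.load(std::memory_order_acquire);
    if (succ == nullptr) {
      QNode* expected = me;
      if (tail_.compare_exchange_strong(expected, nullptr,
                                        std::memory_order_acq_rel)) {
        return;  // nobody was waiting
      }
      // A successor is enqueuing but has not linked itself yet: wait for it.
      while ((succ = me->next.load(std::memory_order_acquire)) == nullptr) {
      }
    }
    succ->locked.store(false, std::memory_order_release);  // hand off the latch
  }

 private:
  std::atomic<QNode*> tail_{nullptr};
};
```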
So then the last one, which is actually probably super important for databases as well, is the reader-writer latch. And the basic idea here is that, just like under two-phase locking we can have multiple readers on a tuple but only a single writer, we want the same thing in our latches. So basically we now need to maintain two latches, plus a queue of what threads are waiting to acquire the latch, and then we can make decisions about who should actually acquire the latch if we have queues on both sides. So say a thread shows up. He wants the read latch. The number of writer threads is zero, the number of reader threads is zero, so this means we're able to acquire the latch, and we just do a compare-and-swap to increment this counter by one. The next guy shows up, same thing, wants to acquire the read latch. We already have a thread that holds the read latch, but this can be shared, so we allow him to go ahead and acquire it as well; we just update the counter. Now a writing thread shows up and tries to acquire the write latch. It can't do that, because the read latch is currently being held by two threads, so we go ahead and just queue ourselves up as waiting. Now another thread comes along and wants to do a read. We could just let it acquire the read latch right away and let it run, but of course that might starve the writer. So we'll block this guy and update his waiting counter to be one. Once these guys go away, then this guy can acquire the latch. This is useful for all different parts of the data system.

Yes? So with the spin latch, does it make sense to put a sleep in there, like a nanosleep or something, so that I'm not burning cycles and it could potentially let the OS do other things? Yes, but there's this trade-off, and this actually gets to Linus's point: it's hard to get this right. If I put a sleep in there, then yes, I'm releasing the CPU to do other things. But now that means that when the latch is actually available to me, I could potentially be waiting longer before I can go ahead and pick it up and run. But if I go the other direction and make my sleep time super small, now I'm burning cycles and the CPU looks busy, but it's not doing anything useful. So I think, again, for project one, when you run the particular benchmark we gave you guys, the cores are spiked at 100%. Why? Because they're all spinning trying to acquire that latch. Because the Intel one, I don't think, yields. The Intel latch we're using doesn't yield.
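Going back to the reader-writer latch for a moment, here is a minimal sketch using std::shared_mutex as a stand-in for the counter-and-queue bookkeeping described above. Whether writers get prioritized over readers (the anti-starvation behavior in the example) is implementation-defined for std::shared_mutex, so this is only an approximation of the protocol.

```cpp
#include <mutex>
#include <shared_mutex>

std::shared_mutex rw_latch;  // per-node reader-writer latch

void read_node() {
  std::shared_lock<std::shared_mutex> guard(rw_latch);  // many readers at once
  // ... read the node ...
}

void write_node() {
  std::unique_lock<std::shared_mutex> guard(rw_latch);  // exactly one writer
  // ... modify the node ...
}
```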
So now I want to briefly talk about how to do latch crabbing and coupling in a B plus tree. In the intro class, we described what a B plus tree is, and it's the same thing in an in-memory database. We have key and value arrays in our nodes, and only the leaf nodes contain the actual pointers to tuples; everything above that is just guide posts. So why do we want to use a B plus tree? It's the same reason we want to use a BW tree or any other order-preserving data structure: because now we can do lookups in log n. We discussed latch crabbing and coupling last semester, but I just want to go over it in more detail now that we understand how we could actually implement latches. And then we'll talk about a variant of latching from the HyPer guys for ART that is not necessarily specific to a B plus tree. They actually do use the same technique in ART, but I'm going to show it in the context of a B plus tree.

So the idea here is that because we're going to allow concurrent access to our data structure, we have to protect it. And the obvious thing to do is protect it with a single latch for the entire tree, but of course that would be stupid, because now it becomes single threaded. So what we're going to do is, as threads start traversing into the tree, acquire latches as we go down, and only release the latches above us when we know that the node we're at is safe. And we're defining safe to mean that, based on the operation we're doing, we know there won't be any split or merge below us, or at the node that we're at, that would cause us to have to make changes above us in the tree. So once we know that it's safe, we release any latches that we've acquired on the way down, and then other threads can come behind us and start doing lookups or modifications as well, because they may be going down a different path in the tree. So for the most basic protocol, for search, because we're not modifying the data structure, we just take read latches, one by one, going down, and once we've reached a node, we can release the latch on our parent node, because we don't need to go back. For insert and delete, again, once we know that the node we're at is considered safe, we can release all the write latches we've acquired on the way down.

So for the search, the basic operation is: I want to do a lookup on key 23 down here. At this first node, I acquire the read latch, then I traverse down to this node here. And at this point, because I'm doing just a lookup, I'm never going to go back to the root node A, so it's safe for me to release that latch. And this is why it's called crabbing, because this is how a crab walks: one leg forward and then the next one, back and forth. So then we get down to here, same thing. We can release the latch on C, because we know we won't go back to it. And then we go down here to F and do our read on 23.

Now we're going to do a delete. Start with A: we take a write latch on A, come down here, take a write latch on C. And now at this point, because we're going to do a delete, we don't know what's below us in the tree. So if we do a delete down here that causes us to have to do a merge, we may end up losing our key here, and then this node could fall below half full, because it's a B plus tree and it has to be at least half full. So there may be a change below us that causes this thing to get merged, which means we'd have to update pointers up here. So therefore it's not safe for us to release the latch on A at this point. So then we come down to, sorry, to G. We're deleting 44. At this point here, we know we're not going to have to do a merge, because if we delete 44, our node is still at least half full. So everything above us is safe, and we can go ahead and release those latches. And then at this point, we can go ahead and delete our key. So the idea is we want to release our latches as soon as possible once we know that we'll never make any changes above us; we want to release the latches before we actually do our delete. Releasing latches sooner rather than later allows other threads to make forward progress.

So, insert 40. Take the write latch on A. Take the write latch on C. Now in this case here, we know that even if we have to do a split below us, C has extra space for another key, so we don't have to split this guy. So it's safe for us to release the latch on A. Then we come down here to node G.
In this case here, where we want to insert a key, we don't have any more space; we're going to have to split it. So therefore we have to maintain the write latch on C above it, so that we can then update it with our new pointer here. And when that's done, we release the latch on C and then release the latch on G. We always release latches from the top to the bottom.

So that's the basic latching protocol. But it's obviously inefficient, because in all my examples, when I did an insert or delete, the very first thing I did was take a write latch on the root and then go to the next child. So that means that every thread trying to update the index, and actually any thread also trying to read the index, will get blocked anytime I have a writer, because the writers are always taking the write latch on the root. So a better approach is to be optimistic and assume that when I get to my leaf node, I'm not going to have to do a split or merge. And therefore I can take read latches all the way down, and then right before I get to my leaf node, or at the leaf node, I take a write latch and check whether my assumption was incorrect. If my assumption that I'm not going to do a split or merge was correct, then I take the write latch, do my change, and I'm done. If I was incorrect, then I come back again and now take the more heavyweight write latches all the way down.

So let's look at our two examples, delete 44 and then the insert. I'm going to delete 44. Under an optimistic approach, I take a read latch, take a read latch. Again, at this point here, I don't go back to the root, so it's OK to release the latch on A. Now I get here, and I can release the latch on C, because my assumption that I was not going to have to do a merge when I do my delete was correct. So I can go ahead now and just do this delete. So instead of taking write latches all the way down, I took read latches. And you do the same thing for the insert. Is this clear?

So there's an even better approach where we can avoid having to block the readers on the writers, or sorry, block the writers on the readers, by not having a read latch at all. And the idea here is that I'm going to allow threads to read anything they want as they go down. But then after they complete their visit at a node, they check to see whether it was modified since they started reading it. If not, then they know no writer came in and modified it, and they just proceed down the tree. If they do see a modification, then they realize somebody else got in before they finished, and they just restart. So there are no read latches anymore. There's a write latch to prevent two writers from modifying the same node at the same time, but anybody that's reading doesn't hold a read latch, so readers won't block any of those write latches. Because now we have the issue of, well, we could be doing splits and merges, and since there's no read latch, we don't know whether somebody is reading something when we start deleting things. So we do that same epoch garbage collection we did last time in the BW tree: we know there could be a thread hanging out somewhere in our index within our epoch, so it's not safe to go ahead and clean up the nodes just yet, and once everyone leaves the epoch, we can go ahead and clean them up.

So let's look at an example here. So now every node is going to have this version counter. It's just 16 bits or 32 bits in the header that says this is the version of this node.
And any time you modify it, you just increase that counter by one. So say we want to do a search on 44. We start at node A. The very first thing we do, we're going to read what the version number is for this node, so we keep track that we read node A at version 3. Then we examine the node and do whatever we need to do. In this case here, we're searching for 44, and 44 is greater than 20, so we know we're going to go down this side of the tree here. So we go ahead and do that. Now we're at node B. And what we need to do is, again, read our version here, but then go back and read the version of the node we just came from. We sort of maintain a stack of pointers to the nodes as we traverse down, so we just go back and look at where we came from, and we check whether that version has changed since we initially read it. If not, then we know that nobody modified it. If it was changed, then we have to abort our operation, because somebody did something here that we missed and should have seen, and we want to go back and start over again. Because somebody could have modified this node here, and the correct pointer to the right side of this key is now some other memory location, and we followed down the old path, so there may be garbage below us. So we abort ourselves and restart.

Yes? Why don't you go all the way to the leaf? The question is, why don't you go all the way to the leaf and do what? Sorry? The question is, why don't you go all the way to the leaf, essentially don't do this recheck here, just go all the way to the bottom. The pointers will be valid. Is that true? Yes, the pointers are valid, because you can't clean anything up before you switch epochs. So I go to the bottom and then I say, all right, have any of the versions I read changed? If no, then I'm fine. If yes, then restart. So in that case, it's sort of like in OCC, like we talked about with Cicada: if there's going to be a conflict and you're going to have to abort, maybe you want to abort sooner rather than later so that you're not doing a bunch of work that ends up wasted. But doing this recheck is cheap. It's just reading that memory location to see whether it's changed. It's not like I'm fetching a page and bringing it back in.

Yeah? So the writer that modified C, the one who increased its version, would they also go back and increase the version of A? So someone must have modified C, some writer thread must have modified C; that's why it has a bigger version number, right? Yes, on C. Yeah, so then someone will modify this, or again, you'll come down and you'll see the same thing. No, so I think you're saying, let's go through the example and we'll come back to it. But I don't think you have to: if you modified this guy, you don't have to modify this guy, unless you did a split or merge that caused this thing to get changed. So now I do my examination of my node, looking for what I'm looking for, so now I get down here, read V9, check C is still V5, it's still valid, and if yes then I'm fine, then I do whatever I need to do and I'm done.

So let's rewind and have the thread come back. We just finished checking or reading node C, and we traverse down to G; that should be at A, at C, at G, sorry. So say now another thread, a writer thread, comes along and modifies this node here.
So he acquires the write latch on it, which implicitly acquires the write latch on this, and when he completes his modification, he increments the version counter by one. So now, when this guy down here rechecks, hey, is node C still set to V5? If not, then he knows that somebody else has come and modified it, and therefore he shouldn't have gone down here, and he just restarts. So in this case here, I did some modification here, but it didn't have to get propagated up into the parent.

But after you do the recheck, say at that time it is still V5, and then afterwards somebody comes and modifies it? But who cares? Yeah, so his question is, say this guy modifies it, but by the time I'm down here, I went and checked and it's still V5, and I'm done. And then I'm back here, and can I complete? Yes. It will be as if we read it before that modification. Correct, yes, yes.

So one thing about this that is potentially problematic is that if you have really big nodes, like if you have a lot of keys in a single node, then these versions are very coarse grained, right? They cover the entire node. Say I modified something on this side of the node, and this pointer is still valid, and everything over here is still fine; I'm not going to be able to catch that. I'm saying, oh, the whole node changed, let me just go ahead and restart. So again, it's this trade-off. You could have more fine-grained versioning per key, per element in the node, but the way they did it in HyPer, it's a single version for the node itself.

Yes? I just want to confirm, do you restart from the root or restart from C? So again, it's like the BW tree. I think you always restart from the root. Yes.

So I'm reading, say, 40, right? And somebody is inserting 40 while I'm reading it. Won't there be a torn read? Someone's inserting what? Sorry? 40. All right. So 40 will go here, so I have to do a split. Yes? Okay, assume that you are not doing a split, there's space for it. Okay. Won't it give a torn read, where some part of the memory is written and the rest is not written? Like, you're directly reading the nodes, right? You're talking about a torn write, you're reading a torn write. So what is being torn, though? Sorry, like, basically, I insert 40 here, right? You'll move the, like, these are stored as a linked list of elements, right? 38, 40, 44. Yes? So you'll be modifying that list, and somebody is reading that list while you are modifying it. So what you're saying is, because the reader can come down and they don't get blocked on the write latch, could it be the case that, in the memory representation of the node, I update the key list but not the pointer list? Actually, I think the readers do get blocked on the write latch. I take that back. They have to be, otherwise you would have this problem. Yeah, but the writers don't block on the readers. That's the difference. OK. Yeah.

You mentioned there's the issue that if you have a very long node, you run into this coarse-grained problem. That could essentially motivate you to just make the size of your node be a cache line, such that when you get a cache invalidation, your entire node gets shot out or brought back in.
Your entire node is essentially being, I don't know about atomically, but it's being brought in and out of the cache as a unit. Yeah, so his statement is, I said that if you make the node too long, then you have these coarse-grained locks. But if you make it the cache line size, which is 64 bytes, then you can't necessarily atomically update it, but at least it's being moved in and out of memory atomically. That depends on the memory model of the OS, or rather the CPU, and what it guarantees in terms of writes, like the ordering of writes. It might make things more efficient, but I still think you need the latches to protect things. Yeah. Yes? So you said that readers are, like... Well, we're 45 minutes in. Hold on, let's take this offline, because I just realized we're 45 minutes in and we haven't gotten to the tries yet. You have a quick question, or? OK, yeah, sorry. Databases. OK.

All right, so let's get to what you guys actually read about. We'll discuss B plus trees a bit more when we talk about Project 2, when we release that. All right, so the B plus tree and the BW tree, I call them whole-key indexes. The idea, again, is that you have the entire key being represented at the various nodes in the tree. You can do prefix or suffix compression, but we can ignore that for now. But in the case of the B plus tree, because the inner nodes only contain guide posts, if I want to know whether a key exists in my table, I always have to go to the bottom, to the leaf nodes, and see whether that key exists there. If I see the key in an inner node, it may actually not be in the table, because if I delete a key, I don't necessarily remove it from all the inner nodes; it's only through the split and merge process that it could get pruned out. So now, going back to his comment, if I make my node be the size of a single cache line, then in the worst case scenario, in order to figure out whether a key exists in my table, I have to pay one cache miss for every single level of the tree, just to see whether something exists.

So this is sort of the motivation for tries. The idea is that instead of storing the whole key at every single node, we store one digit of the key at each node, and that way we can potentially determine more quickly whether the thing we're looking for is there or not. So tries are really old. I think they're from the 1950s; some French dude invented them, and he didn't have a name for it. And then there's this other guy, Edward Fredkin, who supposedly was at CMU, who came up with the name tries, which is short for retrieval tree. But again, the basic idea is that for all our keys, we're going to break them up into digits, and then store them one digit at a time going down. So in the case of this data set here, I have hello, hat, and have. At the first level, I'll have h, because that's shared by all the keys. And then I can have a path down just for hello. And then, just like in a B plus tree, down at the bottom will be the pointer to the actual tuple that's represented by this key. So sometimes you'll see these referred to as digital search trees or prefix trees. There are also radix trees and Patricia trees; those are compressed versions of a trie. The original trie data structure has the full key represented like this. So tries are actually really interesting, because unlike a B plus tree, where the search time is log n, in a trie it's O(k), where k is the length of the key.
But if I have a key of a, b, c, then, assuming I'm not doing any compression and I'm storing one character per level, looking up whether a, b, c exists in my trie is just 3 steps. Whereas in a B plus tree, I could be mixed in with a bunch of other stuff, and it could take longer. The other interesting thing about them is that the data structure is deterministic, meaning if we take the same set of keys, shuffle them into any order, and insert them in that order, we will always end up with the same trie data structure, just because of the nature of how we combine overlapping digits. In a B plus tree, that's totally not the case: if I shuffle the keys into a random order, I can end up with a completely different data structure from one time to the next. Now, the keys will never actually be stored in their entirety, so we have to recreate them by taking the path down: we get to the bottom, and we keep track of every digit we saw along the way, so we can put the key back into its original form.

For the language we're going to use to describe the properties of a trie, the key concept is called the span. The span is how many bits of the key each digit represents, which determines the number of possible paths we can have coming out of a given node, essentially the number of distinct digit values that could exist. And what will happen is, if a digit exists in one of the keys that we're trying to represent at our node, then we'll have a pointer to the next node in our trie going down. Otherwise, we store null; you need a way to represent that. So the span determines the fan out, which also determines the physical height of the tree. For an n-way trie, we say we have a fan out of n.

So let's look at a real example of how you could actually represent a trie in memory. Let's say we have the simplest possible trie, a one-bit trie. That means that at every single node, we're going to represent a one-bit digit: it's either zero or one. So say we want to store the keys 10, 25, and 31. We would represent them in binary form like this. In actuality, they'd probably be 32-bit or 64-bit keys, but we'll keep it simple and say they're 16-bit, so we have two bytes representing each key. So our trie looks like this. At the first level here, we're representing the value for the first digit position. In this case, they're all zero, so in our node, for the digit zero, we have a pointer down to the next level, and for the digit one, we have a null pointer. So now if anybody looks up to see whether there's a key where the first digit is one, they would look at this first node, see that this thing is null, and know that the key could not exist. The next one here is the same as the one above it: zero has a pointer, one is null. And I'm just going to say repeat 10 times, because the next 10 digits are all the same thing; otherwise we'd run out of space on the slide. Now we get to this position here, and now we see there is actually a difference between the keys: for the first one here, the digit is zero; for the second two, the digit is one. So I now have a pointer over here for the zero path and a pointer over here for the one path. And then going down for this key here, only one key in my corpus has a zero at this position, so this node only has one pointer down to the next level. Over here, for the other two keys, they're the same here, but then they split zero and one like that, and then we traverse down for both of them.
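As a small illustration of breaking a key into digits, here is a sketch of extracting the digit at a given level for a fixed span; with a span of one bit and 16-bit keys it matches the walkthrough above. The function name and constants are made up for the example.

```cpp
#include <cstdint>

constexpr int kSpanBits = 1;   // 1-bit digits, as in the example
constexpr int kKeyBits  = 16;  // 16-bit keys, as in the example

// Digit of `key` at `level`, counting levels from the most significant end.
uint16_t digit_at(uint16_t key, int level) {
  int shift = kKeyBits - kSpanBits * (level + 1);
  return (key >> shift) & ((1u << kSpanBits) - 1);
}
// For key 25 = 0b0000000000011001, the first 11 digits are all 0, and
// digit_at(25, 11) == 1, which is the level where 25 and 31 diverge from 10.
```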
So what's one easy optimization we could do to compress the size of a node? Spoiler: what's one easy optimization we could do to compress the size of a single node here? What's that? Don't store a pointer for the ones that don't have anything? Well, we don't do that anyway, right? That's null, so we're not storing anything. Only prefix compression, then, if they don't have anything? But that's not going to compress the size of a node. How do you compress the size of a single node? I mean, you can take out the zero and one and just... Exactly, yes. Yeah, so he said take out the zero and one. So I don't need to store zero and then the pointer and then one and then the pointer. I implicitly know that if I want to know whether there's a pointer for the digit zero, in my array of pointers I just go to position zero and say, all right, here's a pointer, and if it's null, then I know that digit doesn't exist. So this is horizontal compression; this is compressing the size of a node.

What you were referring to is vertical compression, where, in this case here and over here, once I get down this path, there are no alternatives. It always goes from one level to the next; there's no branching out. So instead of storing every single level, where these pointers essentially always take you to the bottom, I can just compress it down and store nothing in between. In this case here, this path has nothing else going down, so I just replace that with a pointer directly to the actual tuple. And then for this one here, same thing: at this level here, I just have the tuple pointers going out to the actual tuples themselves.

Yes? How do you actually differentiate between a pointer to another node versus a pointer to an actual tuple? The question is, how do you differentiate between a pointer to another node and a pointer to an actual tuple? It's like a bit. Yeah. Yes? I guess you can. Yeah, sorry. Yeah, that would be, can you do that? You would need a way to record that this thing was repeated 10 times. The ART tree stores it in the header. What is the key for the link? Yes. Right, so there's another way to do vertical compression.

Yes? So the one important thing to point out here, though, is that if I do this kind of compression, in a B plus tree, again, I have the whole key in the leaf node. So if I want to see whether a key exists, I'm guaranteed to get an answer from the B plus tree, because it's either there or not there. In this case here, I could have a key that had 0, 0, 0, and then had a 1 here. I go and look to see whether I have a key like that, but I don't know, because I truncated it here. So I now have to follow this pointer and go actually look at the tuple, like I did in the T tree, to see whether this thing actually matches or not. So although I can compress the size like this and reduce the height, I may still have to go look at the tuple to get the original key. This also prevents you from doing covering indexes or covering queries, where you want to be able to answer a query entirely based on the index. In a B plus tree, you can do that: if my index is on A and I do a lookup, select A from the table where A equals 1, I only have to access the index to answer that query. In this case here, I still have to follow the pointer to get to the original tuple. OK.
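And as a sketch of what horizontal compression buys you in code: the node stores only an array of child pointers indexed directly by the digit value, so the digit itself is never stored. The struct and names here are hypothetical, and a real node would also need a way to tag tuple pointers versus node pointers, as discussed above.

```cpp
#include <cstdint>

struct TrieNode {
  static constexpr int kFanout = 2;        // 1-bit span; 256 for a byte span
  TrieNode* children[kFanout] = {nullptr}; // index = digit value, null = absent
};

// Follow the edge for `digit`; a null result means the key cannot exist.
TrieNode* lookup_child(const TrieNode* node, uint8_t digit) {
  return node->children[digit];
}
```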
So this was sort of a crash course on tries, because we covered this in the intro class. But now I want to talk about more sophisticated variant implementations of tries that people are actually using. So Judy arrays came out first; they were developed at HP. The ART index was in the paper you guys read from the HyPer guys. And the Masstree is a variant of a trie where it's a trie of trees: every node will be a B plus tree. By understanding what they do here, you'll see why they have to do it the way they did in Masstree. The Judy array and the ART index will both be 256-way tries. That means the span, or the number of possible digit values per node, will be 256. But the goal is to store them in a compressed way so we don't have to allocate all the memory for 256 child pointers.

So the Judy array, as I said, came out of HP in the late 90s, early 2000s, and it's thought to be the first adaptive radix tree. The way it's going to work is that we look at the number of unique digits we have at every single node, and then we choose a different node layout based on that composition. The Judy array has a bunch of different variants. There's a specification that describes how it all works; it's like 80 or 90 pages, it's very complex. The one we're going to focus on is called JudyL, because that looks like the radix tree, or the ART index, that you guys read about. They have variants for bitmaps and for strings, but we'll just focus on this one here. So there is an open source implementation that's LGPL. If you read Hacker News, people freak out because they think nobody wants to use this since it's patented by HP, although the patent expires in two years, and everyone thinks that if you go use Judy arrays, HP is going to come and sue you. But if you follow this link here, it will take you to a mailing list post where the author of the Judy arrays says: HP doesn't care about this, go ahead and use it any way you want. But as far as I know, although it does solve a lot of problems and can give a very succinct representation of large key sets, to the best of my knowledge nobody actually uses them. Whether or not that's because of the patent, I don't know.

So the important thing that's going to happen in the Judy array, and that's different from the B plus tree and the BW tree, is that they're not going to store any metadata about the node in the node itself. Again, think of MVCC, where in the header of the tuple we would store the timestamps, or the B plus tree, where you would store information about what keys I have, what offsets I have. We're not going to do any of that. We're only going to store information about a node in the pointer to that node. And in a radix tree, you're not going to have any sibling pointers like you have in a B plus tree; you can't scan along the leaf nodes, you always potentially have to backtrack. So that means that for any given node, we know there's only one pointer to it, so we don't have to worry about keeping a bunch of pointers synchronized if we modify the layout of that node. So what's going to happen is they'll store a pointer, a memory pointer, as we normally would, to get to the location of the node.
But they're actually going to store double the size of a pointer, so they can pack in all this additional metadata: the node type, the number of keys, the population of the node, what the prefix value could be if there's only one child below, and then the 64-bit child pointer. So if I have a 64-bit child pointer, I need another 64 bits to store all this extra metadata. In the original Judy array implementation, they were back on 32-bit x86, so they had 32-bit child pointers and 32 bits of metadata. But now, on modern systems, it has to be 64-bit pointers, so it takes 128 bits per pointer. They're going to call these Judy pointers. In the paper you guys read that evaluated them against the radix tree, I think they called them fat pointers; it's the same idea.

So they're going to have three node types. Again, it's a 256-way trie, so that means you can have up to 256 digits per node. But the idea is that not every node is going to have exactly 256 digits, so rather than storing 256 pointers, they're going to have a more compressed form to represent these things. We're going to talk about the linear node and the bitmap node. The uncompressed node will be the same thing that they do in, sorry, take that back. We'll talk about the linear node and the bitmap node. The uncompressed node will be similar to what HyPer does, and HyPer will also have the linear node; it's the bitmap node where they're going to be different. The idea here is, say you're storing strings and you're storing URLs: a lot of URLs are going to start with www dot and then whatever the domain name is. So in that upper node, those digits are always going to be the same, so I can pack them into a linear node, and I can represent a large number of keys below me in a small amount of space at that one level. Again, going from the top to the bottom: this one is for when you have a small number of digits, this one is for a somewhat larger number of digits, and this one is for when you have more than those can hold, but at most 256, because it's a 256-way trie.

So let's look at the linear nodes. The reason it's called a linear node is because you're just going to linearly scan to find the digit you're looking for. So you just have two arrays: the first array is the digits, and the second array is the child pointers. And whatever offset you're at when you scan along and find the digit you're looking for, you keep track of how far you went, and then you can jump to that offset in the child pointers. In the original implementation of the Judy array, this thing was sized to be a single cache line; now, on 64-bit architectures, it's going to be two cache lines. So the total size works out like this: the digits are one byte each, and you can store six of them, so that's six bytes total. And then these are going to be 128-bit Judy pointers, so 16 bytes each, because again, the Judy pointers are double the size of a regular pointer. I'll have six of those, so that's 96 bytes. So in total, a single node would be 102 bytes, but we need to be aligned to our cache lines, so we pad it out and make it 128 bytes. So again, two cache lines is all it takes to fetch this one thing.
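Here is a rough sketch of the linear node just described. The sizes follow the 64-bit layout above (six one-byte digits plus six 16-byte Judy pointers, padded to two cache lines), but the JudyPtr layout and the explicit count field are simplifications; in the real Judy array the population count lives in the parent's Judy pointer.

```cpp
#include <cstdint>

struct JudyPtr {       // stand-in for the 128-bit "fat" pointer
  uint64_t meta;       // node type, population, prefix, ...
  uint64_t child;      // the actual 64-bit child pointer
};

struct LinearNode {
  uint8_t count;       // simplification: really kept in the parent Judy pointer
  uint8_t digits[6];   // digit values present at this node
  JudyPtr children[6]; // children[i] corresponds to digits[i]
};

// Linear scan over at most six digits to find the matching child.
JudyPtr* lookup(LinearNode* node, uint8_t digit) {
  for (int i = 0; i < node->count; i++) {
    if (node->digits[i] == digit) return &node->children[i];
  }
  return nullptr;  // digit not present at this node
}
```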
So the next node type is the bitmap node, and this one is a little bit more tricky. The idea here is that we're going to break up the digits we're representing in this node into 8-bit chunks. So think of these parts here as all the offsets that represent the particular digit values for this node. Going down here: offset zero is when you have nothing but zeros, offset one is when you have a one at this position, two is when you have a one at this next position, and so forth, and I just do this all the way up to 256. So what will happen is, when I want to do a lookup here, these subarray pointers are actually going to be pointers down to an array that holds my child pointers. So when I want to do a lookup, in this case here, say I want to look up the digit one. That's seven zeros followed by a single one, so I know that would be in the first chunk here. So I look and see a one, meaning I know that there is a child below me and I need to follow the pointer to it. So I follow this chunk's subarray pointer down to here, and then I count the number of ones that precede my position in this chunk's bitmap, and that tells me what offset I want in this array. So in this case here, for this one, there are no other ones to the left of it, so I know when I come down here, I'm at position zero. And for this one here, to the left of it there is one one, so when I follow my pointer down, I want to jump over by one. Is that clear?

Yes? No. So the question is, even if you're in the 8-to-15 range, do I need to count the ones that come before you? No, because the pointer here is only for this chunk. So if I'm looking in here, I follow this pointer down, which is then offset by the number of elements. Yes? The question is, can I insert into this one here? So the question is, could I insert here and not mess with these pointers? No, because now I've got to resize this array. OK, so are these arrays, as they're shown, all contiguous in memory? Yeah, so think of this as two contiguous regions of memory within the node. So this is saying you take eight bits, that represents the offsets 0 to 7, and then after that you have a subarray pointer. And the subarray pointer, I think, is just jumping to an offset in the same node, so it's probably going to be 16 bits. And then after that, you have the next eight bits for the next offset region, and then it has its subarray pointer. So I think what you are getting at is, if I now insert into this position here, that screws up the offsets for everybody else, and I've got to resize this. So inserting is expensive.

So one more question: this is meant to hold some fixed number of keys, but in the first chunk, if you insert four, like in 0 to 7 you have four ones, then you can't insert into this, because the size of the subarray is fixed; it will have only three. No, no, no. In my example here, I used three, but I could insert all ones, and then the first eight child pointers would represent the digits here. Then it is the same as the previous thing. Right, but the idea is that it's not always going to be all ones; you might have a sparse population of the digits at this node. Then why do we need the subarray pointers at all? If you're going to do that, you can just count the ones. I can what? Just count the number of ones behind you and just go to that. Why do you need the subarray pointers at all? Yeah, his question is, yeah, that's a good question. His statement is, if I'm just going to count the number of ones that came before me, then why do you bother with these? Because these things are always the same size.
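Here is a simplified sketch of the bitmap-node lookup just walked through: 8-bit chunks, each with a bitmap of which digits exist and a pointer to a packed subarray of child pointers, using popcount to turn the bit position into an offset. The exact layout is simplified (real subarray pointers are offsets within the node, and the child pointers would be Judy pointers).

```cpp
#include <bit>      // std::popcount (C++20)
#include <cstdint>

struct Chunk {
  uint8_t bitmap;    // bit i set => digit (chunkIdx * 8 + i) has a child
  void** subarray;   // packed child pointers, one per set bit
};

// Look up `digit` in a bitmap node made of 32 chunks covering 256 digits.
void* lookup(const Chunk chunks[32], uint8_t digit) {
  const Chunk& c = chunks[digit / 8];
  uint8_t bit = 1u << (digit % 8);
  if ((c.bitmap & bit) == 0) return nullptr;  // digit not present
  // Offset = number of set bits below ours within this chunk's bitmap.
  int offset = std::popcount(static_cast<unsigned>(c.bitmap & (bit - 1)));
  return c.subarray[offset];
}
```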
So could I just count all the ones that come before me and jump to where I need to go? The bitmap is a fixed size, so you can't have more than that, and we're not going to talk about SIMD too much, but because this is a fixed size, I can do SIMD instructions on it very efficiently and count the number of ones that come before me. Don't think of that as a for loop counting along; it's basically a single instruction: where are my ones, how many ones are to the left of me, and then compute that offset very quickly. So you could still do that. Yeah, I don't know the answer. OK, yes? For the resizing issue, why don't you just hash your digit and compute an offset in the array, and keep track of the entries separately, so you essentially don't have to deal with resizing until you're full? The question is, if you hash your digit and then jump, then you're basically treating this like a hash table, and then every single node is a little hash table. I think the hashing would be too expensive; with the bitmap we're just doing bit flipping and we can jump pretty quickly to where we need to go. I think a hash table is overkill for this. Yeah? I think you can't do SIMD with hash tables. You can or cannot? You cannot. Correct, yes. No, the hash can be SIMD'd. Depends on the hash. Right, depends on the hash. You can, but yes, a hash table would be overkill. Although the B+tree that they're using in the radix tree, or sorry, in the Masstree, is sort of the same thing, they're trying to pack in as much data as you can in a single node. I still think a hash table would be overkill. The idea here is that when you think about a radix tree, depending on what your keys look like, the upper level nodes might not change that often; it's the ones below that could be changing, because things are getting inserted and deleted. Again, going back to my URL example, if I have a bunch of URLs that start with www, then the upper nodes in the tree could be packed in tightly like this, and I'm almost never going to have to modify them at all. So the indirection of a hash table would be wasteful, and this can just be some bit manipulation and bit shifting to jump to where I need to go. But his point, which I don't know the answer to, was why bother having the subarray pointers at all? Why can't I just count all the ones before me and jump to where I need to go? I don't know the answer. Yes? This works like the first one, but wouldn't the lookup become an issue? If you're really far down, you have to count all the ones before you, versus if you have that pointer there, then you only have to count within that little chunk. Yeah, so his statement is, for efficiency reasons, if I know that I'm looking for position 248, I can just jump to that chunk, and I don't have to scan everything that came before me to figure out how many ones there are. I think that's the answer. Although there are SIMD operations that can count the number of ones very efficiently, so you could do it that way, they just didn't. So again, there's three node types. We covered the linear nodes and the bitmap nodes, and for the uncompressed nodes we'll see the same thing in Hyper: you're just storing pointers to everything.
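To make that bitmap-node lookup concrete, here's a minimal sketch using the popcount trick. The chunk layout and field names are my own illustration of what was described above, not the actual Judy implementation.

```cpp
// Minimal sketch of a Judy-style bitmap node lookup: the 256 possible digits
// are split into 8-bit chunks, each with a bitmap of which digits are present
// and a pointer to that chunk's packed child subarray.
#include <bit>       // std::popcount (C++20)
#include <cstdint>

struct BitmapChunk {
  uint8_t  bitmap;    // bit i set => digit (8 * chunk_index + i) has a child
  uint16_t subarray;  // offset of this chunk's packed child pointers in the node
};

struct BitmapNode {
  BitmapChunk chunks[32];   // 32 chunks x 8 digits = 256 digits
  // ... the packed child pointers themselves live elsewhere in the node ...
};

// Returns the slot of `digit` within its chunk's child subarray, or -1 if the
// digit has no child. The caller would follow chunk.subarray + slot from there.
int LookupSlot(const BitmapNode& node, uint8_t digit) {
  const BitmapChunk& chunk = node.chunks[digit / 8];
  unsigned bit = 1u << (digit % 8);
  if ((chunk.bitmap & bit) == 0) {
    return -1;                      // bit not set: digit is not present
  }
  // Count the ones below our bit: that is how many children precede us in the
  // packed subarray. This compiles down to a single POPCNT, not a loop.
  return std::popcount(static_cast<unsigned>(chunk.bitmap & (bit - 1u)));
}
```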
So ART is a variant of a radix tree that's going to do the same kind of adaptation that Judy does. But it's specifically designed for database systems, meaning it's meant to be an index that points to tuples, whereas the Judy array is meant to be a general purpose associative array that's the final resting place of the data: it doesn't store pointers to tuples, the values are the data themselves. So again, like in the Judy array, we're going to store metadata about every node. And since there's no easy way, actually no way at all, to do a latch-free radix tree or ART index, they're going to do that versioned latching that I mentioned earlier today. So the main differences are that the Judy array has three node types with different organizations, while ART has four node types that mostly vary in the number of digits you can store per node, and, as I said, the Judy array is meant to be this general purpose associative array whereas ART is a table index. The first node types are the same thing as the linear node in the Judy array, just in two different sizes: all you have is a list of sorted digits followed by the list of child pointers, and you can have one that stores four keys or one that stores 16 keys. Then for the Node48, instead of the bitmap node that we saw in the Judy array, they're going to store an array indexed by the digit, where each entry is an offset into another array of child pointers. Each of those entries is a one-byte offset, and there are 256 of them, so that's 256 bytes. Then there are up to 48 child pointers, and those are eight bytes each because they're not doing the fat pointer thing like Judy, so that's 384 bytes. Put it together and this node is 640 bytes in total. So the Node48 means that at most I can have 48 child pointers; a bunch of the 256 entries are going to be empty, but I can have up to 48 digits populated in this array. And the last one is the uncompressed node, and again, this is the same thing as in the Judy array: it's just one giant array of child pointers where the position in the array is the digit. If the entry is null, the digit's not there; if it's a valid child pointer, it is there. The total size of this would be 256 times 8, so 2048 bytes. And the idea is that as you're inserting things and modifying the index, the system keeps track of the number of elements you have per node, and if you go above the max size of the node type you're looking at, say that Node48 and I insert a 49th digit, then I have to take a latch on that node and convert it over to the uncompressed one. I'm going to skip over the binary comparable key stuff; again, we covered this in the intro class. Basically, if you store integers little endian as you do on x86 and then compare them digit by digit from the front, you'll get false comparisons for your values. So they convert everything to be big endian, and they have a recipe for how to do this for any possible data type.
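On the binary comparable key point, here's roughly what that big-endian conversion looks like for a signed 32-bit integer. This is just the general recipe, not the authors' exact code, and the function name is made up.

```cpp
// Sketch of making an int32 key "binary comparable": store it big endian so
// the most significant byte is the first digit, and flip the sign bit so
// negative values order before positive ones when bytes compare as unsigned.
#include <array>
#include <cstdint>

std::array<uint8_t, 4> ToRadixKey(int32_t value) {
  uint32_t u = static_cast<uint32_t>(value) ^ 0x80000000u;  // flip sign bit
  return {
      static_cast<uint8_t>(u >> 24),  // most significant byte first
      static_cast<uint8_t>(u >> 16),
      static_cast<uint8_t>(u >> 8),
      static_cast<uint8_t>(u),
  };
}
```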
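And going back to the Node48 for a second, here's a minimal sketch of its lookup, assuming the two-array layout described above. The field names and the empty-slot sentinel are made up for illustration; see the ART paper for the real layout.

```cpp
// Sketch of an ART-style Node48: a 256-entry byte array indexed by the digit,
// pointing into an array of up to 48 plain child pointers.
#include <cstdint>

struct Node48 {
  static constexpr uint8_t kEmpty = 0xFF;  // marks "no child for this digit"
  uint8_t child_index[256];                // 256 bytes, indexed by the digit
  void*   children[48];                    // 48 x 8-byte pointers = 384 bytes
};                                         // 256 + 384 = 640 bytes total

void* Lookup(const Node48& node, uint8_t digit) {
  uint8_t slot = node.child_index[digit];
  if (slot == Node48::kEmpty) {
    return nullptr;              // digit not present in this node
  }
  return node.children[slot];    // one indexed jump, no scanning
}
```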
The last thing I want to quickly mention is Silo. In the case of ART and the Judy array, they have different sized nodes based on the population, and as I said, once you go above what a node type can hold, you have to switch to the next node type. They're doing that because they don't support dynamic node sizes. To support dynamic node sizes, you could use another data structure, like a hash table or a B+tree, inside each node. That's what Silo does, or sorry, that's what Masstree does. So Masstree is a tree of tries... no, it's a trie of trees, sorry. Every single node, instead of being one of those bit-packed node types, is just going to be a B+tree, and in that B+tree the leaf nodes can either have pointers to the next level in the trie or an actual pointer to a tuple. So again, just like in a trie, you don't have to go all the way to the bottom to get a pointer to a tuple if you know that there's no path below that point; but inside the B+tree at every trie node, the pointers are always at the leaf nodes. Masstree was built for the Silo project, which is a very influential system that a lot of other systems are based upon. It's written by the guy who wrote HotCRP, if you've ever used that to submit papers to conferences. Eddie Kohler. He's insane, he's awesome. So this is a really interesting data structure. I don't know of any other system that actually does this, but it's used a lot in academic evaluations. All right, so now I just want to bring this back to the same graph I showed at the end of last class that we sort of rushed through. But now you understand what the ART index and the Masstree are actually doing. And it's just showing you that, again, the BW-tree that we built for our system gets blown away by the B+tree and the ART index. You can see here that ART insertion is very, very fast, because as soon as I insert something and I realize there's nothing below me in my path, I don't have to keep inserting more digits, I can just stop. But you can see that the scan is really bad for it, because you can't scan along the leaf nodes as you can in a B+tree; you have to traverse back up and go back down. So yes, I fully admit I was wrong about the BW-tree. I was wrong about latch-free indexes. I was wrong about never using an OS mutex and always using spin latches up in user space. So I'm wrong, I can admit that, it's OK. And then the radix tree stuff: I think the reason I had you guys focus on this is because the B+tree is still sort of the go-to choice when people build new systems, but tries are now becoming more and more in vogue, I think partly due to the Masstree and the radix tree work. And I know of several systems that are very interested in incorporating them in different facets. Like the DataStax folks, who work on Cassandra, they have a fork of Cassandra, and they want to replace a lot of the internal data structures in their system with tries or radix trees. And radix trees are showing up in a bunch of other places. So going forward, I think things like ART will be more common. So next class, we'll talk about a sort of potpourri of other things: system catalogs, data layout, and then storage models. Now that we know how to at least index some tables and run transactions on them, we'll start building up the storage layer of the system and actually start storing data and keeping track of what we're storing. OK? And I will try to update the project web page with more information about how to complete it and what the different options are that you could choose for the implementation. All right, guys. Enjoy your weekend. See you.