So last class we ran out of time talking about B+ trees, so I'll spend a little time quickly covering the things we missed, some of the optimizations, some of the other design choices we have in our data structure, and then we'll jump over to talking about how to do a multi-threaded B+ tree, which, again, you'll need for the second project. All right, so there are a bunch of other design decisions we have to make, not just things like do I take from the left or take from the right, but in the implementation itself. There are different policies you could have for how we organize data, when we split, when we merge, and so forth. So I'm going to go over four main things to think about. There's a great book called Modern B-Tree Techniques by Goetz Graefe. It came out, I think, in 2010, and it's basically the Bible of B+ trees, everything you'd ever want to know. You can Google the name for the PDF, or the library has it. Some of the stuff we'll talk about here comes from that book. All right, so the first thing we have to consider is: what's our node size going to be? When we talked about buffer pools, we said that in the enterprise systems you can actually specify what you want the page size to be for different components in the system: certain page sizes for certain tables, and even within the data structures themselves, the hash tables or the B+ tree indexes. The conventional wisdom, or what the research shows, is that the slower your storage device is, the larger you want your node size to be for your B+ tree. This is because we want to maximize the amount of sequential I/O we're doing. If you're on a really slow spinning-disk hard drive, you want something like one megabyte for a node: one I/O, one disk seek, takes so long that we want to bring in as many keys as we can when we do it. For modern SSDs, it's roughly around 10 kilobytes. And if we're in memory, now you're down to something like 512 bytes, because you want to keep everything aligned to your cache lines, and you don't want to blow out your L3 cache bringing in these large nodes. And of course the optimal size for your nodes depends on the workload. If you're doing a lot of point lookups, just going from the root to the leaf to fetch one thing, then maybe you want a smaller node size. But if you're doing a lot of range scans along the leaf nodes, again, you want to maximize sequential I/O, so you want larger node sizes. Yes? The question is: is one single node a page on disk? Not a hardware disk page; it's a database page. We talked about this: the hardware page is 4 kilobytes, and the database can have smaller or larger page sizes, anything it wants. And as I said, in some systems you can have different page sizes for different components of the database. So at a high level, yes, one node is one database page, but I don't want you to think that every component throughout the database system has to use the same page size. The node size would equal an index page size managed by the database system.
All right, so for BusTub, I think we default to 4-kilobyte pages, just to keep things simple. The next thing we have to deal with is the merge threshold. The textbook definition of the B+ tree says that when a node is less than half full, you have to merge it, because you don't want to violate that invariant. But there are cases where you actually want to delay the merge operation and let the node temporarily violate the rule, because you assume it's going to get filled back up pretty soon and you don't want to pay the penalty of merging it, having it fill up again, and then splitting again. So the way to think about it is: if it's one below half full, or some threshold depending on the height of the tree, I'll let it slide for a bit. Obviously, if it's empty, that's silly; you have to merge it. What the right policy is depends on the hardware and on the access pattern of the threads, so I don't want to give you numbers and say this is what you should always do. But these are things to think about in your own implementation. The next thing we have to deal with is variable-length keys. We saw this when we talked about variable-length tuples, and for the hash table we said, for simplicity, we're never going to have variable-length keys, to make our lives easier. But obviously in a real system that's not always the case. In general there are four approaches; we won't go into much detail, I just want to bring them up quickly. The first is to never store the actual value of the key. And let me be clear, there's the value of the key and then the value in the key/value pair; I mean the actual bytes of the key. You never store those in the B+ tree node. Instead, you store a pointer, like a record ID, that points to where the actual tuple with the real data is. In a disk-based system, this is bad, because as I'm traversing, every time I have to do a comparison (is my key less than the guidepost keys?), I have to go through the page table to fetch the actual tuple and then look up the actual value I want. So you never want to do this in a disk-based system. Where this does show up is in-memory systems. There's a variation of the B-tree called the T-tree where they don't store the actual key bytes in the nodes; they store pointers. You would do this in a very memory-constrained environment, since you're not duplicating the key all throughout your tree. But for our purposes, we don't want to do that. The second approach is variable-length nodes. This makes things hard because now you can get fragmentation on disk and fragmentation in memory, and you have to deal with plugging holes as you reclaim nodes. The third easy technique is padding. If I build an index on a column that's varchar(32), I always store 32 characters no matter what the actual size of the key is; that way I know exactly the size of every key. This is obviously wasted space. This is actually what MySQL gives you in some cases for the char type: even though you say char(256), if you store one character, it's still going to store 256 characters. The most common alternative is the fourth approach, key map indirection. Within the page itself, you have an internal data structure, a little sorted array of fixed-length entries, almost like the dictionary compression encoding stuff we talked about before: each key is referenced through a fixed-length entry, and I manage that indirection layer within the node itself. If I need overflow, then I overflow.
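To make the indirection concrete, here is a minimal sketch of what such a node layout might look like. This is illustrative only (the field names, sizes, and layout are made up, not BusTub's actual format): a fixed-length, sorted offset array you can search cheaply, with the variable-length key bytes living in a heap area inside the same page.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

constexpr std::size_t PAGE_SIZE = 4096;

// Hypothetical node layout for variable-length keys: the sorted "key map"
// holds fixed-length offsets, so searches compare fixed-size entries and
// follow one indirection per comparison to reach the actual key bytes.
struct NodePage {
  uint16_t num_keys;
  uint16_t key_map[128];  // sorted by key; each entry is a byte offset into data[]
  uint8_t data[PAGE_SIZE - sizeof(uint16_t) - 128 * sizeof(uint16_t)];

  // Read key i through the indirection layer: [2-byte length][key bytes].
  std::string_view KeyAt(int i) const {
    const uint8_t *entry = data + key_map[i];
    uint16_t len;
    std::memcpy(&len, entry, sizeof(len));
    return {reinterpret_cast<const char *>(entry + sizeof(len)), len};
  }
};
```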
For project two, we'll assume that all the keys are fixed-length integers. It makes things a lot easier. All right, the next thing we have to deal with is how we actually do a lookup within a node as we're traversing, to see whether a key exists. The easiest approach is to just do a linear search across the key array within the node. Say I'm trying to find key 8 and I land in this node here. I don't care whether it's the root, a leaf, or an inner node; the search is all the same. I just rip through, scanning linearly one entry after another to find the thing I'm looking for. This works regardless of whether the keys are sorted or unsorted. Now, this seems kind of slow if you have really large node sizes with a lot of keys. One way to speed it up on a modern system is to vectorize it using SIMD. I don't know who has taken 15-418/618 yet; does everyone know what SIMD is? SIMD stands for single instruction, multiple data. The basic idea is that instead of having a CPU instruction that takes one single data item and another single data item and, say, adds them together, there are vectorized instructions: you take a collection, an array of items, put them into a special register, and then a single instruction does whatever operation you want across all the values within that register. So it would look like this. Say I want to take the first four keys, assuming they're 32-bit integers, and there's this Intel SSE instruction with 128-bit registers. All the SIMD stuff always has esoteric instruction names like this; there are libraries that will hide this from you, I'm just showing you the raw one that I know. I load a register with the value 8, because that's what I'm looking for: four copies of it, since each one is 32 bits in a 128-bit register. Then with a single instruction I take those four 8s and compare them against four values from the key array, and in one instruction I see whether I have a match. The SIMD comparison produces a bitmap, with bits set to one wherever there's a match. In this case I don't find it, so I move on to the next group (ignoring that there are only three values left; we pad with an empty value), and it's another single instruction to do that comparison. So even though linear search could be slow if I don't vectorize, there are ways to speed it up on modern CPUs, and sometimes the linear approach is actually a good idea.
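Here is a rough sketch of the vectorized comparison he's describing, using the raw SSE intrinsics (the function name and the padding convention are mine; real code would usually go through a wrapper library):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Compare four 32-bit keys per instruction against the search key.
// Assumes the key array is padded to a multiple of 4 with values that can
// never match (like the empty value in the slide). Returns the index of
// the match, or -1 if not found.
int SimdLinearSearch(const int32_t *keys, int n, int32_t target) {
  __m128i needle = _mm_set1_epi32(target);  // e.g. [8, 8, 8, 8]
  for (int i = 0; i < n; i += 4) {
    __m128i block = _mm_loadu_si128(reinterpret_cast<const __m128i *>(keys + i));
    __m128i eq = _mm_cmpeq_epi32(block, needle);  // lane = all-ones on a match
    int mask = _mm_movemask_epi8(eq);             // 16-bit bitmap, one bit per byte
    if (mask != 0) {
      return i + __builtin_ctz(mask) / 4;         // first matching 32-bit lane
    }
  }
  return -1;
}
```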
All right, another obvious approach is binary search. If you're maintaining the sort order, you just jump to the halfway point and narrow in on what you're looking for. So I jump to the halfway point here, looking for 8. Seven is less than eight, so I go right. I look in the middle of that half: nine, so I go left. And then I find eight. The last approach is less common, but it's probably the fastest if you can pull it off: interpolation. If you know something about the distribution of the key values, and the keys are sorted with no gaps, you can do simple arithmetic to compute what offset you should at least start at. In this case, for instance, the keys run from 4 to 10; I do that simple math and land right where my key should be. Yes? His question is: when could you actually use this? It only works if the keys are integers; it won't work with floating point. And it only works if they're sorted and there are no gaps. So where you could use it is something like an auto-increment key or a sequence/serial key, where you're just adding one to the primary key over and over again. As far as I know, nothing outside of academic systems actually implements this. But it'll smoke everything else if you can do it. Yes? The question is: will we ever store the B+ tree on disk? Yes, this whole course is about storing things on disk. You have to bring it into memory to do anything with it; again, classic disk-based architecture. I can't do any manipulation, any traversal, or access any data unless I bring it into memory. So all of this assumes that after you bring a node into memory, you can do these things, and it will have to get written back out to disk; that's what the buffer pool and the disk manager do for me. The question is: if I build an index, do I store the index on disk? In this class, yes. Yes? The question is: for the binary search approach, can you also use SIMD? I don't want to get into this too much (if you take the advanced class, we'll discuss it more), but SIMD is really good for sequential access. Jumping around to random locations is not what SIMD is good for, so you can't really vectorize binary search, because it's a bunch of jumps around the array. And the reason interpolation isn't used much is that the assumption that you have all the values with no gaps doesn't always hold in practice. Yes? How is this computed? I know I'm looking for 8; I take my key minus the low value, scale that by the number of slots over the high value minus the low value, and that tells me how many offsets to jump over: one, two, three, four, and that's where my key should be. Yes, it has to be a number, yes. And the question is: do we need extra metadata so we know this assumption holds and we're allowed to do this? Yes. Binary search is probably the most common, followed by, at least in the newer systems, vectorized linear search. I don't think Postgres or MySQL or any of the older open-source systems vectorize it.
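As a sketch, the interpolation arithmetic he walks through above looks something like this (assuming sorted integer keys with no gaps, which is exactly the assumption that makes this fragile in practice):

```cpp
#include <cstdint>

// Interpolation search over sorted, dense integer keys (e.g. an
// auto-increment column). With no gaps, keys[i] == low + i, so the
// proportional offset lands exactly on the key if it exists.
int InterpolationSearch(const int32_t *keys, int n, int32_t target) {
  int32_t low = keys[0], high = keys[n - 1];
  if (target < low || target > high) return -1;
  if (high == low) return (keys[0] == target) ? 0 : -1;
  int offset = static_cast<int>(
      static_cast<int64_t>(target - low) * (n - 1) / (high - low));
  return (keys[offset] == target) ? offset : -1;
}
```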
All right, so there are a bunch of other optimizations I want to go through, but we just don't have time, so I'll quickly cover a few: buffered updates, pointer swizzling, and bulk insert. A lot of the deduplication and prefix compression stuff looks like the compression material we covered before for column stores, so I want to spend the time on the ones that are specific to B+ trees. For buffered updates, this is sort of the log-structured stuff we talked about before. It's called a B-epsilon tree, or a fractal tree. Basically, instead of applying every modification to the tree right away, I put a log entry somewhere in the tree, and at some later point, when I've accumulated enough of them, I compact them and apply the changes. I don't know of any system doing this now; actually, let me hedge on whether that's true. There was only one system doing the fractal tree, because they owned the patent on it: Tokutek. They got bought by Percona, and I think they still have a fractal tree index that does the buffered updates, as a storage engine for MySQL. It might be end-of-life; I don't think anybody actually uses it anymore. All right, prefix compression we talked about before. Deduplication, same idea: if I have the same key over and over again, I can store it once. For suffix truncation, the idea is that in some cases we don't actually need the entire key in the guideposts in the inner nodes. Instead, we truncate it to the minimum prefix we need to discriminate whether to go left or right. Again, this is to reduce space. Of course, if I later get two keys that both start with, say, ABC, then I have to reshuffle things or extend the guidepost back out. The pointer swizzling one is very common, and this one is important. The pointers within the nodes, the root node and the inner nodes, are just page IDs, because everything is organized around these pages in memory, and we have these logical page IDs. To follow one, I have to go to the buffer pool, do a lookup in its page table to find the frame holding that page, and only then can I access it. That means that as I'm traversing my data structure and come across these pointers, I don't have a real pointer in memory; I have a page ID. So I go down to the buffer pool and say, hey, I need page two; it maps that to a frame, and then I can operate on it. The same thing happens with the sibling pointers. With pointer swizzling, the idea is that if we pin the page in memory, meaning it won't move to a different frame and won't get swapped out to disk, then in my B+ tree, instead of storing the logical page ID, which I'd have to look up every time, I can just replace it with the real pointer. That avoids the lookup, so traversing the data structure gets much faster. It's called swizzling, for whatever reason, when you convert the page ID into the actual memory pointer. The statement is: isn't the lookup from a page ID in the buffer pool a hash table lookup, which should still be fast? It's not going to be as fast as just jumping to a location in memory. Yes, and as he says, that's where the constants matter. Also, you have to make your page table thread-safe, so now I have to take a latch to get into the page table to get the actual memory address. For today's lecture we're talking about taking latches in the B+ tree, so without pointer swizzling, I'm taking latches on the B+ tree nodes as I go down, but every lookup to resolve a page's memory address also takes latches in the page table before I can keep traversing.
It's way faster to just avoid having to talk to the buffer pool at all. Any more questions about the pointer swizzling stuff? Yes. The question is: you have pages you're accessing sequentially, but the access spans multiple pages; could the buffer pool ever want to reorder or compact the pages sitting in its frames, in a way that's at odds with this? You can't compact the pages themselves, because they're fixed-size pages on disk, but reorganize them in memory, yes, that's the question: is there any case where the buffer pool manager would want to reorganize pages such that, if it did, swizzling wouldn't work? I actually don't know whether any of them do that. I think Postgres leaves things where they are; for MySQL, and for what the commercial guys do, I don't know. At the end of the day, that's a low-level optimization where maybe the memory prefetcher could help you on the hardware side. The latches are a layer above that, and they're the long pole in the tent, the thing you want to minimize. Not going to the buffer pool manager for these page lookups is a bigger win than maybe being able to reorganize pages, but I might be wrong. Yes, quick question. How does this work if the page gets swapped out? It has to be pinned; this doesn't work if you don't pin it. If it eventually does get unpinned, somebody needs to go figure out, okay, I've swizzled these things, and flip a bit to say they're no longer swizzled and go back to using the page IDs. So there's some bookkeeping you have to do to know you've done this. All right, and his question goes back to suffix truncation. String comparison is slow because you go byte by byte (although you can vectorize that in some cases), so is it better to use an integer representation and compare those? If it's dictionary compressed, yes, but most of the time in most systems it's not. Think about where these B+ trees get used: we don't care about B+ tree performance for analytics, because those queries are doing long scans, and that's slow anyway. This is really for transactional, operational workloads, so I want to be in and out of the nodes as quickly as possible, and having to decompress things adds overhead. Okay, so pointer swizzling is very common. One more thing to say about it: in the pointers to the children nodes within a node, you basically allocate a little extra space and say, okay, here's the swizzled pointer.
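One common way to do that bookkeeping is to tag the pointer slot itself, so each child reference knows whether it currently holds a page ID or a swizzled address. A minimal sketch (illustrative names, not BusTub's actual API; it assumes pages are aligned so real pointers never have the low bit set):

```cpp
#include <cstdint>

// A child reference in an inner node: either a logical page id that must be
// resolved through the buffer pool's page table, or, once swizzled, the raw
// address of the pinned in-memory page. The low bit is the tag; it must be
// restored to a page id (unswizzled) before the page is unpinned or written out.
struct ChildRef {
  uint64_t raw = 0;

  bool IsSwizzled() const { return (raw & 1) == 0; }
  void Swizzle(void *pinned_page) { raw = reinterpret_cast<uint64_t>(pinned_page); }
  void Unswizzle(uint64_t page_id) { raw = (page_id << 1) | 1; }
  void *AsPointer() const { return reinterpret_cast<void *>(raw); }
  uint64_t AsPageId() const { return raw >> 1; }
};
```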
And then you just have to know that if I write this page back out to disk, when I fetch it back in, I make sure I invalidate any of the swizzled addresses, because otherwise they could be pointing to locations in memory that don't exist anymore. But you store these swizzled pointers within the page itself, because you're already accessing the page anyway. All right, the last optimization is bulk insert. Basically, if you know you need to build a B+ tree and you have all the keys ahead of time, instead of iterating key by key and building the B+ tree organically, it's actually faster to sort the keys ahead of time (I'll teach you how to do that next class), lay them into the leaf nodes in sorted order, and then build the index from the bottom to the top: you build all the scaffolding above the leaves. In my example here, I'm leaving one extra slot free in each leaf node; some systems actually compact the leaves to be 100% full immediately, just to minimize space. Pretty simple optimization. And I think when you call reorganize or optimize on an index, this is essentially what happens as well: the keys in the leaf nodes are already sorted, so you just take them all, compact them, and rebuild the tree from the bottom up. Again, different systems have different combinations of these optimizations, but the swizzling one is probably the most common, followed by the bulk loading.
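A minimal sketch of the bottom-up bulk build (in-memory nodes and made-up types for illustration; a real version would allocate pages through the buffer pool and leave fill-factor slack in the leaves):

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>
#include <vector>

struct Node {
  bool is_leaf = false;
  std::vector<int32_t> keys;  // leaf keys, or guidepost keys for inner nodes
  std::vector<std::unique_ptr<Node>> children;
};

// Sort once, slice the keys into leaves left to right, then build each level
// of inner "scaffolding" over the level below until one root remains.
std::unique_ptr<Node> BulkBuild(std::vector<int32_t> keys, std::size_t fanout) {
  std::sort(keys.begin(), keys.end());
  std::vector<std::unique_ptr<Node>> level;
  for (std::size_t i = 0; i < keys.size(); i += fanout) {
    auto leaf = std::make_unique<Node>();
    leaf->is_leaf = true;
    leaf->keys.assign(keys.begin() + i,
                      keys.begin() + std::min(i + fanout, keys.size()));
    level.push_back(std::move(leaf));
  }
  if (level.empty()) return nullptr;
  while (level.size() > 1) {
    std::vector<std::unique_ptr<Node>> parents;
    for (std::size_t i = 0; i < level.size(); i += fanout) {
      auto inner = std::make_unique<Node>();
      for (std::size_t j = i; j < std::min(i + fanout, level.size()); ++j) {
        if (j > i) inner->keys.push_back(level[j]->keys.front());  // guidepost
        inner->children.push_back(std::move(level[j]));
      }
      parents.push_back(std::move(inner));
    }
    level = std::move(parents);
  }
  return std::move(level.front());
}
```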
All right, so let's talk about how you make these B+ trees thread-safe; really, how you make any data structure thread-safe, but most of the time we'll be talking about B+ trees. Again, project one is due this upcoming Sunday, and we have the special office hours on Saturday. So far, for the most part, we've assumed that all our data structures are single-threaded. But obviously in a modern system, where we have a lot of CPU cores, we want multiple threads to be able to access our data structure safely at the same time, essentially to hide disk stalls. If one thread is running a query and it needs something that's not in memory, not in the buffer pool, it has to go out to disk and get it; we want another thread to continue and make forward progress. Of course, that means that when everything is in memory, threads could be touching the same data structure at the same time, and we have to make sure everything works out okay. What I'll talk about today is pretty much how every system works, though there are some exceptions. VoltDB and H-Store (H-Store is a system I built that became VoltDB), and Redis, which is probably the one you're most familiar with, are all single-threaded engines, meaning they assume no other thread is running at the same time, so they avoid all the stuff we're talking about today; they don't do any of this. In the case of Redis, it's one thread per process, so if you want a multi-threaded version, you basically run multiple processes. In the case of VoltDB, it's a single process, but the threads are partitioned per core, so they know you'll never have a query touching two cores at the same time. We'll cover that later in the semester. But again, this latching stuff is pretty much how everybody does it; those systems are the exceptions. The way we're going to make this all work is with what's called a concurrency protocol. The idea is that this is the traffic cop in the database system: it's responsible for making sure the different threads use our data structure in the proper way, so that the operations they want to do at the same time don't corrupt anything and don't cause any problems. There are two correctness criteria we have to worry about in our data structure. The first is logical correctness: can a thread see the things in the data structure it's supposed to see? If I insert key foo and then come back and try to read key foo, should I see it, yes or no? The other thing to worry about is physical correctness: is the internal representation of the data structure actually intact? Meaning, if we follow a pointer the data structure says is valid, we don't land in no man's land in our address space and crash, and we don't land on a page that doesn't exist anymore or see incorrect data. This is about protecting the internal representation of the physical data structure, the pointers and so forth. The one we care about for this class is the physical one. The higher-level concept of correctness, the logical one (if I insert a key, can I read it back?), is concurrency control; we'll cover that after the midterm. Today is really about making sure we don't crash because we followed a pointer to nowhere. So we'll talk about how you actually implement latches in a database system, then start with a simple example of hash table latching with linear probe hashing, and then we'll spend most of our time on B+ trees, and on making leaf node scans thread-safe as well. Hash tables are easy, B+ trees are hard; that's what we'll build up to. I mentioned at the beginning of the semester that there's a distinction between locks and latches. If you're coming from a more systems background, a non-database-systems background, the things we call latches are what you'd call locks. In the database world, locks are a higher-level logical protection concept, a primitive that protects one query or transaction from other queries and transactions running at the same time. I could take a lock on a single record and hold it for the duration of my query, and the database system has additional mechanisms to detect deadlocks and to roll back changes if a transaction or query has to abort. For this class, again, we're talking about latches. These are the low-level primitives that protect the physical internals, the critical sections of our data structure, from other threads trying to act at the same time. And we only hold these latches for very brief periods: get into a node or a page, do one thing, then pop out and release the latch.
And we don't need to worry about rolling back changes, because if we can't acquire the latch for the thing we need to do, then we just shouldn't do it; we can abort the whole operation and start over. This will make more sense as we go along. There's a great table in that B-tree book I mentioned before that lays out the distinction between locks and latches in more detail. Again, for this class, we're focusing on latches. They separate threads from one another for in-memory data structures (things that may be disk-resident, but that we've brought into memory), and they protect critical sections. My latches have only two modes, read and write, and the way I avoid deadlocks is purely through coding discipline: don't write code that can deadlock, and we'll see how to handle it when we can't be sure. The latches themselves are maintained within the data structure. Again, we'll cover all the locking stuff in lecture 15. For now, just think of a latch as being like the std::mutex you used for the first project. So: in latches there are only two modes, read mode and write mode. Read mode is sometimes called shared mode; multiple threads can hold the same latch in read mode simultaneously, and as long as nobody is making changes, everything's fine. If you need to make changes to the data structure, update something, delete something, whatever (not necessarily the entire data structure; you could latch the whole thing, but usually it's some critical piece of its internals), you have to take the latch in write mode, and only one thread can ever hold the latch in write mode at any given time. The compatibility matrix is simple. Say you have two threads: if the first thread has the latch in read mode and the other wants it in read mode, those are compatible; you can do them together. But if any thread has the latch in write mode, no other thread can share the latch at the same time, no matter what mode it wants. I want to quickly talk about how you actually implement this, just so you know what's going on under the covers; an OS class will cover all of these things. The simplest thing is a blocking OS mutex, which is what you get with std::mutex. It has only one mode: lock, exclusive. I acquire the lock, do something in my critical section of the data structure, and then unlock it. Nothing fancy. If you use std::mutex, what do you actually get on Linux, does anybody know? A pthread mutex, which is built on a futex. What's a futex? He's right: it's a "fast userspace mutex." The way to think about it is that there's a user-space spin lock you try to acquire without going down to the kernel. If you can acquire it, great, you're done. If you can't, you fall back to a heavyweight mutex in the OS. From a database perspective, that's bad, because now the OS decides when your thread gets scheduled, it's a syscall, which is expensive, and we can't use that thread for anything else. So in general, for this class, it's okay to use the standard library mutexes, but in a real system you would not want to.
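In code, the two modes map directly onto the standard library types he's suggesting for the project; a minimal sketch:

```cpp
#include <mutex>
#include <shared_mutex>

std::shared_mutex latch;  // one latch protecting some critical section

void ReadNode() {
  std::shared_lock<std::shared_mutex> guard(latch);  // READ / shared mode
  // ... inspect the protected state; other readers may be here too ...
}  // released when guard goes out of scope

void WriteNode() {
  std::unique_lock<std::shared_mutex> guard(latch);  // WRITE / exclusive mode
  // ... modify the protected state; no other thread holds the latch ...
}
```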
The futex basically looks like this: there's a user-space latch and then the OS latch. Two threads come along and both want to acquire the latch at the same time; it has only one mode, in this case exclusive. The first guy gets it, so he can do whatever he wants. The second guy falls back to the OS latch, and now the OS scheduler deschedules him, and he won't get woken up until the first guy releases the latch. The other type of latch we could have is a reader-writer latch. Again, for our purposes in this class, use the standard library std::shared_mutex, which is just a pthread read-write lock underneath. The idea is that we can now take the latch in two different modes, read mode and write mode. It basically looks like this: there's a single logical latch, and it keeps internal counters for how many threads hold the latch and how many threads are waiting for it. (There's effectively a queue as well, to keep track of which threads to wake up, but we can ignore that for now.) The first thread comes along and wants the latch in read mode. Nobody holds it in write mode, nobody holds it in read mode, so it acquires it and we increment the read counter by one. The next thread also wants read mode; since reads are compatible, it gets the latch at the same time, and we're good. The third thread wants write mode, and because the latch is already held in read mode, it has to block and wait. Then, depending on the fairness algorithm implemented in your latch, the next thread comes along also wanting read mode. Even though read mode is compatible with what's held right now, and it could technically acquire the latch immediately, for fairness the implementation blocks it, because it knows somebody else is waiting for the write latch and it wants to make sure that writer gets it when the readers finish up. The main thing I want to stress here is that the amount of metadata we store for this latch is much larger than for the futex one. Instead of the pthread mutex, I could roll my own spin latch in user space and only need to store a single byte, or 32 bits. That matters because the latches are stored in the pages of the data structure itself, and this reader-writer thing, with its queues and counters, is a lot more expensive. But because we now have a shared mode, we can potentially get more concurrency in exchange for that extra metadata. There are other types of latches beyond these. If you read the Linux kernel mailing list, Linus says database people always get this wrong and should never roll their own latches. I disagree with that in some cases. But for our purposes in this class, the standard library types will suffice; just know that a lot of real systems roll their own. All right, so let's see how we can actually start using these to protect our data structures. The first thing we're going to do is hash table latching.
We're going to start with linear probe hashing, because it's the simplest hash table and we don't have to worry about the slot array and all the extra machinery in extendible hashing and linear hashing. The other thing that's nice about a linear probe hash table is that threads always traverse the data structure in the same direction, so we never have to worry about deadlocks: you always hash to a single location and then scan down until you find what you're looking for. You never have two threads moving in opposite directions, which is what causes deadlocks. For resizing the table, we can ignore the details, but basically assume there's a global write latch that gates anyone entering the data structure: if you take that, nobody else can be inside while you resize. It's not ideal, because it's a single latch over the whole data structure, but if you have to resize, it's the easiest way to do it. All right, so there are two granularities of latches you can have, and this is a classic computer science trade-off: storage versus parallelism. I could have page latches, where each page has a single latch (it can be a reader-writer latch) protecting the entire contents of the page, no matter how many slots or entries are in it. Or I could have a latch per slot, again with read/write modes or a single mode. With page latches, there's less metadata overhead than maintaining individual latches per slot, but it reduces the amount of parallelism I can achieve. So let's see an example. We have three pages, two slots per page. Thread 1 comes along and wants to find D. We hash D and land on this page here; since we're doing page latches, there's a single latch protecting the whole page. I take it in read mode and start scanning for what I'm looking for. Thread 2 comes along and wants to insert E. It hashes to the same page and wants to acquire the page latch, but because it's doing an insert, it needs write mode. Write mode is not compatible with read mode, so it stalls and waits. Thread 1 scans through looking for D; it's not in this page, so now it wants to jump down to the next page. At this point, thread 1 can actually release the latch on the first page, because with linear probe hashing the pages are in a fixed logical order: think page 0, page 1, page 2. No page can magically get slipped in between where someone could put the thing we're looking for. So once we've finished page 1 and are moving on to page 2, it's safe to release the latch, and thread 1 scans the next page while thread 2 scans for the first free slot it can write into. Yes? Hold that thought; that's exactly the optimization we're about to get to.
He correctly points out that at this point, thread 2 is really just trying to find the first free slot it can write into, so it could take a read latch instead of the write latch, scan through, see there's no free space, jump to the next page with another read latch, find the free slot, and only then upgrade to a write latch on that one page, instead of taking write latches all the way down. Yes, and we'll see the same idea in B+ trees; you're jumping way ahead, that is the optimization. All right, the main thing I want to point out is that once we know we're jumping to the next page, we don't need to keep the latch on the page we just came from; that won't always be the case in B+ trees. So thread 2 goes through, gets the write latch, inserts its entry, and we're done. That was with page latches; let's do the same thing with slot latches. Thread 1 starts off, gets the read latch on the first slot, and starts looking for what it wants. Thread 2 gets the write latch on its slot, since it's doing an insert, so it's fine; it gets that. Then thread 1, probing along, can't get the read latch on the slot thread 2 holds, so it has to wait. Thread 2 scans down, takes the write latch on the next slot and releases the one behind it; thread 1 grabs the released slot and follows along, stalling and waiting behind thread 2, until thread 2 finds a free slot and does its write, and thread 1 gets its read latch and does the read. Pretty straightforward.
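A sketch of that forward-only latch handoff for a page-latched lookup (illustrative types, not BusTub's, and it ignores wrap-around at the end of the table):

```cpp
#include <cstdint>
#include <optional>
#include <shared_mutex>
#include <utility>
#include <vector>

struct HashPage {
  std::shared_mutex latch;
  std::vector<std::pair<int32_t, int32_t>> slots;  // (key, value)
};

// Because a linear probe only moves forward through pages in a fixed logical
// order, the latch on page i can be released before latching page i+1:
// nothing can be slipped in behind us. (Not true for B+tree traversals.)
std::optional<int32_t> Find(std::vector<HashPage> &pages, std::size_t start,
                            int32_t key) {
  for (std::size_t i = start; i < pages.size(); ++i) {
    std::shared_lock guard(pages[i].latch);  // read-latch just this page
    for (const auto &[k, v] : pages[i].slots) {
      if (k == key) return v;
    }
    // guard releases at the end of this iteration, before the next page
  }
  return std::nullopt;
}
```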
All right, let's make it harder; this is about B+ trees. Same as before: we want multiple threads to traverse our data structure and read and update key entries at the same time. There are two problems we need to prevent. The first is threads trying to write to the same node at the same time; that's easy to deal with, in theory. The more challenging one is where one thread is traversing the index while another thread farther down in the tree starts making structural changes, because it has to split or merge, and we need to make sure the thread coming up behind it doesn't follow a pointer that goes nowhere. So let's see a more complex example. I've labeled the nodes we'll mostly talk about, A through I. My first thread comes along and wants to delete 44, down here in the leaf node. Without taking any latches, it traverses down, doing comparisons within the keys to decide whether to go left or right, reaches the bottom, and deletes it. Now the leaf is empty, so we have to do a merge to rebalance the tree. But at this point, for whatever reason, the thread gets descheduled: the OS says you're not running anymore, so the thread stalls and sleeps. Meanwhile, thread two comes along and wants to find 41, down here in the leaf node. Same thing: it traverses down, gets to node D, does the comparison on the keys in D. 41 is less than 44, so it knows it needs to go down this path here, but then it gets descheduled and put to sleep. The first thread wakes up, does the reshuffling, and goes away. Now the second thread wakes up, follows its pointer down to the bottom, and sees nothing. This is actually the best-case scenario, because you get a false negative: the data structure says a key that should be there doesn't exist. I mean, it's all bad; it's bad to return wrong results. The alternative is a crash: that page could have been swapped out, so the pointer takes me nowhere and the system segfaults and crashes. Whether incorrect results or crashing is worse is a philosophical question; either way, we need a way to handle this. The technique we're going to use is called latch crabbing, or latch coupling. I think the textbook and Wikipedia refer to it as latch coupling, but it means the same thing. This is the protocol threads use as they traverse the index, so they can access and modify the B+ tree at the same time while ensuring we don't have any of those physical correctness issues in our data structure. The basic idea: every time we traverse to a node, starting at the root, we get a latch on the parent, then get a latch on the child below it where we want to go, and once we recognize that the parent is safe from any structural changes, it's safe to release the latch on the parent. We keep doing this recursively down the data structure until we reach the leaf node, and then we have everything we need. My definition of "safe" is: if we're doing an insert, the node is not full, meaning if there's a split below it, it has space to absorb the new key that gets pushed up; and if we're doing a delete, the node is more than half full, so removing an entry won't force it to merge. Yes? The question is, and we'll see this in the example: if we reach a lower point in the tree and we know we're safe, can we release the latches above us? Yes. It's called latch crabbing because it's supposed to be like the way a crab walks; sort of. Yes? The question is, and he's correct: I said if you don't have to merge, you don't have to keep the latch; but if I have to steal a key from a sibling, then yes, I'd have to latch that as well. You'd only do that at the leaf nodes, though, not the inner nodes. All right, let me go through examples; the slide just describes more formally everything I said. So, say we want to find 38. Again, start at the root. We take node A in read mode, then take node B in read mode. At this point, because we're doing a read-only operation, it's safe to release the latch on A, so we do. Then we get the latch on D to traverse down; same thing, because it's a lookup, a find operation, we can release the latch on B. Then we get to the leaf node: same thing, we latch H, release the latch on D, find the answer we're looking for, and we're done. For searches, you're just always releasing the parent behind you as you go down. All right, let's see a delete. Take the root node in exclusive mode, a write latch. Then we come down to B, and at this point, since we know we may need to coalesce B depending on what happens below us, we can't release the latch on A.
Because if we end up deleting the guidepost key in B, we could end up propagating that restructuring up to the root node, so we have to keep the latch on A. But when we get down to D, we see that it has two keys, so if I delete anything below D, D can absorb that deletion. Therefore it's now safe to release the latches on A and B. So, what order should we release the latches in? He asks, does it matter? Is that a question or a statement? He says release from the top to avoid deadlocks. How could there be a deadlock? Everyone is going down to the bottom, nobody is going the other way; it's only one direction, so no deadlocks. But he's right about the order for a different reason: it's more efficient to release from the top down, because everyone comes in through the root taking that write latch, and the sooner I release it, the sooner somebody else can come in and potentially go down the other side of the tree. So release from the top to the bottom, basically first in, first out. Yes? His question is, can you release them in parallel? How would you do that? What's the cost of spawning a thread, or of sending a message to a thread pool? A lot more expensive than flipping a bit to turn a latch off. We want to avoid thread communication. In theory, if you laid the latches out in a contiguous array, you could do a vectorized operation to flip their bits, but then you're not storing the latches in the tree nodes themselves, you're storing them somewhere else in memory, and nobody does that, as far as I know. Yes? The question is: at this point my thread is pointing at D, so how do I know D is not going to have to merge? Because I only do one operation, one key, at a time. I'm only deleting 38. At this point I don't know what's below me, but I do know that if I delete one key, D won't have to merge; any change below me will be isolated to D and below, nothing above D. That's why it's safe to release the write latches above it. All right, so we get to H, recognize that H won't need to merge either, release the latch on D, delete the key, and we're done. Now a more complicated example: insert 45. Same thing: write latch on A. At B, I recognize that if there's a split below, B has room for the new key, so I can release the latch on A and go down to D. Now at D, I don't know what's going to happen below me; if something down there splits, D itself may have to split and push a key up into B, so while I'm at D, I can't release the latch on B. But when I get to I, I see that it has space for the insert, so I can release the latches on D and B above me. You want to release latches as soon as possible: as soon as you recognize it's safe, release them, then do whatever operation you're doing on the node, because that gives other threads the opportunity to go down and work on other parts of the data structure.
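Putting the pessimistic protocol into code, here's a minimal sketch of the write descent (the Node type and its helpers are simplified stand-ins, not the project's interface):

```cpp
#include <cstdint>
#include <deque>
#include <shared_mutex>
#include <vector>

struct Node {
  std::shared_mutex latch;
  int num_keys = 0;
  int max_keys = 4;
  std::vector<Node *> children;  // empty for leaf nodes

  // "Safe": won't split on an insert, won't merge on a delete.
  bool IsSafe(bool is_insert) const {
    return is_insert ? num_keys < max_keys : num_keys > max_keys / 2;
  }
  Node *ChildFor(int32_t /*key*/) {  // real code binary-searches the guideposts
    return children.empty() ? nullptr : children.front();
  }
};

// Write-latch crabbing: always latch the child first, and only once the
// child is safe release every ancestor latch we still hold, top-down (FIFO).
std::deque<Node *> DescendForWrite(Node *root, int32_t key, bool is_insert) {
  std::deque<Node *> held;  // latches we still hold, topmost first
  root->latch.lock();
  held.push_back(root);
  Node *node = root;
  while (Node *child = node->ChildFor(key)) {
    child->latch.lock();  // latch the child before releasing anything above
    if (child->IsSafe(is_insert)) {
      for (Node *n : held) n->latch.unlock();  // release ancestors top-down
      held.clear();
    }
    held.push_back(child);
    node = child;
  }
  return held;  // held.back() is the leaf; caller modifies it, then unlocks all
}
```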
All right, so now let's see what happens when we have to do a split. We're going to insert 25. Start at A, take the write latch on A, go down to B and take the write latch on B. At B, we recognize that we're not going to have to propagate any changes above B, so we release the write latch on A. Then we get down to C; same thing, we know C can absorb any new key from a split below it, so we release the write latch on B. But now when we get to F, we recognize that we are going to have to do a split, so we can't release the write latch on C. Now we just do the insert; we can ignore sibling pointers for now. Because I hold the write latch on C, I can do whatever changes I want to it: put a new key in, allocate a new leaf node, and set up the pointer to it. Again, because changes might propagate up parts of the tree, you don't want to release the latches until it's safe. So, somebody sort of already brought this up before, but let's think about how we can optimize this even further. In all the cases where I had to modify the B+ tree, what was the very first thing I always did? Latch the root, with a write latch. That essentially makes our data structure almost single-threaded. Yes, I can do the crabbing thing and release latches as I go further down once I know it's safe, but if all the threads coming in are doing updates, they're always going to take this write latch on the root node, and taking it every single time becomes a big bottleneck as I scale up the amount of concurrency I want in my data structure. So a really simple optimization, the one he brought up for the hash table, is to assume that the modifications requiring splits and merges are rare in your B+ tree. Therefore, you take read latches all the way down, assuming things are going to be okay. When you reach the leaf node, you determine whether your assumption was correct. If it was, you take the write latch on the leaf node, do the one thing you need to do, and you're done. If you were wrong, you start the search all over again, now using the pessimistic approach we did before. This optimistic approach is very common in other parts of database systems and concurrency, and actually outside of databases too: you assume the things that cause problems are rare, take the fast path, and if you get it wrong, you undo your changes or roll back and start over. We'll see the same technique applied again when we talk about transactions and locks. So the algorithm is the same as before for searches: read latches all the way down using the crabbing technique. For insert and delete, we also take read latches as we go along, and when we get to the leaf node we take the write latch and determine whether it's safe at that point. If yes, we apply our change and we're done. If no, we restart the whole operation, and we don't have to undo any changes, because we didn't make any: we took read latches all the way down, recognized our assumption was incorrect, and start over. This works great when there's low contention, because 99% of the time your assumption is going to be correct.
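In code, the optimistic descent might look like this sketch (reusing the simplified Node type from the previous sketch; it assumes the root is an inner node):

```cpp
#include <cstdint>

// Read-latch crab down to the leaf; write-latch only the leaf itself. If the
// leaf turns out not to be safe, give up: the caller falls back to the
// pessimistic DescendForWrite and tries again.
bool InsertOptimistic(Node *root, int32_t key) {
  Node *node = root;
  node->latch.lock_shared();
  while (Node *child = node->ChildFor(key)) {
    if (child->children.empty()) {
      child->latch.lock();         // leaf: take the write latch
    } else {
      child->latch.lock_shared();  // inner node: assume no split/merge
    }
    node->latch.unlock_shared();   // crab: release the parent
    node = child;
  }
  if (!node->IsSafe(/*is_insert=*/true)) {
    node->latch.unlock();          // assumption was wrong: restart pessimistically
    return false;
  }
  // ... insert the key into the leaf; no structural change needed ...
  node->latch.unlock();
  return true;
}
```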
Obviously, if there are a lot of threads trying to modify the database at the same time, you will have contention and you'll end up doing a lot of wasted work: you traverse down optimistically, recognize you're wrong, and have to start over. But in general, for most workloads, this works out to be the right thing to do. So let's see an example; let's do that delete 38 one again. I take read latches all the way down, doing the same crabbing as before, releasing the parent as I go. I get down to the leaf node, take the write latch, and recognize that we're safe, so we release the latch on D, make the change to the leaf node, and we're done. Now the insert 25 one. Take read latches all the way down, latch crabbing as we go. (I don't know why that pause is in the slides.) Get to the leaf, take the write latch, and recognize that I am going to have to split. So I have to abort the whole operation, come back, and start over the pessimistic way. Yes? His statement is: I said that if I get to the bottom and recognize I'm wrong, I have to start over; couldn't I just remember that C is where I want to start from next time, instead of traversing all the way down from the root? Well, what happens if someone deletes C? But your refinement is: as I come down and see that C is one level above the leaves, take a write latch there; keep track of the deepest safe node. I think you can make that correct, but it depends; if you have multiple writers, it gets tricky. As you increase the amount of contention and conflicts, yes, your approach might actually be better. But if I assume conflicts are rare, then the optimistic way is one less write latch I have to take, which is always better. Yes? The question is, couldn't the system keep track of the fact that I keep having to restart? That's sort of his idea again: you could try to be clever and handle that case, doing it some of the time rather than all or nothing. Where I actually thought you were going was: could the database keep track that my threads are having to restart over and over, and once I hit a threshold where I'm wasting a lot of time, just always use the pessimistic approach all the way down? Yours is sort of halfway there. Yes, in the back. The statement is: what if we kept extra metadata in the inner nodes about what things look like below, so I'd have hints in B, for example, saying you're probably going to have to split down there, so from this point on take write latches? You could do that, but it's additional bookkeeping you have to maintain, and it's usually cheaper not to. Okay, so he asked two things. Has anybody in academia or research, or even in industry, looked at lock-free data structures? Yes. And then you said Rust; why would Rust make this special or different, such that it would somehow make this magically better or easier? I don't think it will.
I mean, Rust is good, but it's not some magic thing. I'm not that old, but I've been through enough waves of the hot thing: Rust is hot now, Go was hot a few years ago, and people always incorrectly think that the new programming language will somehow make hard problems go away, and it doesn't. Rust is about memory correctness, not necessarily lock or latch correctness. But your first question, about latch-free or lock-free data structures: absolutely, these exist. We actually have a research paper I can share on Piazza where we built Microsoft's lock-free data structure, the Bw-Tree, here at CMU. The guy who implemented it was a master's student in the MSCS program. Dude wrote it in Notepad on Windows, made his Windows machine look like Windows 95, and wrote it in Comic Sans. Next-level insane. But when we actually built it and compared it against a well-written B+ tree, the B+ tree smokes it. The worst data structure is the skip list, which is latch-free; that's always bad. A good B+ tree will crush these things. Yes, faster, correct, sorry: all the data structures are correct; they'd better be, correctness is table stakes. Under high contention you have to spin, and in the case of the Bw-Tree, there's a separate lookup table that maps logical node IDs to nodes, almost like the buffer pool's page table, so you have to go through that to find the thing you're looking for, and there's all this extra compare-and-swap magic you have to do to make it work. I'll send out the paper. (There was a follow-up question about condition variables and structuring the atomics differently; let's pause that, it's not this class, that's the advanced class.) A B+ tree is always better; tries are good in some cases too. Okay, all right, so is this clear? Again, there are a bunch of optimizations you guys have proposed, which I like; there are things you can do to make this better. For our purposes in this class, just assume it's: go down optimistically, abort if you're wrong. Yes? His statement is that there's an obvious trade-off in the amount of concurrency we can have in our system along two dimensions, the node width and the height of the tree. Right, and not just concurrency; all the properties we care about in a B+ tree. I talked about node size when we covered the different hardware types. The answer is yes, but it's impossible to give an exact formula, over every amplification and every possible parameter in your B+ tree and your system, for what the right trade-off should be. For this class: four kilobytes, or whatever the default is, for simplicity; it also depends on the size of the key. Let's focus on correctness. One more quick question, guys, and then I want to get through scans.
So your statement is: if I'm at C, I have the read latch, and then I go to get the write latch, but before I do, someone does something else? Yes? All right, so his statement is: what if I come through the first time and recognize that I'm going to have to split, I come back and do the pessimistic approach, but in that time somebody else comes through and does an insert where now the node I land on can't absorb my write without having to split — or the reverse, where I could have done it the optimistic way after all. But how would I know that? As we said before, we don't want to pass messages between threads, that's too slow, right? So I don't know, but I don't care. The write latch is always safe — why am I safe? Because nobody else can modify the node, so that's fine. Again, I care about physical correctness: I care about whether my pointer is going to go nowhere, whether I'm going to have false negatives. That's all I care about. If the path I take when I come back the second time is completely different in the tree, who cares? Right? Yes? So he said, why not have a write log so you can batch the writes? That's the B-epsilon tree I mentioned before, or the fractal tree — there are approaches that do that. Next: if a database system knows it's running on a single core, should it just use one giant write latch for the whole thing? Well, you can have multiple threads on a single core, right? If one thread stalls because the page it needs is on disk, and that's my only thread, then I'm screwed — I want other threads to go do stuff. Now, SQLite has a single writer thread and multiple reader threads, right? But even then, you still need latches to protect your data structure, because you could have multiple reader threads in your data structure plus that one writer thread. All right, let me get through scans, because you'll need this for Project 2. So all the things we talked about so far have been pretty simple because it's always top-down. Again, no deadlocks, right? Because everybody's going in the same direction. The original B-plus tree did not have sibling pointers, and as I said, the B-link tree does have sibling pointers. If you don't have sibling pointers, then the way you scan along leaf nodes is that you get to the bottom, get to the end, recognize that, okay, I need more things, and then you probe down again and land on the next leaf, right? And each time you probe or traverse the tree, you use the same protocols we talked about before. But obviously, if you want to scan along leaf nodes via sibling pointers, now we could potentially have deadlocks, because I could be scanning this way and you could be scanning that way, right? So let's see a really simple example. I want to find all keys less than four. My thread first grabs the read latch on A, then I just do the crabbing technique. I get down to C, take the read latch on that. And now, as I traverse along, I'm not going to release the latch on C until I get the read latch on B, because I don't want someone to change the sibling pointer while I'm jumping over, and then I land in nowhere. So it's sort of similar: I have to wait until I've latched the next sibling node before I release the latch on the current one. Once I have that, I can get over here, do whatever I want, and I'm done. Pretty simple. Let's bring in a writer, right?
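Here's a minimal sketch of that sibling handoff: latch the next leaf before releasing the current one, so the sibling pointer can't change underneath you mid-jump. `Leaf` and `EmitMatches()` are hypothetical stand-ins, and note that this version blocks on the sibling latch — the writer example coming up is exactly where that gets us into trouble.

```cpp
#include <shared_mutex>

struct Leaf {
  std::shared_mutex latch;
  Leaf *next = nullptr;  // sibling pointer (B-link style)
  // Stub: the real version copies out matching keys and returns
  // true if the scan should continue into the next sibling.
  bool EmitMatches(int /*low*/) { return false; }
};

void ScanFrom(Leaf *leaf, int low) {
  // Precondition: caller holds leaf->latch in shared mode,
  // having crabbed down from the root.
  while (leaf != nullptr && leaf->EmitMatches(low)) {
    Leaf *sibling = leaf->next;
    if (sibling != nullptr) {
      sibling->latch.lock_shared();  // latch sibling BEFORE releasing current
    }
    leaf->latch.unlock_shared();     // now it's safe to let go of this node
    leaf = sibling;
  }
  if (leaf != nullptr) {
    leaf->latch.unlock_shared();     // scan ended early; drop the last latch
  }
}
```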
Sorry — two readers at the same time first. So one guy wants to get all keys less than four; the other guy wants to get all keys greater than one. They start at exactly the same time. They both get the read latch on node A. That's fine. Then they both get the read latch on their corresponding leaf nodes. That's fine. But now they both want to scan across — this guy has the read latch on this node, that guy has the read latch on that one, and those are compatible. So this is actually not a deadlock. Both T1 and T2 can hold read latches on their corresponding nodes without any problems. They just swap places and release the latch on whichever node they came from, right? That's easy. Okay, now let's bring in a writer. T1 wants to delete four, a single key. T2 wants to find all keys greater than one. So at the very beginning, T2 is going to get the read latch on node A, get the read latch on B, and go down. Assuming we're doing optimistic latching, T1 will get the read latch on A as well, then it takes the write latch on C and releases the read latch on A. But now thread two wants to traverse along the siblings and get the read latch on C, and it can't do that because thread one has the write latch on it, right? So what should happen here? What could we do? I hear wait, I hear deadlock, I hear abort — which one? Abort who? He says abort T2. He says, why can't we just wait? Again — for how long? So the correct answer is actually: wait a little bit, but a very little bit, and then kill T2 — T2 is the blue guy, right? We're killing the blue guy. The reason we have to kill it is because we don't know anything about what the other thread is doing, right? Now, we know all it's doing is deleting one key, and it's not going to go in this direction, but at this point there's nothing in our data structure that gives us the hint that this other thread is never going to grab the latch over here. So technically there isn't really a deadlock, but we don't know that, right? This is a difference between locks and latches: with locks, there's a lock table that keeps track of who holds what locks and what locks they're waiting on, and we can make decisions — is there a deadlock? — and try to rectify things. Down in the latching world, we're trying to be as quick as possible, so if we can't get a latch, oftentimes it's just better to kill ourselves and start back over. Because the alternative is, again, maintaining a latch table about who holds what latch and who's waiting for what. And if I'm trying to be in and out in microseconds, it's going to be more expensive to update that latch table than to just kill myself and start over. Yes? So his question is: what if I had a scenario where I was doing a delete and I needed to steal keys from my sibling over here — could I have a deadlock there? In that case, you would hold a write latch on the parent, right? So either the other guy got in before I did, and as he scans along he can't get the latch and kills himself, or I try to get the write latch, I can't, and I kill myself. Yeah, either way, that's correct.
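Here's a sketch of what "wait a very little bit, then kill yourself" might look like in code: instead of blocking on the sibling latch, try to acquire it with a short timeout and, on failure, have the caller release everything and restart from the root. The 20-microsecond figure and the helper names are invented for illustration.

```cpp
#include <chrono>
#include <shared_mutex>

// std::shared_timed_mutex supports timed shared acquisition, unlike
// plain std::shared_mutex.
bool TryLatchSibling(std::shared_timed_mutex &sibling_latch) {
  using namespace std::chrono_literals;
  // Wait a very little bit; if a writer still holds it, give up.
  return sibling_latch.try_lock_shared_for(20us);
}

// Caller-side pattern:
//   if (!TryLatchSibling(next->latch)) {
//     ReleaseAllHeldLatches();  // hypothetical helper
//     // "kill ourselves": abandon this traversal, restart from the root
//   }
```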
In this example here, there is not actually a deadlock, because they're not trying to go in different directions. I'm just trying to say that you don't have to have a deadlock — but you don't know that you don't have a deadlock, so you're better off just killing yourself immediately. Yes? The question is, could there be a race condition depending on who acquires the latch on C first? Well, no, because if T2 gets the read latch on this node and then the read latch on that one, T1 can't get the write latch. So there isn't a race condition. Right, but the output of the find case here would depend on — okay, so his statement is: the result of the operation could change depending on who comes in first. Yes, but that's a higher-level concept with transactions. For our purposes here, we don't care about that. These are called phantoms; we'll cover them later. Yes? The question is, in the example we talked about before, where I was trying to steal from over here and I did have a deadlock, would they both have to kill themselves? Potentially, yes. Her question is, is there no way to pick one? How would you pick one? Obviously, the simple heuristic here is to pick the one that's cheaper — the reader has probably done less work, so if I'm holding a read latch, kill myself instead of the write latch. But the read latch doesn't know the other guy has a write latch, and the write latch doesn't know the other guy has a read latch. Well, I mean, you would know that, but you wouldn't know what their intent is on the other side. So again, there are a bunch of different ways we could think about making this better, but in practice, we're trying to do this as fast as possible — so just finish it, right? Yeah, so her statement is, could you maintain different timeouts? Yes. And there's a good example: maybe the find-keys-greater-than-one scan has a bunch of nodes over here, and it took me milliseconds to scan across, so I've done a ton of work. At that point, maybe I want to be less trigger-happy on myself and sleep a little bit longer. Yes — you could include how much work I've done so far in deciding how aggressive to be about killing yourself. Yes? So the statement is, if I'm just killing myself and restarting over and over again, could I end up doing a lot of wasted work? Absolutely, yes. It's not avoidable. And yes — if this guy needs to redistribute, to steal from the sibling here, I need to hold the write latch on the parent. In my example, I didn't show that; you would have to hold the write latch on B as well. I mean, what's the likelihood of that happening, right? You're talking about threads showing up within nanoseconds of each other, at the exact same time. So what you could do — I'm not saying anybody does this, because I think it's unlikely this would happen — is say, okay, I've restarted four times, so let me sleep for 100 microseconds instead of 20 microseconds. You could do stuff like that. Okay, these are a lot of good optimizations. So again, the main takeaway from all this is: we're not going to do any deadlock detection or avoidance, because latches don't support it, right? It's on us as programmers — again, if you're in the database world, highly paid programmers — to write good code that doesn't have this problem, right?
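And a sketch of that last suggestion — exponential backoff keyed on how many times the operation has already restarted. The 20-microsecond base and the cap are made-up numbers for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

// Sleep longer after each restart: 20us, 40us, 80us, ... capped at 640us.
void BackoffBeforeRestart(int num_restarts) {
  auto delay = std::chrono::microseconds(20) * (1 << std::min(num_restarts, 5));
  std::this_thread::sleep_for(delay);
}
```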
This is why you don't let an average schmo JavaScript programmer work on your database system internals. You're in this class now. You're at CMU, right? You're not an average JavaScript programmer. You guys are smarter than I am — you don't know it yet, but by the time you figure it out, you'll graduate and I'll look great, okay? All right, cool. So this is super hard to do, obviously, as you can imagine. We barely scratched the surface. I'll post a link about the latch-free data structure we built — it's even harder. And then, starting next class, we'll actually talk about how you run queries, okay? All right, guys. See ya.