Alright. Project 1, as you know, is due today. Is anybody going to be bold and admit they haven't started yet? So who here is done? You got 100%? 15, 20%? OK. Is there any one thing that everyone's stuck on, or is it just a matter of sitting down and getting through the test cases? I noticed people have trouble with memory leaks. Right? That's real. You've got to fix those. Any other high-level things you guys are stuck on? The definition of one of the test cases? The definition of what? Sorry? In terms of what? What do you mean by confusion? There's a test case on Gradescope where the expected behavior isn't that clear compared with what's on the slides, and several people are stuck on that case. OK, and you posted it on Piazza, right? Yes. All right, so there are a lot of posts on Piazza. It would be helpful if you could send me an email with exactly what the issue is, and that way we can fix it up for next year. If it's that the slides don't match the assignment, or the textbook doesn't match, or the write-up doesn't match, let me know what it is and we can fix it and make it better. We did try to improve it this year based on the issues we had last year; we always want to get better.

OK. So homework 2 is due this Friday at midnight, and project 2 will be going out today. And what we're going to do this time, as I'll explain at the end of class, is break the project up into two checkpoints, because of what we learned last year. Last year people thought: the buffer pool was easy, the B+ tree is going to be easy too. So they waited until the last minute to start, and then it was: oh, shit, the B+ tree is hard, as we'll see today. So we're having a checkpoint that's due halfway through, just to make sure you've started thinking about it and working on it, and then there'll be a final submission where you get the full grade and everything. OK, we'll cover that at the end.

All right, let's jump into today's material. Today's lecture is entitled index concurrency control. I actually gave this lecture last year near the end of the semester, after we talked about concurrency control protocols for transactions. I decided to move it up front this year because, in the same way you've had questions like "how do you handle updates while you're doing reads in your B+ tree?", we can now talk about it while the B+ tree is fresh in your mind. And you're going to need this for the second project.

Everything up until now, at least in the last three lectures when we talked about data structures (the B+ trees, the hash tables, skip lists and all that), we've mostly been assuming that we were building data structures that operate in a single-threaded environment, meaning only one thread is reading or writing to the data structure at a time. We talked a little bit about how you could do this in a multi-threaded environment with lock-free skip lists, but I was being very hand-wavy about it and didn't want to dwell on it, because I wanted to talk today about how to do things correctly.
So obviously in a modern operating environment, a modern computer, you have CPUs with a lot of cores, and you want to be able to use all those cores. You want as many threads as possible potentially reading or writing to your data structure at the same time, and you need to protect them from each other. That's what we're focusing on today. And it's not just about taking advantage of the additional cores. We'll see this later on, but the concepts we're talking about today are old; they're from the 1970s. Back then you had one CPU socket with one core, so only one thread could run at any given time. But because we're a disk-oriented database, we said that at any time a thread could touch something that's not in memory and have to stall while the system goes out to disk to get it, and we want to let other threads run in the meantime. So this is why we want our data structures to be protected from threads reading and writing at the same time.

One thing I'll say as a spoiler is that everything we're talking about today is used in almost every single database system. One notable exception is VoltDB, which doesn't do any of the things we're talking about today. They only run single-threaded on a core. They'll have multiple cores, each running in single-threaded mode, and that allows them to avoid all the overhead of the index concurrency control and latching we're talking about today. I'm not going to say how they do it; they're coming at the end of the semester and they'll explain why they take this approach. It works really well for transactions and OLTP workloads, not so great for OLAP, and we'll see why later.

The way we're going to protect our data structures is a concurrency control protocol. A concurrency control protocol is a broad class of algorithms or methods that software systems can use to allow simultaneous threads to operate on the same object, or "thing," at the same time, while ensuring that their concurrent operations still produce a "correct" result. I'm being deliberately vague on two parts here: the term "correct" and the term "thing." A database "thing" could be a tuple, a data structure, a page, a table; it doesn't matter. And I put "correctness" in quotes because what correctness means from one protocol to the next depends on its criteria. There are two broad categories of correctness. Logical correctness means: can I see the data that I'm supposed to see? If I write something into the index or the table and I go back and try to read it right away, am I actually going to see it? And for the things that other transactions or threads updated, am I seeing those when I'm supposed to? Physical correctness means: is the internal representation of how the object is stored in memory or on disk valid and sound? By that I mean there's never going to be a pointer to a location in memory that we're not supposed to be reading; if I traverse the data structure, following its pointers, I see the nodes that I should be seeing. For this lecture, we're going to focus on the second one, physical correctness.
How do we make sure the integrity of our B+ tree is sound? We'll focus on logical correctness later, when we talk about transactions and higher-level semantics after the midterm. I find this stuff really, really fascinating; I spent a large portion of my early grad student career here at CMU thinking about this problem. It's a really cool thing to be able to say: I can interleave operations any way I want, and as long as the result comes out logically correct, I don't care how they're interleaved. Whereas with physical correctness, there are certain cases where you have to do things in a certain order. So again, we'll focus on the second one; the first will come up later when we talk about transactions and concurrency control for higher-level semantics.

So today's agenda: we'll go over the different latch modes, and then we're primarily going to spend our time on latch crabbing, also called latch coupling. This is what you have to implement in the second project. Then we'll talk about how to do leaf scans safely, and finally a simple optimization that was invented here at CMU called the delayed parent update. It's a fairly obvious optimization, but we can talk through the semantics of it.

Alright, so I showed this slide earlier in the semester to distinguish between latches and locks in the context of databases, and I got the impression that everyone's eyes glazed over and no one was quite sure what I was trying to say. So I'm going to go over it again, because we need to understand it for what we're talking about today. In database systems there's a dichotomy between locks and latches. Locks are things we use to protect the logical contents of the database. You can protect tuples, you can protect tables, you can protect the entire database. We're not talking about low-level things like "here's some region of memory I want to protect"; that's not what locks are for. Locks protect the logical contents of our database, and in this case here, the logical contents of the index. And these locks are going to be held for the entire duration of a transaction. That's essentially what a transaction is; just think of it as a sequence of operations we want to do in our database. I want to update five tuples, and I'm going to do that atomically. So any lock that I take on an object in the database, I hold for the entire duration of the transaction. That's not entirely true, but for now just assume that. The other key thing about locks is that there's an additional process or validation mechanism that decides whether our transaction is allowed to proceed. And if not, because of some violation of lock ordering, we have to abort our transaction and roll back any changes. A really simple example: I need to update two tuples. I get the lock on the first one and update it; then I try to get the lock on the second one, but that fails, I can't get the lock, so I have to abort my transaction and roll back the change to the first one. There are additional mechanisms we have to implement to do that rollback. Now, contrast this with latches.
And again, if you're coming from an operating systems background, a latch is what they would call a lock: you're protecting the underlying physical data structure of something. Latches protect the critical sections of our index's internal data structure. Remember the distinction between logical and physical correctness: locks protect the logical correctness, latches protect the physical correctness. This is the critical section: we want to update a pointer to another node, and we don't want anybody to read it until we're actually done. We don't hold latches for a long time. Your thread acquires a latch, does whatever it needs to do, and immediately gives it up, which means you can acquire and release the same latch multiple times during your transaction. The other key difference between a latch and a lock is that with a latch we never need to roll back any changes. It's an atomic operation: either it happens or it doesn't. And as we'll see later, if it doesn't happen, if I can't get the latch I want, I can just loop back and try again, maybe just spin on it.

There's a great table from the book I recommended to you, by Goetz Graefe, on the B+ tree material, where he lays out exactly how locks and latches are used in a database system. Locks are used to separate user transactions, updating tuples and things like that, and they protect the database contents: tuples, tables, databases. We hold them for the entire duration of the transaction, and there are different modes for what kinds of locks we can take. For now we can ignore all that; just know that there are more modes than we'll see for latches. The way we protect against deadlocks with locks is a separate detection or resolution mechanism, using things like a waits-for graph or timeouts and aborts. And the information about which transaction holds which lock is stored in a centralized lock manager. All of this will make sense later on; for now I'm just distinguishing the two. Yes? Why do latches only have a read and a write mode? As opposed to what? An intention mode? That's basically a read latch, same thing. Right: in the database literature, again distinguishing locks and latches, you can get a shared lock on a tuple, and you get a read latch on the memory holding that tuple. That's the way to think about it.

So latches are used to protect threads from each other, and they're only used to protect in-memory data structures, like the B+ tree we'll be building here. That means we never write any information out to disk about what latches are being held, because that wouldn't make sense: when we come back and boot the system up, those latches aren't around anymore. We're protecting the critical sections of our data structures. We have only two modes, read and write, which we'll go over on the next slide. And the way we avoid deadlocks is by being good programmers, good database developers and software engineers, through coding discipline. We need to write our code carefully so that deadlocks can't happen.
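Since these two latch modes come up everywhere below, here is a minimal sketch of a read/write latch using the C++ standard library. This is illustrative only (it is not BusTub's actual latch class), and as discussed later, a real system would typically use a spin latch rather than an OS mutex:

```cpp
#include <shared_mutex>

// Minimal sketch of a latch with the two modes described above.
// Compatibility: read/read is allowed; read/write and write/write are not.
class Latch {
  std::shared_mutex mutex_;

 public:
  void LockRead() { mutex_.lock_shared(); }    // many readers may hold this
  void UnlockRead() { mutex_.unlock_shared(); }

  void LockWrite() { mutex_.lock(); }          // exclusive: one writer only
  void UnlockWrite() { mutex_.unlock(); }

  // Non-blocking acquire, for the "abort and retry" policy we will see
  // later for leaf scans.
  bool TryLockRead() { return mutex_.try_lock_shared(); }
};
```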
And the way we avoid deadlocks is basically: if I can't get the latch that I want, I have to know in what context I'm trying to acquire it. Should I wait for it, or should I kill myself right away? In the context of a transaction with a lock, that logic is managed by a coordinator, like a lock manager. But this lecture is focused on latches, because we're protecting the internal data structure of our B+ tree. All the lock stuff will be covered after the midterm, in lecture 17. And again, I find all this stuff really fascinating.

So, back to the earlier question about whether you can have a read latch, like a shared mutex: the answer is yes, we just call it read mode. Multiple threads can hold the same latch in read mode at the same time. If you hold the latch in read mode and I also want it in read mode, I can piggyback off you and acquire it too. Internally, the latch maintains some metadata that says this thread has me in read mode, and this thread has me in read mode, so that when one of them releases it, the latch as a whole isn't released. For write mode, only one thread is allowed to hold the latch at a time. So if you already have the latch in read mode and I want it in write mode, I can't have it; I either get blocked or I abort myself. And if I'm holding a write latch, nobody else can acquire the latch at the same time in any mode. So there's an obvious compatibility matrix. The way to read it is: the rows are the mode the latch is currently held in, and the columns are the mode somebody wants to acquire it in. If somebody holds it in read mode, another thread can acquire it in read mode and no other mode; everything else is denied. What happens when you get denied depends on the implementation and the context in which you're trying to acquire the latch.

All right, so using these latches, we can talk about how to build a multi-threaded B+ tree. There are two types of problems we need to deal with. The first is two threads trying to update the same page at the same time, say both trying to insert into or update the key/value arrays inside it. There's an obvious way to fix this: you take a write latch on that node, and only one thread is allowed to update it at a time. When you're done updating, you release the write latch and somebody else can go ahead. That's a no-brainer; we all know that from basic systems programming. The more complicated problem is dealing with traversals. Now you can have one thread traversing down the tree, trying to get to a leaf node, while another thread is doing an insert or a delete that causes a split, or a merge, or moves keys to a sibling. That can foul up the first thread, because now I'm moving memory around, I'm changing pointers to point to new locations. It becomes problematic because we can have a race condition: I copy the data to a new location in memory, but before I update the pointer, someone follows the pointer to the old location, and maybe I've already reused that memory for something else.
Alright, we're talking microseconds or nanoseconds here, but with enough traffic, bad things can happen; bad things will happen. So we have to protect ourselves. Let's look at a really simple example. We have a simple B+ tree, and I'm labeling the nodes on this side with letters so we know which ones we're talking about as we go down. Say I have one thread that wants to delete 44, down here at the bottom. It goes down one level after another, following the separator keys to decide which way to go, until it reaches the bottom, and it deletes 44. But now our node I at the bottom is less than half full; it's empty. So we need to rebalance, and we do this by stealing 41 from its neighbor and copying it over. But at the same time, thread two comes along and wants to find 41. It does the same thing: it comes down and gets to D, and when it looks at D, it sees the separator keys and says, anything between 38 and 44 is at node H, so that's where I need to go. But in between, while it's following that pointer, the other thread moves 41 over. So now thread two goes down to H, and the thing it thought should be there is not there. This seems like a logical problem (I thought 41 was there, but it's not), but it's actually a physical one: the data structure told you "go to H and you'll find what you want," and it's no longer there, because only after the fact did D get updated with the correct separator key. Any thread coming behind should have gone to I. This is essentially what we need to protect ourselves from.

The standard technique for this is called latch crabbing, or latch coupling. When I learned databases when I was younger, they taught us the term latch crabbing; sometimes it's called latch coupling, and I forget what the textbook says, but they mean exactly the same thing. So Wikipedia might call it one versus the other. Latch crabbing is a protocol that allows multiple threads to access and modify our B+ tree at the same time without any of these problems. The idea is pretty straightforward. Any time we enter the tree, we first get a latch on the parent node, and then we try to get the latch on the child node in the direction we want to go. If we get that latch, we move our thread down to the child, and then we release the parent's latch if the child is considered "safe." By safe, I mean that we know, based on whatever operation we're trying to do, whether it's an insert or a delete, that the node we just moved to will not need to be split or merged; it can absorb whatever change may come from below it. If my node is full and I'm doing an insert, then I may have to split the node I'm at, and therefore I'd have to update my parent node, so I don't want to release the latch on my parent. Same thing for a delete: if the node is less than half full, I may have to do a merge, so my parent isn't safe either. So let's see how this basically works. Search first is pretty straightforward: you take read latches all the way down.
You acquire the latch on the child below you; once you have it, you move down, and you always release your parent, because searches are read-only and you're never going to modify the structure of the tree. If you're doing an insert or a delete, you start at the root and acquire write latches all the way down, and as you go, if you know the child node you just moved to is safe, you can release the latches on your parent and anything above it.

Okay, let's look at an example. Say our thread wants to search for 38. We start at the very beginning: we get the read latch on the root, A, then we jump down, get the read latch on B, and move down there. At this point, since this is read-only, it's safe for us to release the latch on A, so we go ahead and do that. Then we move further down: get the latch on D, go down, get the latch on H, go down, and we're done; we can read the item we wanted. Pretty straightforward. And that's the coupling, or crabbing, part: like the way a crab walks, you're releasing the latches behind you as you go.

Now let's see a delete. We want to delete key 38, down at the bottom. We start with the write latch on A, then we get the write latch on B and move down. Now, B only has two keys in it, so B is half full or less, and we don't know what's going to happen below us; at this point we only know about our current node and the one behind us. We don't know whether, when we go down to D, we'll have to do a merge because it's less than half full, so we can't release the latch on A, because we may have to merge B. Then we get down to D, and we recognize that D is completely full. Since we're doing a delete, no matter what happens below us, D can absorb the change, if you will, without propagating it back up the tree. So at this point we can release the latches on A and B, our entire lineage back up to the root. One key thing to point out here is that we release the latches in the first-in-first-out order that we acquired them: I release the latch on A first, followed by B. And why, exactly? Right, she said you want somebody to be able to access A as soon as possible. There's no correctness issue either way: if I released B first, it wouldn't matter, because no one can get to it anyway while I hold the write latch on A, the root. But if I release the latch on A first, and again we're talking nanoseconds or microseconds here, it opens up a window where someone else can get into our tree, and they may not even be going down the same path that we are; they may be going down the other side of the tree. Everyone is always stymied at the root, blocked on that latch, so the sooner I can release it, the better. Then I get my write latch on H, move down, and I go ahead and delete 38. We're more than half full, so there's no change I need to make to anything above, and I didn't have to hold a latch for that. Okay.
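To make the descent logic concrete, here is a sketch of pessimistic crabbing over a hypothetical, simplified node type. This is not BusTub's actual API (in the project, nodes are pages managed through your buffer pool manager); it just shows the shape of the protocol, with the safety rule exactly as described above:

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical in-memory node, for illustration only.
struct Node {
  Latch latch;                   // the read/write latch sketched earlier
  bool is_leaf = false;
  std::vector<int> keys;         // separator keys (internal) or keys (leaf)
  std::vector<Node *> children;  // child pointers (internal nodes only)
  size_t max_keys = 4;

  bool IsLeaf() const { return is_leaf; }
  Node *FindChild(int key) const {
    size_t i = 0;
    while (i < keys.size() && key >= keys[i]) i++;
    return children[i];
  }
};

enum class Op { kInsert, kDelete };

// "Safe" as defined above: the node cannot split (insert) or merge (delete)
// no matter what happens below it.
bool IsSafe(const Node *node, Op op) {
  if (op == Op::kInsert) return node->keys.size() < node->max_keys;
  return node->keys.size() > node->max_keys / 2;
}

// Pessimistic crabbing: write latches all the way down, releasing all held
// ancestors (oldest first, so the root frees up soonest) whenever the child
// we just latched is safe.
Node *TraverseForWrite(Node *root, int key, Op op, std::deque<Node *> &held) {
  root->latch.LockWrite();
  held.push_back(root);
  Node *node = root;
  while (!node->IsLeaf()) {
    Node *child = node->FindChild(key);
    child->latch.LockWrite();
    held.push_back(child);
    if (IsSafe(child, op)) {
      while (held.front() != child) {  // FIFO release order
        held.front()->latch.UnlockWrite();
        held.pop_front();
      }
    }
    node = child;
  }
  return node;  // caller does the insert/delete, then unlatches `held`
}
```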
Alright, let's see an insertion now. I want to insert 45. Same thing: I start off, get my write latch on A, get the write latch on B, and move down. At this point I can release A, because I know I have a free slot, a free key space, in B, so no matter what happens below, even if I have to split, B can absorb a new key. So we release the latch on A. Then we get down to D, and here I can't release the latch on B, because I don't know what's below me, and D doesn't have space to take another key, so I may have to go all the way up doing splits. So I maintain the latch on B. But when I get down to I, I say: oh, I have a free slot, I'm not going to have to split, and therefore I can release the latches on B and D.

Let's look at one more example where you do have to do a split. We want to insert key 25. Same thing: start at A, get the write latch, get the write latch on B, move down. At this point B is not going to have to split, so we release the write latch on A. Move down to C; same thing, we have a free slot, so we're not going to have to split C, and we release the write latch on B. Then we get down to F, and now we see we're going to have to do a split. For this we can't release the write latch on C, because we're going to have to put something into it as part of splitting F. So we keep the latch on our parent node, and then we do our insertion, update the separator key, and add our new key to the page. Once all that's done, we release all the latches, and now our new key 25 is visible to other threads, with the integrity of the data structure still sound. If anybody coming behind us were doing a lookup on these pages, they would get blocked at this point, because they can't get a latch on C.

Yes, the question is about the criteria for judging whether a node is safe. If you're doing an insert and the node is not full, that means no matter what happens below you, even if there's a split, this node has room to take a new separator key. So you're never going to have to split it, and changes can't propagate beyond the node you're at, which means you don't need to hold a write latch on its parent. Going back to the example: we're here, we get our write latch on C, and we jump down. We know we're doing an insert, so no matter what happens below, whether we split or not, C has room for a new separator key. That means we're never going to have to split C and update B, so it's safe for us to release the latch on B. And then when we get down to F, we say: oh shit, we actually do need to do a split. Now we can't release the latch on C, because we're going to have to update it when we do our split.

Yes, his question is whether you'd need thousands of mutexes, one for each tree node. Yes, but I don't want to get too deep into the implementation side here. You wouldn't want to use the standard template library mutex, which on Linux is a futex. You'd use a spin latch: a single atomic integer, and you do a compare-and-swap to acquire it; if you can't acquire it, you spin on it. That's the fastest way to implement a latch. So now we're talking about embedding a small atomic latch in each page. You might think a single bit would do, but that's not quite true, because you need to keep track of readers versus writers, so it has to be a little bit bigger. The way you implement that is to have a flag for the write latch and a counter for the read latch: any time you acquire in read mode you add one to the read counter, and you subtract one when you release it, and you can do that atomically. So we're talking maybe 64 bits per latch; it's nothing.
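Here is a minimal sketch of that kind of spin latch, along the lines just described: one atomic word, with a writer bit plus a reader count, acquired via compare-and-swap. Real implementations add things like backoff, and BusTub's latch may look different:

```cpp
#include <atomic>
#include <cstdint>

// Reader/writer spin latch packed into one 32-bit atomic word:
// the top bit marks a writer, the low bits count readers.
class RWSpinLatch {
  static constexpr uint32_t kWriter = 1u << 31;
  std::atomic<uint32_t> state_{0};

 public:
  void LockRead() {
    for (;;) {  // spin until no writer holds the latch
      uint32_t s = state_.load(std::memory_order_relaxed);
      if ((s & kWriter) == 0 &&
          state_.compare_exchange_weak(s, s + 1, std::memory_order_acquire)) {
        return;  // joined the readers
      }
    }
  }
  void UnlockRead() { state_.fetch_sub(1, std::memory_order_release); }

  void LockWrite() {
    for (;;) {  // spin until the latch is completely free
      uint32_t expected = 0;  // no readers, no writer
      if (state_.compare_exchange_weak(expected, kWriter,
                                       std::memory_order_acquire)) {
        return;
      }
    }
  }
  void UnlockWrite() { state_.store(0, std::memory_order_release); }
};
```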
So I sort of said this already, but when I asked before what the first step was in every single case where we were updating the tree, what was the very first step? Latch the root. Exactly. And if you're doing updates, you're latching it in write mode, so nobody can even read it. This becomes a major bottleneck: every single time you want to do anything, you're latching the root and essentially making this a serial, single-threaded data structure. In a highly parallel environment everybody has to go through this one contention point. Even if threads go down to different parts of the tree, they're always starting at the same spot. So we need something better.

This next algorithm doesn't really have a name; it's sometimes named after its authors, the two Germans Bayer and Schkolnick, and sometimes it's called the optimistic latching algorithm. It's an old thing, from 1977, but the idea is pretty straightforward, and it's at the very least what every system with a multi-threaded B+ tree will do. There are a bunch of other optimizations we're not going to cover, but this one everyone does. This better latching algorithm is considered optimistic, meaning it assumes that if you're doing an insert or a delete, the leaf node you'll be modifying will be safe: you're not going to have to do a split or a merge. And that sort of makes sense. In my simple examples I'm showing nodes with two keys, but as we said before, in a real B+ tree your node size is going to be at least your page size or larger, so 4 kilobytes or potentially more. The likelihood that any given insert or delete causes a split or merge is therefore pretty low. So instead, we assume we won't have to split or merge: we take read latches all the way down, and right before we get to the leaf node, we take a write latch on just that leaf. If we then find out that we do have to split or merge, we abort our operation, go back, and retry, taking write latches according to the normal protocol. The idea is that your splits are rare and your merges are rare, so don't take write latches; get the extra concurrency, and if you're wrong, you just go back and restart.

So let's see how this works. We go back and do a delete of 38, and we start here at the root. Instead of acquiring the latch in write mode, we acquire it in read mode, and we just do our standard crabbing and coupling all the way down. When we get down to D, we take the write latch on H, jump down, and recognize that we're safe; we can do our delete without having to merge. We release the read latch on D and go ahead and do our delete. It's pretty obvious.
Alright, now let's do that insert of 25 again. I start with a read latch on A, a read latch on B, traverse down, release A; go to C, traverse down, release B; then I get down to F, and I recognize that I'm now going to have to do a split. So I needed the write latch on C, but I didn't take it on the way down, because I was assuming I wasn't going to have to split. In this case I recognize that my leaf node is not safe, so I restart everything and take write latches all the way down.

So again, this is an optimistic algorithm. Searches are exactly the same: you just do read-latch coupling, read-latch crabbing, on the way down. For inserts and deletes, you take latches on the way down as if you were searching in read mode, and then you take the write latch on the leaf node you want. If it's not safe, you restart and take write latches the pessimistic way, acquiring write latches down the entire path of the tree to your leaf node. Otherwise, you just do your operation. It's optimistic because it assumes that the likelihood you'll have to split or merge is low, and by making that assumption you end up saving work and improving parallelism. But obviously, if you're wrong, you end up worse off than if you hadn't taken this optimization, because now you're doing almost double the work: I do my traversal with read latches, get to the bottom, recognize that my leaf node is not safe, and now I have to abort and do it all over again. All the work I did to figure out that I wasn't safe is wasted. There are some optimizations where you maintain hints about what's below you in the tree. They don't have to be entirely accurate all the time, but the less accurate they are, the more often your assumptions or approximations will be wrong, and you can end up doing much worse, wasting cycles and instructions on work that gets thrown away.

Yes, the question is whether you always need to start at the root node. In general, yes. There are some optimizations where you keep a hash table on the side that says: if you really want this key, here's the node to jump to. That's an advanced optimization people have tried; there's literature from the early 1980s that discusses it. But either you have to keep that hash table always in sync, or you update it lazily, and then you may guess wrong based on it and have to restart and retry by going through the root anyway. There's a trade-off: most of the time it's going to be right, and when it's wrong, it's not a big deal to go ahead and undo what you did.
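Here is the optimistic descent as a sketch, reusing the hypothetical Node, IsSafe, and TraverseForWrite pieces from before. The fallback on an unsafe leaf is simply the pessimistic traversal:

```cpp
// Optimistic descent: read latches on inner nodes, write latch only on the
// leaf. If the leaf turns out to be unsafe, give it up and redo the whole
// traversal pessimistically.
Node *OptimisticTraverse(Node *root, int key, Op op,
                         std::deque<Node *> &held) {
  if (!root->IsLeaf()) {
    root->latch.LockRead();
    Node *node = root;
    while (true) {
      Node *child = node->FindChild(key);
      if (child->IsLeaf()) {
        child->latch.LockWrite();  // write latch only at the leaf level
        node->latch.UnlockRead();
        node = child;
        break;
      }
      child->latch.LockRead();     // plain read crabbing in the middle
      node->latch.UnlockRead();
      node = child;
    }
    if (IsSafe(node, op)) {
      held.push_back(node);        // common case: only the leaf is latched
      return node;
    }
    node->latch.UnlockWrite();     // wrong guess: wasted work, restart
  }
  return TraverseForWrite(root, key, op, held);
}
```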
His other suggestion was: in this example, when I went down, I could keep track of the path I took, and then instead of restarting from the root, jump back to some intermediate location. But I'd need to protect that, because by the time I come back and restart, another thread could have trashed C and a bunch of other stuff, and now I'm jumping to a bad location and reading garbage. So I'd say that approximation is a bad one here. We do something like this in our BW-tree, which I don't want to get into here: if you want to do range scans in the reverse direction, since it only has sibling pointers in one direction, you keep a stack of how you got down to the leaf. That works in the BW-tree because it has a separate mapping table that says, for this node ID, here's the memory address. It's hard to do here because these are page IDs, and the page might not be there anymore; it might have been swapped out to disk. Okay, cool.

Alright, so one observation I want to make now is that in all the examples I've shown so far, our threads have acquired latches in what I'll call a top-down manner: we start at the root, get the latch on it, and the next latch we try to get is always on a node below us. We never go in the opposite direction in our tree. Remember, I said earlier that although there's no reason you couldn't have pointers from a child to its parent, almost nobody actually implements it that way, so no one traverses in the reverse direction; everyone always goes top-down. The benefit of this, and it's sort of obvious, is that you can never have deadlocks, because all threads acquire latches in the same direction. I can't even try to get the latch on a child node unless I hold the latch on the parent, so nobody is coming the other way trying to get a latch on a node I already hold. So in this case, as we're going down, if we try to get a latch on a node and can't, we can just wait: we know there can never be a deadlock, and at some point, whatever the thread that holds the latch is doing, it'll finish and release it, and then we can go ahead and acquire it. If you implement this as a spin latch, you just spin on the variable that maintains the latch; at some point it changes, we acquire it, and we continue with whatever operation we wanted to do. And it makes sense if you think about it: if I traverse the tree and can't get the latch on the node I want, and I abort and restart, the likelihood is that I'll just come back from the root down to the exact same location and get stymied, blocked on the same latch I couldn't get before. So you just spin and wait.

But, as people were asking last week about the sibling pointers, how do we handle the case where threads down at the leaf nodes want to go left and right along those pointers? Let's look at an example. We have a really simple B+ tree with the keys one through four, and our thread wants to find all the keys less than four. We do the standard optimistic latch crabbing technique we just talked about: we acquire the read latch on A, then jump down, get the read latch on C, and our thread moves down to C. Now, as we're scanning along, we recognize that we still need to read data on the leaf node over here, because we want everything less than four, basically everything from four down to negative infinity, so we want to go all the way to the end of the tree. I'm not really going to talk about this here, but there are things called fence keys, or sibling keys, where you maintain a hint that says: by the way, if you're jumping in this direction, the keys over there start at two. There are techniques like that, but we can ignore them for now. So if I want to traverse along to my sibling B, I have to do the same latch coupling or crabbing technique as before: I can't acquire the latch on B unless I hold the latch on C. Since I hold the latch on C, I'm allowed to go ahead and get the latch on B.
Then the thread moves over to that node, and it's safe for it to release C: the same technique we did before, just moving along horizontally.

Now let's look at a more complicated example with another thread, where both are read-only. T1 wants to find keys less than four; T2 wants to find keys greater than one. Same thing, they both start at the same time and acquire the read latch on the root node A. Thread two gets the read latch on B and moves down; thread one gets the read latch on C and moves down. Now they both want to scan toward each other, and since read latches are compatible with each other, thread one can get the read latch on B and thread two can get the read latch on C: that latch can be held by both threads at the same time in the same mode. They swap over, do their scans, and at this point thread one releases the latch on C and thread two releases the latch on B, and we're fine. This is read-only, so it's super simple; there are no deadlocks.

But now let's throw in some updates. Thread one wants to delete four, and thread two wants to find keys greater than one. Say we start off, and thread two gets the read latch on A. Say also that we're doing optimistic latch coupling, so we take read latches on the inner nodes and only take the write latch on the leaf node. Thread two goes down to B with a read latch; thread one goes down to C with a write latch. At this point we release the latch on A; again, we're doing latch coupling. But now thread two wants to scan across the leaf nodes and get the read latch on C, and it can't, because thread one already holds that latch in write mode. So what should happen here? Take a guess: should it wait?
She's shaking her head. Raise your hand if you think it should wait. Raise your hand if you think it should just kill itself. Raise your hand if you're a thug and think thread two should kill thread one and steal its latch, which is actually a real thing, though maybe not for this.

Alright. At this point, thread two doesn't know anything about what thread one is doing. These latches are dumb; they look like dumb locks. Thread two just knows the latch is in write mode and that a read-mode latch isn't compatible with that; it has no idea what the holder is doing. So what happens here is that, rather than wait around to see whether thread one is going to give up the latch, thread two just goes ahead and kills itself, aborts its operation, and comes back and retries. And the reason we want to do this is that otherwise we could have a deadlock. Like I said, thread two doesn't know whether thread one is just going to do its update on C and immediately give up the latch, or whether it's going to scan across and try to get the latch on B, the one thread two holds, in which case you'd have a deadlock. To avoid all this, without doing anything sophisticated, we just say: I can't get the latch I want, I'm dead, restart. That makes the protocol super simple. It's not the most efficient thing, because again, we could try to be sophisticated and say, well, I know the other thread is just doing a single delete, it'll give the latch up in a few microseconds. We don't do any of that. I can't get it, so I abort and retry.

Yes, in the back. His question is whether it would be more efficient if the writing thread, thread one, killed thread two if it saw it waiting for the latch it holds. So thread two would just spin here, trying to grab the latch, and if thread one wants to go over to B and sees thread two waiting for it, it kills it; otherwise it just lets it spin. Well, the first question is: where are you going to record that thread two is spinning on the latch for C? You'd have to maintain that somewhere, in some global data structure, or you'd have to have your thread look through every transaction that could be running and ask: hey, are you waiting for the thing that I have? You can do it, but it's a bunch of extra work, it's not worth it, and it makes coding this thing way harder. It's better to just immediately kill yourself and retry. We'll see something like this when we talk about transaction locks, but in this case, again, we want these critical sections to be as short as possible, so it's better to just say "I can't get it, boom, restart" than to do something way more complicated.

Yes: if two writers are going in opposite directions, do they both have to restart? If, at the exact same moment, each tries to get the write latch in the other's direction, then yes, they'd both restart. But if you think about it, in practice one is going to be slightly behind the other: the first one tries, can't get it, and kills itself; the second one is then able to get the latch. So it basically can't happen. Again, the main thing I want to emphasize is that we don't need to be smart, we don't need to be clever. Being super simple actually turns out to be the most efficient way to implement this.
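As a sketch, here is what the no-wait policy looks like for a leaf scan, reusing the hypothetical types from before and assuming leaf nodes carry an extra right-sibling pointer (TryLockRead is the non-blocking acquire from the earlier latch sketch):

```cpp
// No-wait sibling crabbing: if the next leaf's latch is not immediately
// available, give everything up and tell the caller to restart from the root.
// Assumes the hypothetical Node has an extra `Node *right_sibling` field.
bool ScanFrom(Node *leaf, int high_key, std::vector<int> &out) {
  // Precondition: the caller already holds a read latch on `leaf`.
  while (leaf != nullptr) {
    for (int k : leaf->keys) {
      if (k >= high_key) {
        leaf->latch.UnlockRead();
        return true;
      }
      out.push_back(k);
    }
    Node *next = leaf->right_sibling;
    if (next != nullptr && !next->latch.TryLockRead()) {
      leaf->latch.UnlockRead();  // "kill yourself" rather than risk deadlock
      out.clear();
      return false;              // caller restarts the whole operation
    }
    leaf->latch.UnlockRead();
    leaf = next;
  }
  return true;
}
```

A caller that gets false back would retry the scan from the root, possibly with the backoff discussed next.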
Coming back to his suggestion, that thread one could see that thread two is waiting for it and maybe kill it, or give it a hint: I'd have to store that somewhere. I could store it in my node, say a list of the threads waiting on it, but now I'm storing that instead of actual data, and it would need to be variable-sized, because who knows how many threads could be accessing the node. As far as I know, nobody does this; everyone just does something really simple.

Yes. His question is: when I say the thread kills itself, this guy dies and restarts, is the thread going to wait a little bit and then retry, or retry right away? That depends on the implementation. A standard technique would be exponential backoff, like TCP: I try again right away, and if I know I've already restarted twice, maybe the next time I wait half a millisecond, then a millisecond, then two milliseconds, and so forth.

His next point is that it's possible for a thread to get starved: no matter how many times it retries, it never gets the latch. Absolutely, yes. Again, we'll see this with transactions and locks, and there's a way to handle it there, because we have global information about what transactions are running: you can do things like give priority to the transaction with the oldest timestamp. So all the decisions we've been joking about, should I wait, should I kill myself, should I kill him: transactions with locking will actually make those decisions. In this case we don't want any coordination; we just kill ourselves immediately.

Yes. Her question is: say T1 has a read latch on C and T2 wants a write latch on it; do threads that want latches in write mode get preference over read mode? Again, you'd have to have some kind of coordination mechanism to make that kind of decision. In this case, all you know is that the latch is held in a mode incompatible with yours; you don't know that some other guy is trying to come over and get it in read mode or any other mode. You just say: I can't get it, I kill myself right away. The simplest thing actually turns out to be the best.

All right, one more. His question is what makes retrying better than just waiting: if I retry, I'm still burning CPU, same as if I wait on the latch. The way I would phrase it is: not waiting indefinitely, but waiting with a timeout. You could, in theory, say, I think this guy is going to give the latch up, which is probably true, so maybe I'll just spin for a millisecond, and if I get it, great; if not, I restart. You could do it that way, but then, in the worst case, you're deadlocked for that one millisecond until you time out. Sure, you can stop immediately once the writer releases the latch, but you have to be very careful: if you wait, there could be a deadlock, so you need a timeout, because you don't want to wait forever. With latches there's no god in the clouds who comes down and says, "you guys are deadlocked, let me fix this." We have that for transaction locks, which we'll cover later, but here we have nothing. We're driving a car without a seatbelt.
So we could have this thread spin on the latch, and if it gets it, great; if not, abort and restart. That would work as well; it's still correct. But again, for that one millisecond, if we do have a deadlock, we're doing nothing. And in this simple example it's only a two-level tree, so I only hold one latch; but if a thread is modifying a bunch of stuff, it may hold a bunch of write latches above it, and that's blocking everybody else. So you could be more sophisticated: if this guy has a write latch and I'm holding five write latches above me and want to go this way and can't, I should abort right away, because that releases a bunch of latches. Whereas if I know I only hold one write latch, who cares, nobody else is blocked behind me, so I'll wait for a millisecond. You could do that, and it's still correct. What commercial systems actually do for this, I honestly don't know; I don't know what Postgres does, or MySQL, either.

Okay, so again, the main takeaway for how we handle leaf nodes is that we do the same coupling and crabbing along the leaf nodes, but if we can't get the latch we want, we just kill ourselves right away. That's different from how we acquire latches going top-down: going top-down we're allowed to wait, because there can never be a deadlock; going left and right, there could be a deadlock, so to avoid all that, we just kill ourselves right away. And again, this is us being good programmers, good database developers and software engineers, who have to design our data structures to be thread-safe and protected. It's hard to do.

So the last thing I want to talk about is an optimization, again invented here at CMU, from the B-link tree paper, which was from 1981, by Philip Lehman, who's now in the dean's office at SCS. The only thing you really need to understand about the B-link tree here is that it has the sibling pointers along the leaf nodes. The original B+ tree didn't have that, and every modern implementation has this aspect of the B-link tree. The observation they made was that any time I do an insert, if I have to split my leaf node because it overflowed, then I have to write to at least three nodes in my tree. You obviously write to the node you're splitting; then you write to the new node you overflow into, writing its keys into it; and then you write into the parent, to record the new separator key and the pointer to the new leaf node you just created. You may have to go further up the tree, but for our purposes we can ignore that. The optimization they proposed was: any time a leaf node overflows and you do a split, just delay updating its parent node. Don't take the write latch on it; give it up right away, and at some later point somebody will go ahead and update it. As we'll see in a second, this still guarantees that the internal representation of the B+ tree stays sound.
So let's use that example again: we want to insert 25. We do the optimistic latch coupling we showed before, taking read latches all the way down. We get down to C, still taking read latches, release B, take the write latch on F, and now we see we're going to have to do a split. But instead of restarting, which in this case we actually won't have to do, instead of restarting and coming back to get the write latch on C, we just keep going. We go ahead and do the split on F; we hold the write latch on it, so nobody can get to us, but we're not going to update C, because we don't hold a latch on it. So we insert 25, we add the new leaf node, we connect it into the sibling pointers, and we're done. Then we post some information in our data structure saying: the next time somebody comes along and takes the write latch on C, go ahead and add the separator key that we didn't put in.

At this point we've inserted 25 and split off a new node that has 31 in it, and if any thread comes along looking for 25 or 31, it can still find them, even though we don't have the pointer from C to the new node yet. Look at it: if I'm looking for 31, I compare against 20 and go this way, against 35 and go this way; 31 is greater than 23, so I come down here. And then there's a little bit of information we maintain about what the next sibling's keys are: I know, as I'm scanning along, that the range for node F is 23 to 25, but, oh by the way, if you're looking for something greater than 25 but less than 35, follow this sibling pointer over to this node here. So I can still find 31 if I'm looking for it; I just didn't get there through C, I got there by scanning along the leaf nodes, using the same leaf-scan crabbing we talked about before.

Now say at some point another thread comes along and wants to do an update. It comes down, takes its read latch on B, and it sees that there's a pending update for C that needs to be applied, so it goes ahead and acquires C in write-latch mode. When it gets there, it says: all right, now I can actually post the new separator key, 31, and have it point to the new page the other guy created. The idea here, again, is that we delay the update, because otherwise we'd have to restart and acquire the write latch; when somebody comes back the second time, they can acquire the write latch on C and actually do the update. It's a pretty obvious optimization. I actually don't know how common it is, but it appears in the literature a lot when people talk about B+ trees, at least the modern variants of them.

Okay, any questions on latch crabbing or coupling in B+ trees? Yes, in the back. The question is whether the delayed update happens only on inserts. Yes.
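A minimal sketch of the bookkeeping this implies. The names here are hypothetical, just to show the shape of the idea: the splitting thread records the separator key it could not post, and the next writer to latch the parent applies it before doing its own work:

```cpp
#include <optional>

// Deferred parent update left behind by a split that skipped the parent.
struct PendingSplit {
  int separator_key;  // e.g. 31 in the example above
  Node *new_child;    // the leaf created by the split
};

// In a real system this would live with the node or tree, protected by the
// parent's latch; a single global here keeps the sketch short.
std::optional<PendingSplit> pending;

// Called by any writer right after it acquires the parent's write latch.
void ApplyPendingUpdate(Node *parent) {
  if (!pending.has_value()) return;
  // Post the separator key and the new child pointer in sorted position.
  size_t i = 0;
  while (i < parent->keys.size() && parent->keys[i] < pending->separator_key) {
    i++;
  }
  parent->keys.insert(parent->keys.begin() + i, pending->separator_key);
  parent->children.insert(parent->children.begin() + i + 1, pending->new_child);
  pending.reset();
}
```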
So hopefully I've conveyed that, at a high level, the protocol for making a B+ tree thread-safe is easy to understand, but I would say that in practice it's actually very difficult to get correct. Now, for what you'll be implementing in project 2, you don't have to implement all the various optimizations we talked about. You can do the straightforward method, always going top-down and left to right, with aborting; it should be pretty straightforward. The other thing I want to convey is that although we described all these mechanisms for protecting a data structure in the context of a B+ tree, the higher-level techniques are applicable to all sorts of data structures: acquiring latches always in the same order to avoid deadlocks, killing yourself right away and restarting, all those things apply to other areas, other facets of computer science with data structures, as well. We're doing a B+ tree because it's the most common thing in databases, but for hash tables, red-black trees, splay trees, and other things, you can do the same kind of thing.

Okay, so any questions about latching or crabbing? Perfect, you're all salivating for project 2. In project 2 you'll be building a thread-safe B+ tree using the latching and crabbing techniques we talked about here today. The project is broken up into four parts: the first is defining the page layout, then you build your data structure, then you make an iterator class, a wrapper around it, and then you implement latch crabbing. The way it works is that we provide you with the scaffolding and the API you have to implement, the same way we did for project 1, and you just fill in your implementations. You're free to add whatever additional private classes, private data structures, or private methods you want inside your class; at the highest level, you need to implement the API we give you.

Okay, so the one thing we're doing differently this year is that we're breaking the project up into two checkpoints. The first checkpoint is due October 8th, which I think is a Monday, at midnight; again, you just submit on Gradescope, and it's roughly the first half of the project. You implement the page layouts for the internal nodes and the leaf nodes, and you only need to support unique keys; we're making it much simpler. I think we also don't make you support variable-length keys: they'll be fixed-length. Then you start building out your B+ tree, and in the first checkpoint you only need to support search and insert. Insert you obviously need in order to populate the tree, and search you need in order to check whether the things that should be there are there. Another way to think about it: you only need to support inserts with splitting, and no deletes, in the first checkpoint. And just like before, we'll provide you with some rudimentary test cases to evaluate your work; the Gradescope ones will be more comprehensive, but you're also encouraged to write your own, building off the testing framework we give you. This first checkpoint is worth 40% of the grade.

The second checkpoint is worth 60%, and it's due October 19th, which is a Friday. I think the midterm is that Wednesday, the 17th, so this is due two days after the midterm. For the second checkpoint you have to implement deletion, with merging or sibling stealing, and then you have to implement an STL-style iterator wrapper around your index, which means you have to do the leaf node scans and make them thread-safe. Actually, for the second checkpoint you have to make everything thread-safe, so you have to do the latch crabbing and coupling techniques we talked about here. It's up to you to decide whether you want to use the optimistic version or not, or delayed updates, or whatever; that's entirely up to you guys. And we'll benchmark you on the leaderboard and see who actually has the fastest one. Question: can we just lock the root of the tree?
The question is: can you always just lock, or rather latch, the root of the tree? No. Well, you can do that to start, but if I remember correctly from the tests, we have hooks that pause threads inside the tests and check whether you're actually supporting concurrent threads at the same time. Instead of just issuing an operation with a key and waiting for it to come back, we have hooks in the traversal steps, so that maybe a thread gets down to level two and pauses, and then we have somebody else come in behind it; the tree should still let the second thread through, since it's going down a different path while the first thread is paused inside the tree. That's how we check for a concurrent index.

Okay, some general development hints. The way this is implemented is almost exactly how it's described in the textbook, I think in chapter 15.10, down to the method names. If you follow that, follow its semantics, it'll help you understand things. One of the TAs took the class last year, and he said it wasn't until he went and read the textbook's discussion of B+ trees and crabbing that the project actually made sense to him. So I encourage you to go do that first, before you get started. The other thing you can do, and we have this on the project page, is set your node size to be smaller than four kilobytes. The node size defaults to the page size, but if it's four kilobytes, you have to insert a lot of stuff before anything starts splitting. If you use a smaller page size or node size, like 512 bytes, it doesn't take many insertions before splits start happening, and you can test whether your code handles them correctly. And the last one, also in the project documentation: you want to make sure you protect the internal pointer to the root page of your B+ tree. That's the thing a lot of people got fouled up on last year. If I start doing splits and update the pointer to my root node, which is represented by this root page ID, and it isn't protected, then when you come back, you're pointing at garbage.

Okay, so the way this works: we're going to give you a new tarball, and a simple bash command to copy in the files from project 1, because the new tarball has some updated code. When you submit, you only submit the ten files you modified, plus the six files from project 1. Everything is built on top of what you've done before; it's cumulative. And as always, post your questions on Piazza or come to TA office hours. And obviously, don't cheat. There are implementations of B+ trees on the internet; you can look at them, you can follow them, but copying them into your code is not going to help you, because the API is different. And don't put any of your code or solutions on GitHub.

Okay, alright, any questions? Yes? Say it again, sorry? For project two? For project one: it's due today. I'll say it again: you get four late days, so you can submit up to four days late; after that, every day you're late is 25% off, so it's four more days after that before you get to zero. So it's technically eight days. Submit it.
Okay, next class we'll finally start looking at how to actually execute queries. We've sort of been beating around the bush: we can read our B+ tree, we can read our pages with tuples, and now we'll actually talk about how to run queries. Again, we're building up the layers in the system: on top of the B+ trees, on top of the buffer pool manager, now we can run queries. Okay.