Okay guys, let's get started. Again, thank you DJ Drop Table for, as always, keeping it down. How's your mixtape going? Okay, and it's all DJ Drop Table beats? Nothing but your own stuff? No? Okay, all right, so let's get started. It's a beautiful day out, and I think that's why the turnout here is so low, which sucks, because every lecture is an awesome lecture, but this one I like a lot too. Before we get into the course material, let's quickly discuss what's on the schedule for you guys. Project one is due this Friday at midnight, and again, you should submit that on Gradescope. Homework two is due on Monday at midnight, also submitted on Gradescope. We'll send an announcement out on Piazza, but we've updated the PDF so that you can drop your pictures right into the PDF. We give you a template for draw.io, an online tool, so you can quickly edit and modify the templates for your answers. There should be no handwritten drawings and no photographs of drawings; everything should be done digitally. And then we'll be releasing project two this Monday as well, and that will be due two or three weeks into October, okay?
So, any high-level questions about the project or homework two? Okay, so let's get into this. The thing we need to talk about now is that we spent the last three classes talking about data structures: we spent a day on hash tables and two days on B+ trees, radix trees, and other tree data structures. For the most part, during this entire conversation about these data structures, we've assumed that they were only being accessed by a single thread. There was only one thread that could be reading and writing data to the data structure at a time, and that simplified the discussion so that you could understand the core essence of how these data structures work. But in a real system, we obviously don't want only a single thread accessing the data structure at a time. We will allow multiple threads, because modern CPUs have a lot of cores, so we can have multiple threads running queries and all updating our data structures. We're also going to use threads to hide disk stalls, stalls due to having to read things from disk, because now if one thread reads a page that's not in memory, it has to stall while the buffer pool manager brings that in, and we can let other threads keep running at the same time. So we'll have a lot of threads running in our system, and we do this because it maximizes parallelism and reduces the latency of the queries we want to execute. So for today, now that we bring back multiple threads that want to update and access our data structures: what do we need to do to protect ourselves?
So, as a quick aside: everything that we'll talk about today is how most database systems actually work. Most systems that support multiple threads will do the latching stuff we're talking about today. There are some particular systems that actually don't do any of this, and only let single threads access the data structures, and they still get really good performance. VoltDB and Redis are probably the two most famous ones that do this. In the case of Redis, it only runs one thread; it's a single-threaded engine. VoltDB is a multi-threaded engine, but they partition the database in such a way that every B+ tree can only be accessed by a single thread. So you avoid all this latching stuff that we're talking about today and you get really great performance, but obviously this complicates scaling up to multiple cores or multiple machines. Again, we'll talk about these things later in the semester, but the main idea for now is that pretty much everybody does the things we're talking about today. So the way we're going to protect our data structures is through a concurrency control protocol, a concurrency control scheme. This is just the method by which the database system guarantees the "correctness" of the data structure, by forcing all the threads to access the data structure using a certain protocol. I'm putting the word correct in quotes because that can mean different things, and although we've been focused on data structures, this really could apply to any sort of shared object in the system. It could be a tuple, it could be an index, it could be the page table in the buffer pool.
It doesn't matter. So the two types of correctness we care about with concurrency control are logical correctness and physical correctness. Logical correctness is the high-level notion that says: if I'm accessing the data structure, am I seeing the values that I expect to see? If I have a B+ tree index and I insert key five in my thread, and that thread comes back and reads key five right away, it should see it; it should not get a false negative. That's a logical correctness property: I'm seeing the things that I expect to see. The thing that we're going to care about in this class is physical correctness: how do you protect the internal representation of the data structure, how it maintains pointers and references to other pages and keys and values? How do you make sure that as threads are reading and writing this data, the integrity of the data structure is sound? We don't want the case where we're traversing down the B+ tree, we have a pointer to the next node, and by the time we read the pointer, figure out where we need to go, and try to jump there, somebody else has modified the data structure so that the pointer now points to an invalid memory location and we get a segfault. So this is what we're trying to do today: protect the internal data structure so that multiple threads can read and write to it while the data structure still behaves correctly. For logical correctness, we'll worry about that more when we talk about transactions and concurrency control. That's a whole other super interesting topic, but for today we're asking: how can we make sure that the data structures are thread safe?
So we'll begin by talking about what a latch actually is, in a bit more detail than we've covered so far, and how it's actually implemented. Then we'll start with an easy case: thread-safe hash tables using latches, because they're actually really simple to do. But then we'll spend most of our time talking about how to handle B+ trees, and we'll talk about how to do leaf node scans and other optimizations, again when we have multiple threads accessing things at the same time. Okay, all right. So I showed this slide last time, but we only talked about it very briefly and I don't think everyone absorbed it, so I want to spend more time on the difference between locks and latches. In the database world where I live, a lock is a higher-level concept that protects the logical contents of a database. A logical content would be something like a tuple, a set of tuples, a table, or a database, and we're using these locks to protect these logical objects from other transactions that are running at the same time. If I'm modifying something in a transaction, I don't want anybody else to modify that tuple at the same time that I am. You may want that for other reasons, but for our purposes assume that we don't want it to happen. So for these locks, we're going to hold them for the entire duration of the transaction. Again, that's not entirely true, but for our purposes just assume that's the case. And then we need to be able to roll back any changes we make to the objects we modify while we hold the locks for them. If I'm trying to transfer money from my account to her account, and I take the money out of my account and then crash before I put the money in her account, when I come back I want to reverse the change I made to my tuple. That means the database system is responsible for knowing how to roll back these changes. So notice up here:
I didn't say anything about threads; I'm talking about transactions. A single transaction could be broken across multiple threads, and they could all be updating the same tuple. That's okay, that's allowed, because the transaction holds the lock; it doesn't matter which thread is actually doing the modification. Where we now get into the low-level constructs that we care about, again for protecting the physical integrity of our data structures or objects, is latches. In the operating system world, these are what they call locks or mutexes; in our world, they're latches, because we need to distinguish them from locks. Latches are going to protect the critical sections of the database system's internal data structures from other threads that are reading or writing that data structure or object at the same time. We're only holding the latch for a short period, just for the duration that we're in the critical section doing whatever operation we need to do. If I want to update a page, I hold the latch on that page, make the change, and then release the latch. We don't need to be able to roll back any changes here, because the operations we're trying to do are essentially meant to be atomic. I grab a latch on something, I make whatever change I want, and when I release the latch, the operation is considered done.
So all the changes are there; if I can't acquire the latch, then I'm not going to do the operation anyway, so there's nothing to roll back. Another way to think about this is this great table from that B+ tree book I recommended a few lectures ago, from Goetz Graefe. He has a nice table that lays out the distinction between locks and latches. For locks, we're going to separate user transactions from each other, and they're going to protect the database's contents: tuples, tables, things like that. We're going to hold them for the entire duration of the transaction. There are a bunch of different lock types that we can hold on these objects; again, we'll cover this in a few more lectures. And when it comes time to actually deal with deadlocks, we're going to rely on some external coordinator, a lock manager or transaction manager, to resolve any deadlocks that could occur; the methods we can use are waits-for graphs, timeouts, aborts, and other things, and we'll focus on these later. What we care about is over here. These latches are going to protect threads from each other for our in-memory data structures; we're going to protect the critical sections inside these data structures. There are only going to be two latch modes, read and write. And the way we're going to avoid deadlocks is through us being good programmers, which is nice for databases, where good equals expensive, right? It's up to us to write high-quality code in our data structures to avoid deadlocks, because there is no external thing like a transaction manager or lock manager that's going to rescue us if we have a deadlock. It's up to us to design and implement our data structure in such a way that deadlocks cannot occur, and we'll see what that looks like later on. So again, our focus is on here.
We'll discuss all this lock stuff in lecture 17, after the midterm. Again, I find it all super fascinating; this is like one of the black arts of database systems if you can actually make this stuff work. All right, so let's talk about the latch modes we can have. Again, there are only two modes, read and write. If a latch is being held in read mode, then multiple threads are allowed to share that read latch, because it's a read-only operation. I can have multiple threads read the data structure at the same time; there's no conflict, there are no integrity issues that could occur, so they can all share it. If I take out the latch in write mode, that's an exclusive latch: only one thread can hold that latch in that mode at a time. So if I hold the write latch and I'm making changes, nobody else can read the object that I'm protecting until I finish. Those are the only two modes we care about: multiple threads can share this one, and this one is an exclusive latch. All right, so let's talk about how you actually implement a latch in a real system. The first approach is probably the one you're most familiar with from any kind of systems or operating systems course: a blocking OS mutex. This is the simplest thing to use because it's sort of built into the language. In C++, the standard library has this thing std::mutex, and it's really simple to use: you declare it, you call lock, you do something on the object you're protecting, you call unlock, and you're done. So, does anybody know how this actually works in the operating system, at least in Linux? How do mutexes like this work? Yes, he says futex. What is a futex?
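To make the blocking approach concrete, here's a minimal sketch of protecting a shared object with std::mutex; the counter, function names, and thread counts are all made up for illustration:

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex counter_latch;   // blocking OS mutex (backed by a futex on Linux)
int counter = 0;            // shared object we are protecting

void increment(int n) {
    for (int i = 0; i < n; i++) {
        counter_latch.lock();     // enter the critical section
        ++counter;                // the protected operation
        counter_latch.unlock();   // leave the critical section
    }
}

int run_demo() {
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; t++)
        threads.emplace_back(increment, 1000);
    for (auto& th : threads)
        th.join();
    return counter;  // every increment happened under the latch
}
```

Without the lock/unlock pair, the four threads would race on the counter and lose updates; with it, each increment is a tiny critical section, exactly the short-hold pattern latches are meant for.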
He said futex, and he's correct. In Linux, futex stands for fast userspace mutex. The way it works is that in user space, in the address space of your process, there's a memory location, usually a byte or so, that you can try to do a compare-and-swap on to acquire the latch. If you don't acquire it, then you fall back to the slower default path, which goes down into the operating system. So the idea is: you do a quick compare-and-swap in user space, and if you acquire it, you're done. If you don't, you fall down to the OS, which is going to be slower, because if you go down to the OS and sit on a mutex inside the kernel, the OS says: aha, I know you're blocked on this mutex and can't get it, so let me tell the scheduler to de-schedule you so you don't actually run. The reason this is expensive is that the OS has its own internal data structures that it protects with latches, so now it has to go update the scheduling table to say this thread can't run yet. So he's correct: a fast userspace mutex will be fast because the first attempt is just a spin latch, which we'll talk about on the next slide, but if you fall down to the OS, then you're screwed. This is another good example of why we try to avoid the OS as much as possible. For the first project you guys used this, and that's fine, but if you have a high-contention system, then everybody's going down to the OS and that's going to be problematic. So the alternative is to implement it ourselves using a spin latch, a test-and-set spin latch. This is extremely efficient, super fast, because on modern CPUs
there's a single instruction to do a compare-and-swap on a memory address. It checks whether the value at this memory address is what I think it is, and if it is, then I'm allowed to change it to my new value. So think of the latch as set to zero: I check whether it's zero, and if it is, I set it to one, and that means I've acquired the latch. You can do that on a modern CPU in a single instruction; you don't have to write C code like "if this then that," it does it all for you atomically. The way you would implement this in C++ is with std::atomic, which is templated so you can put whatever you want in there, but they have a shortcut for you called std::atomic_flag, which is effectively an atomic boolean. So when we want to acquire this latch, we have a while loop that does a test-and-set on the latch. If I acquire it, I jump out of the while loop, because I hold the latch. If I don't, I fall into the while loop, and now we need some logic to figure out what to do. The simplest thing is to retry: loop back around and keep trying. The problem is that you're just burning cycles in your CPU, because you keep doing test-and-set, test-and-set, test-and-set, it always fails, and you keep spinning around in this loop. The OS thinks you're actually doing useful work, because it doesn't know what instructions you're executing.
It sees you keep executing instructions, so it keeps scheduling you, and you spike the CPU. This test-and-set thing is the same mechanism he mentioned before for the fast userspace mutex; it's the same fast path that std::mutex gives you on Linux. But maybe I don't want to burn my cycles by just retrying. Maybe I want to yield back to the OS, get de-scheduled, and let it schedule some other thread. Or maybe I try a thousand times, decide I'm not going to get this, and just abort. So this is a good example of where we, as the database system developers, can be smart: we can tune our implementation of how we're using latches in our data structures to accommodate what we think the workload will look like. If I think the operation protected by this latch is going to be super fast, then it's probably faster for me to just keep retrying, because whoever holds the latch will give it up quickly. But if I think the operation is going to be long, then maybe I want to yield for some amount of time or eventually abort. We can't do this with the blocking OS mutex: as soon as we try to get it and can't, the OS takes over and we're blocked. Yes? The question is: what is this? Oh, this. Yeah, the parameters would be like a compare-and-swap: at this memory address, check whether the value equals this; pass in zero, and if it equals zero, set it to one. And there are different APIs: sometimes you get back the old value, sometimes you get back a boolean for whether it succeeded.
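A minimal sketch of such a spin latch using std::atomic_flag, with a made-up retry threshold of 1000 before yielding back to the OS; the class name and the threshold are assumptions for illustration:

```cpp
#include <atomic>
#include <thread>

class SpinLatch {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;  // clear = free, set = held
public:
    void lock() {
        int tries = 0;
        // test_and_set atomically sets the flag and returns its OLD value,
        // so a return of false means we just acquired the latch
        while (flag_.test_and_set(std::memory_order_acquire)) {
            if (++tries >= 1000) {         // holder seems slow:
                std::this_thread::yield();  // give the CPU back to the scheduler
                tries = 0;
            }
        }
    }
    void unlock() {
        flag_.clear(std::memory_order_release);  // mark the latch free again
    }
};

SpinLatch latch;
int shared_value = 0;

int spin_demo() {
    auto worker = [] {
        for (int i = 0; i < 1000; i++) {
            latch.lock();
            ++shared_value;   // the critical section
            latch.unlock();
        }
    };
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    return shared_value;
}
```

The retry-then-yield loop is exactly the tuning knob discussed above: keep spinning when you expect the latch to free up quickly, back off to the OS when it doesn't.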
There are a bunch of different variants, and they have test-and-set for all the different types you might be working with. So again, the main takeaway here is that we in the database system can do a better job than the OS, because we know in what context we'll be using this latch. For these two examples, though, the latch has just been: do I hold it or not? As I said before, we have different modes, so we need a reader-writer latch that can support those modes. The way we basically do this is build on top of whatever basic latching primitive we have, either the spin latch or the OS mutex, and then manage queues to keep track of how many threads are waiting to acquire the different types of latches. So we might maintain some counters that say: here's the number of threads that hold the latch in this mode, and here's the number of threads that are waiting for it. If a read thread shows up and says it wants the read latch, I look over here and see that nobody holds the write latch and nobody is waiting for it, so I go ahead and hand it out, and I update my counter to say one thread holds this latch. Another thread comes along that also wants the read latch. Read latches are compatible, meaning they're shared, so we recognize that this guy already holds the read latch, the new guy can also acquire it, and we update our counter. Now a writer thread comes along and wants the write latch. It has to stall, because the read latch is being held by other threads, so we bump the counter here to say that it's waiting. So now, if another read thread comes along and wants the read latch, what should happen?
Right, so he says it depends on what policy we're using. We could immediately say: the read latch is already being held, go ahead and also acquire it. But that could lead to starvation, because the writer thread would never get the latch. So in this example, we could instead stall the new reader, add it to our counter to say it's waiting, and eventually, when the first two threads release their latches, the writer thread gets the latch. Again, this depends on what policy we want to use, and that depends on the context in which the latch is being used. If it's a data structure where there aren't many writes, but the writes are really important, then we want to give higher priority to the writer threads. So again, we just build on top of the latching primitives I showed before to implement something like this, and depending on how you organize the memory, you can still do most of these operations atomically. Okay. All right, so let's now see how we take these latches and actually do something with them. The first case, as I said, is hash tables, because this is actually super easy to do. The reason it's super easy is that the ways in which threads can interact with our hash table are limited. For this discussion, assume we're doing static hashing; the dynamic schemes, extendible and linear hashing, are a bit more complicated, but the same principles apply. Say in a linear probing hash table my key shows up: I hash it, jump to some slot, and then scan down the hash table in sequential order until I find the thing I'm looking for. And every other thread is doing the same thing.
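Circling back to the reader-writer latch for a moment, here's a minimal sketch of the counter-based design described above, built on a mutex and a condition variable, with waiting writers given priority over newly arriving readers; the class and member names are assumptions, and a production version would tune the fairness policy:

```cpp
#include <condition_variable>
#include <mutex>

class RWLatch {
    std::mutex m_;
    std::condition_variable cv_;
    int readers_ = 0;          // threads holding the latch in read mode
    int writers_waiting_ = 0;  // threads queued for the write latch
    bool writer_ = false;      // is the latch held in write mode?
public:
    void lock_read() {
        std::unique_lock<std::mutex> lk(m_);
        // stall readers while a writer holds OR waits (avoids writer starvation)
        cv_.wait(lk, [this] { return !writer_ && writers_waiting_ == 0; });
        ++readers_;
    }
    void unlock_read() {
        std::unique_lock<std::mutex> lk(m_);
        if (--readers_ == 0) cv_.notify_all();
    }
    void lock_write() {
        std::unique_lock<std::mutex> lk(m_);
        ++writers_waiting_;
        cv_.wait(lk, [this] { return !writer_ && readers_ == 0; });
        --writers_waiting_;
        writer_ = true;
    }
    void unlock_write() {
        std::unique_lock<std::mutex> lk(m_);
        writer_ = false;
        cv_.notify_all();
    }
};

RWLatch table_latch;

int rw_demo() {
    table_latch.lock_read();
    table_latch.lock_read();   // read latches are shared
    table_latch.unlock_read();
    table_latch.unlock_read();
    table_latch.lock_write();  // exclusive once all readers are gone
    table_latch.unlock_write();
    return 1;
}
```

The `writers_waiting_ == 0` check in `lock_read` is the anti-starvation policy from the discussion above; dropping it gives readers priority instead.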
They're always scanning top to bottom. Eventually you reach the bottom and loop back around, but you can think of that as a circular buffer where you're essentially always scanning down. So in this case, deadlocks aren't possible, because everybody is going in the same direction. Nobody is coming the other way holding a latch that I want while I hold a latch that they want. You can't have a deadlock, and that makes this super simple. Now, to resize the table, we just take a global latch, usually on the header page, that prevents anybody else from reading or writing the table until I complete the resizing. But if we size our table large enough at the very beginning, that's a rare occurrence; most of the time we're just doing probes or insertions, and those will be fast. Deletions also complicate this if you want to do compaction or move data around, but for now we can ignore that. So there are two approaches, and they differ in the granularity of the latches. In the first approach, you have a single reader-writer latch on each page, and when a thread wants to do something, say a lookup, before it can read or access the page it has to acquire the appropriate latch for that page. The other approach is more fine-grained latching, where you have a latch for every single slot. That means as you're scanning down, you acquire the next slot's latch, then you go into it and look for whatever you're looking for. So there's a trade-off between the computational and storage overhead of these two approaches. With page latches, we store fewer latches, only one per page, but this can potentially reduce our parallelism: two threads might be operating on different slots, but because they're in the same page,
they can't run at the same time. Having a latch per slot allows for more parallelism, because the latches are more fine-grained, but now I'm storing a latch in every single slot, and it's also more expensive to keep acquiring all these latches as I scan through, because I'm doing it for every single slot I look at. So let's look at a high-level example, starting with page latches. Say we have a simple three-page table with two slots per page, and the first thread wants to find D, which hashes to this slot here. Before I can look inside it to see whether the thing I want is there, I first have to get the read latch on the page, and once I have that, my cursor can start looking. Now let's say another thread comes along and wants to insert E, and E hashes to where C is. Can it start looking? No, right? Because it wants to take a write latch on this page, since it doesn't know whether C's slot is full; it doesn't know it's going to have to scan. So before it can even look, it needs the write latch, and the write latch is not compatible with the read latch, so it has to stall and wait. So the first guy scans down, looks at C, and now he needs to go look at the next page. The way we figure out which page to look at is that we look in the header for the hash table, and the header says: here are all the pages for this table. Logically they're ordered sequentially, page zero, page one, page two, so you look in the header and ask: where do I find page two for my hash table?
And so, in order to do the traversal, when I want to go from page one to page two, I actually don't need to keep holding the latch on page one in order to jump down to page two. Because my hash table is static and I'm not resizing, that location is always going to be the same, so I can immediately release the latch before I jump, let anybody else keep running, and then go ahead and acquire the latch on the next page. This is going to be different when we talk about B+ trees: in a B+ tree, you have to hold the latch on whatever node you're coming from before you jump to the next node, and it's only when you get to the next node that you release the one behind you. Yes? So he proposed an optimization: in this case, for thread two, instead of trying to acquire a write latch, could I just acquire a read latch, figure out whether the thing I actually want would be there or not, and then, if it is, go back and try to acquire the write latch, or otherwise just jump down, because I know the thing I'm looking for is not here if there are no deletes and no movement?
Yes. The same technique applies for B+ trees; what I'm showing is sort of the naive way, but yes, you can actually do that. In general, though, you don't do latch upgrades: you can't say "I'm in read mode, now put me in write mode." You release the latch held in one mode and then acquire it again in the other mode. All right, so this guy gets a read latch and can start reading. Now this other guy gets the write latch, sees that this is not what it wants, so it wants to scan down here, and this time T1 has gone away, so it can go ahead and take the write latch, see that this slot is occupied, come down here, and do the insert. Again, it's more coarse-grained: if the latch modes conflict, only one thread at a time can be inside the page, but that makes it simpler to acquire these latches; I don't have to acquire a latch on every single slot. So let's see how to do it with slot latches. Again, T1 starts; it wants to find D, which hashes to where A is, so it acquires the read latch on A's slot. Then T2 starts; it wants to do a write, so it acquires the write latch on C's slot. At this point, when T1 starts up again and tries to look at that slot, it can't run, because it can't get the latch, so it has to stall, whereas this other thread can keep going down, and then the first one can pick up and keep going behind it. All right, so eventually T2 has to stall too, because it can't go here.
This guy moves on, does his insert, and then the other guy can proceed. Right, so we can do the exact same optimization he suggested, and we'll see this in the context of B+ trees: I could keep taking read latches until I find the slot I want, and then try to acquire the write latch on it. But I do have to handle the case where I take the read latch, see that this is the spot I want, release the read latch, come back to take the write latch, and in between that time somebody has inserted something into my slot; then I need to be able to handle that and keep scanning down. So that technique works, but there's extra stuff you have to do. Again, the main takeaway I want you to get from all this: there can't be a deadlock, because everyone is scanning from the top to the bottom. That makes our life easier; there's nobody coming in the other direction. That's also why we can release the latches before we jump to the next page: we're not worried about the location of the page we're jumping to changing. Okay, all right, so let's talk about more complicated things.
Let's talk about how to do this in a B+ tree. Again, we want multiple threads running at the same time, and we want to allow them to do reads and writes without having to latch the entire tree for the duration of the operation. There are two things we need to handle in our B+ tree to make it thread safe. First, the case where two threads are trying to modify the same node at the same time. Second, the case where one thread is traversing the tree, and before it gets to the leaf node, another thread below it does a modification that causes a split or a merge, so the location of a node ends up getting moved around, and the data I'm looking for is not there, or, worst-case scenario, I'm holding a pointer to what is now an invalid memory location. So let's look at a high-level example. We'll focus on this side of the tree; I've labeled the leaf nodes A, B, C, D, E, and so forth. Say we want to do a delete on key 44, down at the bottom. The first thread starts at the top. We do the traversal we've talked about so far: we look at the separator keys, figure out whether we want to go left or right, and move down to the child node based on that. Then we get down to this leaf node, and we can go ahead and delete our entry. But now we see that our node is less than half full; in this case,
it's entirely empty, so we have to rebalance. In this case, instead of doing a merge, we'll just copy over a key from one of our siblings. But let's say that before we can do that rebalancing, the OS swaps out our thread. We get stalled, and another thread starts running, and that other thread wants to do a lookup to find key 41, down here at the bottom. It does the same thing: it starts traversing the tree, gets down to this point, looks at the separator keys, and figures out: oh, I want to go to this node. But then the OS stalls it and switches back to our first thread, and the first thread moves 41 over. Now, when my other thread starts running again, it gets down here and the thing it thought was there is no longer there. So, best-case scenario, we got a false negative: we thought key 41 existed, but the index told us it didn't. Of all the anomalies and issues we're talking about today, that's the best case. The worst case is that the node got moved around, the pointer now points to nothing, and we get a segfault and the program crashes. The way we're going to handle this is with a classic technique called latch crabbing, or latch coupling. When I was a young lad and I was taught databases, I was told the term was latch crabbing; I don't know what the textbook uses, but Wikipedia, I think, calls it latch coupling. It's the same concept, the same thing. Latch crabbing is a technique that allows multiple threads to access the B+ tree at the same time, and we're going to protect things using latches. The basic idea is this: any time we're at a node,
We have to hold a latch on that node, in either read mode or write mode. Before we can jump to our child, we have to get the latch on that child, the next node we're going to go to. And when we land on that child, we examine its contents, and if we determine that the child node we just moved to is safe, then it's okay for us to release the latch on our parent. The term latch crabbing has to do with the way crabs walk, moving one leg past another; that's how we're going to acquire latches as we go down. Our definition of safe is this: if we're doing a modification, the node we're sitting at will not have to split or merge no matter what happens below it in the tree. That means, if we're doing an insert, it's not completely full, so we have room to accommodate any key that may come up to us or any key that we're inserting. And if we're doing a delete, it's more than half full, meaning that if we have to delete a key, we're not going to have to do a merge. So the basic protocol works like this: at the very root you acquire whatever latch you need. In the case where we're doing a find, it's read latches all the way down. Every single time we get to the next node, we release the latch on the parent node we came from, because we're not making any modifications, so every node is deemed safe. For inserts and deletes, we start off taking write latches all the way down, and as soon as we recognize that the node we're at is safe, we can release any write latches we hold above us in the tree, because no matter what happens below us, those nodes will not be affected and will not have to change. So let's look at some examples. Find is super simple: I want to find key 38 at the bottom.
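The "safe node" test described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical node with `keys` and `capacity` fields; it is not any particular system's actual structure:

```python
class Node:
    """Hypothetical B+ tree node: just enough state for the safety check."""
    def __init__(self, keys, capacity):
        self.keys = keys            # separator keys (inner) or entries (leaf)
        self.capacity = capacity    # maximum number of keys the node can hold

def is_safe(node, op):
    """A node is 'safe' if the current operation can never force it to split
    or merge, so every latch held on its ancestors may be released."""
    if op == "insert":
        # Room for at least one more key: an insert below can't split us.
        return len(node.keys) < node.capacity
    if op == "delete":
        # More than half full: a delete below can't force a merge.
        return len(node.keys) > node.capacity // 2
    return True  # reads never restructure the tree
```

Note how the rule only looks at the node's own occupancy; that locality is what lets a thread release its ancestors' latches without knowing anything about the rest of the tree.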
My thread starts at the beginning: I get the read latch on A, and I come down to B. At this point, because it's a read-only operation, a find, it's safe for me to release the latch on A. As soon as I get down to B, I release the latch on A and I'm good to go. I keep scanning down and do the same thing: get to D, release on B; get to H, release on D; and now I do my read and I'm done. Pretty straightforward. So let's see now if we want to do a delete. I start off with the write latch on the root, and I come down to B after I acquire its write latch. Now, at this point, is it safe for me to release the latch on A? No. Why? Because I only have one key in B, and I don't know what's below me yet. I'm deleting 38, so I'm going down here, and I don't know what these other nodes look like yet. If I do the delete and have to merge, and I have to remove this key, then I have to make a change up in A. So in this case we have to hold the latch on A. Then we get the latch on D, get down here, and now we recognize that no matter what happens below D, we can delete at least one key without having to merge.
So at this point we can release the latches on A and B. Essentially the thread is keeping a stack of all the latches it's holding as it goes down, so when it reaches a safe node it knows to release everything above it. All right, so now I get down to H. I can release the latch on D, because H is 100% full, so deleting from it can't cause a merge. Then I go ahead and do my delete, and when I'm done I release the latch and go home. Now let's see an insert. Same thing: start with the write latch on A, the root, and go down to B. At this point I recognize that B can accommodate any new insertion, so it's safe for me to release the latch on A, and I go ahead and do that. Then I go down to D. D is full, so I don't know what's going to happen below me, and I have to hold the latch on B. Then I get down to I, and I recognize that I can never split because it has enough room. So before I do the update, I release the latches on B and D, and then I can do my insert. Now, the order in which you release the latches doesn't matter from a correctness standpoint. Going back here: I have to release the latches on D and B. If I release the latch on D before B, that doesn't matter, because no one can get to D anyway; they can't get through B. So from a correctness standpoint it doesn't matter, but from a performance standpoint we obviously want to release the higher one first, because it covers more leaf nodes. You want to release the latches higher up in the tree as soon as possible. Let's look at one more example where there could be a split. I want to insert 25. Same thing: write latch on A, write latch on B. B won't overflow.
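The write-path descent with the held-latch stack can be sketched like this. It's a toy sketch, not a real implementation: the `Node` class, its `is_safe` check, and a plain mutex standing in for a reader-writer latch are all simplifying assumptions:

```python
import threading

class Node:
    """Hypothetical B+ tree node: keys, children (empty for a leaf), a latch."""
    def __init__(self, keys, children=(), capacity=4):
        self.keys, self.children, self.capacity = list(keys), list(children), capacity
        self.latch = threading.Lock()     # stand-in for a real write latch

    def is_safe(self, op):
        # Safe = this op can never force a split (insert) or merge (delete).
        if op == "insert":
            return len(self.keys) < self.capacity
        if op == "delete":
            return len(self.keys) > self.capacity // 2
        return True

    def child_for(self, key):
        # Follow separator keys: first child whose range covers `key`.
        for i, sep in enumerate(self.keys):
            if key < sep:
                return self.children[i]
        return self.children[-1]

def crab_to_leaf(root, key, op):
    """Descend with latch crabbing: latch the child BEFORE releasing parents,
    and whenever the current node is safe, release every latch above it."""
    held = []
    node = root
    while True:
        node.latch.acquire()
        held.append(node)
        if node.is_safe(op):
            for ancestor in held[:-1]:    # safe: everything above is releasable
                ancestor.latch.release()
            held = [node]
        if not node.children:             # reached the leaf
            return node, held             # caller modifies, then releases `held`
        node = node.child_for(key)
```

After the call returns, the caller performs the modification and then releases every latch left in `held`.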
So I can release the latch on A. I come down to C; C's not going to overflow, so I can release the latch on B. But then I come down to F, and now I see that I need to do a split. In this case I need to hold the latch on my parent node, C, while I make the change. So I first insert 25 here, take the spillover page over here, put 31 there, and then update my parent node. Do I need to have a latch on this new node down here? He says no. Why? Because you hold the latch on the parent, nobody else can reach it. That assumes there are no sibling pointers, which we'll talk about in a second. So in this example, for simplicity, I'm not going to acquire that latch, because everyone is going top to bottom. If I'm scanning along the leaf nodes, then yes, someone could get to this node and I'd have to protect it; we'll get to that. Yes? His statement is: I said the threads keep a stack of the latches they're holding as they go down; shouldn't it be a queue, since we release them first-in, first-out? Yes, fair point. Yes? His statement, going back to this example, is: I said you want to release the latches from the top down, and in the OS world you release them in reverse order. Again, think about what we're doing in this data structure. At this point, no one can get to D unless they go through B, so releasing the latch on D first accomplishes nothing, because nobody is waiting on that latch. But somebody up above could be waiting to acquire B, so I want to release that latch as soon as possible. Because we know how the data structure is being used, we understand the context in which the latches are held, and we want to release the higher one first. Okay. So now I want to ask you: what was the very first step I did in all those modification examples, the inserts and deletes? What's the very first step?
Exactly: you latch the root in exclusive, or write, mode. That's problematic, because the write latch is exclusive; no other thread can acquire any latch on that node while it's held. So this becomes a single point of contention, a single bottleneck: in order to get into the data structure, everyone has to acquire this write latch, and only one thread can hold it at a time. This is a big problem; it's going to prevent us from getting high parallelism and high concurrency. We need something better than everyone acquiring the write latch as soon as they enter. So what we're actually going to do is exactly what he proposed before for the hash table: make an optimistic assumption that most threads are not going to need to do splits or merges at the leaf nodes. Rather than taking write latches all the way down, I take read latches all the way down, and then I take a write latch only on the leaf node. If I determine that I don't have to split, then great: I got down with just read latches and I can make whatever change I want. If I got it wrong and I do have to split or merge, then I just abort, restart the operation from the beginning, and take write latches down. This is a standard technique in systems: optimistic versus pessimistic. I optimistically assume that I'm not going to have to split, so I take the fast path and use read latches. We'll see this in the context of other things, like transactions, later on. And for most B+ trees in the real world, this is actually a pretty safe assumption. In my examples,
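The optimistic protocol can be sketched as follows, under the same toy assumptions as before: a home-grown reader-writer latch, a hypothetical `Node`, and a caller-supplied pessimistic `slow_path` to fall back on when the gamble fails:

```python
import threading

class RWLatch:
    """Minimal reader-writer latch for the sketch (no fairness guarantees)."""
    def __init__(self):
        self._cv, self._readers, self._writer = threading.Condition(), 0, False
    def acquire_read(self):
        with self._cv:
            while self._writer:
                self._cv.wait()
            self._readers += 1
    def release_read(self):
        with self._cv:
            self._readers -= 1
            self._cv.notify_all()
    def acquire_write(self):
        with self._cv:
            while self._writer or self._readers:
                self._cv.wait()
            self._writer = True
    def release_write(self):
        with self._cv:
            self._writer = False
            self._cv.notify_all()

class Node:
    """Hypothetical node: keys, children (empty for a leaf), an RW latch."""
    def __init__(self, keys, children=(), capacity=4):
        self.keys, self.children, self.capacity = list(keys), list(children), capacity
        self.latch = RWLatch()
    def is_safe(self, op):
        if op == "insert":
            return len(self.keys) < self.capacity
        if op == "delete":
            return len(self.keys) > self.capacity // 2
        return True
    def child_for(self, key):
        for i, sep in enumerate(self.keys):
            if key < sep:
                return self.children[i]
        return self.children[-1]

def optimistic_descend(root, key, op, slow_path):
    """Read-latch down the tree; only the leaf gets a write latch. If the
    leaf turns out to be unsafe (the gamble failed), release it and fall
    back to the pessimistic crabbing descent supplied as `slow_path`."""
    node = root
    node.latch.acquire_read()
    while node.children:
        child = node.child_for(key)
        if child.children:
            child.latch.acquire_read()    # inner node: keep gambling
        else:
            child.latch.acquire_write()   # leaf: we may modify it
        node.latch.release_read()
        node = child
    if node.is_safe(op):
        return node                       # caller modifies, then releases the latch
    node.latch.release_write()
    return slow_path(root, key, op)       # rare case: restart with write latches
```

Because every inner node only ever gets a read latch on this path, concurrent operations no longer serialize on the root.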
I'm showing nodes that hold two keys; in a real database, your node is going to be eight or sixteen kilobytes, which holds a lot of keys. So most of the operations you do are not going to cause a split or a merge, and in the rare case where one does, you just fall back to the standard latch-crabbing technique I showed before. This comes from a 1977 paper by Bayer and Schkolnick. There's no official name for the algorithm; people usually just refer to it as optimistic latch crabbing, or optimistic lock coupling. All right, so let's say again we want to do that delete on 38. This time I don't take a write latch on the root; I take read latches all the way down, and when I get down to D, I acquire the write latch on H. I recognize that even though I'm doing a delete, I'm not going to have to merge, so my gamble paid off and I don't need to restart: I can do my delete without ever having taken write latches up above. Same thing for an insert of 25: I take read latches and do crabbing all the way down, and eventually I get to C, where I take the write latch on F. This time I recognize that I'm going to have to split, so I abort the operation and restart it from the beginning, taking write latches all the way down. He asks: shouldn't you restart from the point where you last released latches on the way down, which in this case would be C? But how do you get back to C and F? You can't. He said you can maintain a stack of the pointers you followed on the way down.
I can't do that, because of page IDs. These labels A, B, C, D, E are logical identifiers for the nodes, but the nodes may end up being placed in different pages. Because I'm not holding any latches on them, anybody can do anything, and so the page for a given node may now be something different. It used to be page 123; now it's page 456. If I go look for page 123 from my stack, I find something completely different. I can't assume that the location of these nodes stays the same unless I hold a latch on them: a read latch prevents anybody from writing them or doing splits, and a write latch prevents anybody else from modifying them. So you always have to restart from the root. Yes? His statement is: usually we've determined we need to do a split, so we go back and acquire write latches, but then we find we no longer need the split, because something changed in between. Yes, you still start over from the beginning; you assume nothing and just restart.
Sorry: we determined that we would need to split, so we went back to acquire write latches, but then we see we no longer need the split, because while we were restarting, some other thread changed the structure. So your question is: say we're here, I hold the read latch on this node and the write latch on this one, and at this point I need to modify it, but I don't know whether someone else is going to change something that would cause this node to be modified as well. Again, everyone is traversing in the same direction, so they can't do that; they can't make any change here, because I hold the read latch on the node above, so they can't reach and modify this node. Yes? The question is: in this example, when I got down here and took the write latch on F to do the insert, and I recognize, oh, I have to split, so I need the write latch on the parent and I don't have it and must restart, do I just hold the leaf latch the whole time? No. Then what if you insert 24 and then insert 25? So when you insert 24 you get down to the F node, and you identify that you need to split, but the insert of 25 is also in flight; when you release the write latch, and before you reacquire the write latch at the root, insert 25 comes down to the bottom. Yes, and that's the situation we described: insert 24 will take the write latch at the root, go down, and split F, and when insert 25 comes down afterward, it's okay. All right, so to rephrase what she said: say in this example I want to insert 25, I get to the leaf node and recognize, oh, I have to split, let me restart and take write latches down. But in between the time I restarted, somebody else came along wanting to insert 24, and they hit the same issue.
They also have to split this node, so they come back as well and take write latches on the way down. But now, because both of them are taking write latches, only one of them can proceed at a time. Say the thread inserting 25 gets there first: it does its insert and splits. Then 24 is allowed to run; it gets down here, and it doesn't care that the node already got split. This is a good example of the difference between the logical view and the physical view. In my index, I don't care where my key physically lives. I don't say, I tried to put it here last time, so I must put it there this time because I couldn't before; I don't insist on reaching exactly that page. Every single time you come in, you do the traversal from scratch; you don't care how you got there before. So it doesn't matter whether 25 was inserted first and caused the split, or 24 came first and split it; either way the tree is still balanced and still correct. Yes? Correct: he said the second traversal for 24 doesn't need to split, because 25 already split the node. And yes, restarting is more expensive. But what's the alternative? The alternative is to take write latches every single time. Optimistic isn't perfect.
We're not guaranteed to always do the least amount of work. Certainly, if my nodes are really small, like in these examples, and I'm inserting a lot, then I'm splitting a lot, and I'd waste a lot of cycles doing a read-latch traversal just to find out I need to come back and take write latches. In practice, if the contention rate is high and the optimistic assumption is wrong, you're actually slower than just doing the pessimistic thing. But for these data structures in general, for what we're talking about here, the optimistic approach usually works best. For the hash table stuff, I actually haven't seen numbers for that case; there, the pessimistic approach of taking latches on the page is often good enough because it's so simple. For this one we can get more fine-grained and get a big win. But it depends on a lot of things: whether the workload is insert-heavy, lookup-heavy, or delete-heavy; the distribution of values; how many cores we have. It varies a lot. In practice, though, most database systems just pick one approach; they don't try to be adaptive, because from an engineering standpoint that's way more complicated. Yes? He asks: for B+ tree nodes, can't you use the low-level slot latches like you can in a hash table? No, because you could be modifying the physical structure of the index itself, updating pointers. If I need to split or merge, I'd need latches on all the keys in the node in order to move them around. So in general you just take a latch on the entire page. I think that's true; I could double-check, but slot-level latching makes things more complicated.
Okay. So again, this is just to reiterate what we already talked about: for search, the optimistic latching algorithm is the same as before, and for insert and delete, you take read latches on the way down, and if that fails, you come back and restart. This is what I was saying before about how we're assuming that, most of the time, taking read latches on the way down will be good enough and we won't have to restart. If we predict incorrectly, that first trip down is just wasted work, burning cycles, and we don't get the better scalability and concurrency we wanted. But I'll say that in practice this usually works out nicely. All right, so the next thing to talk about is how we support leaf node scans. In the examples I've shown so far with the B+ tree, just like with the hash table, all the traversals went in one direction, top to bottom. So there could never be any deadlocks, because I never had a thread coming up from the bottom, in the reverse direction, trying to acquire latches that another thread holds.
If we now want to start scanning along leaf nodes, things become more complicated, because threads are moving both top to bottom and left to right, and in this case deadlocks can occur. So let's see how we handle this. The first thing I'll say is that the original B+ tree did not have these sibling pointers on the leaf nodes. Most B+ trees have them now, and the idea comes from the B-link tree, which was invented here at CMU. Say I have this really simple tree, and thread one wants to find all keys less than four. We take a read latch on the root, come down here, and after we get the read latch on C, we can release the read latch on A, and now we want to start scanning across. Say we've read all the keys in this node, but we recognize that we have to keep going over here. Just like with crabbing on the way down, when we want to go horizontally, we don't release the latch we hold until we acquire the latch we want. So in order to get the latch on B, I keep holding the latch on C; once I acquire B's latch, I can swing over and then release the latch on C. For all keys less than four, basically keys from four down to negative infinity, we know we're going to have to reach this end of the tree. There are other tricks you can do, like fence keys or hint keys, that tell you, for a given node, what keys are over on that side, so you know whether you even need to jump there. But for this example we don't need to worry about that. All right, let's make it more complicated.
Say now we have another thread that wants to find all keys greater than one. Okay, that's fine. They both start and both want to acquire the read latch on A; that can happen, because a read latch can be shared. Then this thread gets the read latch on B and this one gets the read latch on C. That's fine. They scan all their keys and start moving across, and at this point the thread at B wants the latch on C, and the thread at C wants the latch on B. That's fine too, because read latches can be shared: they each acquire the other's node, slide over, and release the latch on the node they came from. Because read latches are shared, there are no deadlocks, and this works out fine. Now let's talk about when we have writes. Thread one wants to delete four, and thread two wants to find all keys greater than one. At the very beginning they can both get the read latch on A, because we're doing the optimistic latch-coupling, or latch-crabbing, technique where at the root I always acquire a read latch and only take a write latch at the leaf. Then they both go down: thread two gets the read latch on B, and thread one gets the write latch on C, because that's where the entry it wants to delete lives. Now say T2 wants to scan across, since it's finding all keys greater than one. Before it can jump into C, it has to get the read latch on C, but it can't, because the first thread holds the write latch on that node. So what should happen? What's that? He says it should wait. What else could we do? There are three choices.
We could wait; again, thinking back to spinlocks, we could just spin there. We could kill ourselves and restart the operation. Or we could be a gangster and try to steal it: go over there, shoot it in the head, take its wallet, take its latch, and take over. All right, raise your hand if you think we should wait. About 25%. Raise your hand if you think we should kill ourselves. Even fewer. Raise your hand if you think we should be a gangster and steal it. Nobody. So what's the issue here? What does this thread know about that thread? Nothing. A latch is just some bits in the data structure that someone acquires in either read mode or write mode; there's no global view in the system to tell you what the other thread is doing. The database system at a high level knows, sure: it knows it's doing a delete on four. But at this lowest level, inside the data structure, as our threads traverse through, we don't have access to that information, because it would be too expensive to go look it up. We want these operations to be really fast, because we're holding a latch on this node while we're trying to get the other one. So we could wait, but that could be a bad idea, because we don't know what the other thread is doing. In our example it's just deleting this one key, and then it's done. But we don't know that: it could also be trying to acquire the latch on B, in which case we have a deadlock. So the simplest thing, which turns out to be the best thing, is this: we say we don't want to live anymore, we abort, kill ourselves, and restart the operation. This is the fastest thing to do, because these latches are super dumb: there's no information about who holds them or what they're doing. Rather than trying to reason about anything, we immediately stop what
we're doing, restart, and assume that the next time we come back, the latch we wanted will be available. Yes? He asks how that's better than waiting. You can wait a little bit, with a timeout, and if the latch you want still isn't available, then you kill yourself. You could do that as well, but I'm talking about waiting maybe microseconds. His statement, going back up here, is: down here on C, doesn't thread one hold a write latch, and if C really wanted to do a modification, wouldn't it need a write latch up above, so this thread couldn't get down here to scan across? No. Say the blue thread starts first: it gets the read latch at the root, comes down, and gets the read latch it wants on B. Then T1 starts, gets its latch on A, and then gets the write latch on C. The threads can arrive in any order; you don't know. Yes? His statement, which is true, is that we could have starvation here: this thread says, I can't get what I want, I'll kill myself, tries again, and hits the same issue. Yes. There are different ways to handle that, and they add additional overhead. In practice I don't think MySQL and Postgres do anything; I don't know what the commercial systems do. You can imagine how to do it; it's just extra work, and it may not be worth it. The simple thing might be the best thing. Yes? What do you mean, kill the whole process? Oh no, no: it's an operation. So this operation, find all keys greater than one,
we restart that. Actually, perfect, that's the next slide. The way to think about this is that we have this database system with an execution engine that's invoking queries, and it says: in order to get the tuples I need to answer this query, I have to go to the index and find all keys greater than one. It invokes that on the index, and there's basically a retry loop inside the index: I keep retrying the operation until it succeeds. For inserts, or things that could potentially violate an integrity constraint, you have a check that says: I tried to insert and I couldn't, because it would violate the integrity constraint, not because I couldn't get the latch I wanted, and in that case you abort the operation. But in general you just keep retrying forever, because eventually it will go through. To his point, this could lead to starvation, or to burning a lot of cycles traversing to the bottom trying to acquire a latch you're never going to get. But the main takeaway here is that there's a potential for deadlock, and we don't know what the other side is doing, so we want to be super conservative and kill ourselves immediately. We can wait a little bit, sure, but we don't want to reason about what the other thread is trying to do; we just say we can't get this latch, and we retry. There's nothing up above that's going to notice a deadlock and break it by killing one of us. Yes? His statement is that it wouldn't matter what kind of latch the other thread is holding. Sure. In this case, this thread has a write latch, so I can't get the read latch, and that fails.
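The retry loop described above can be sketched as follows. The `Restart` exception and the randomized backoff are illustrative assumptions; the backoff also previews the staggering idea discussed later for threads that keep colliding:

```python
import random
import time

class Restart(Exception):
    """Raised inside the index when a latch conflict forces a do-over."""

def run_with_retry(index_op, backoff_us=10):
    """The execution engine's view of an index call: keep re-running the
    operation until it completes. A small randomized backoff, growing with
    each failed attempt, staggers threads so two restarted operations don't
    keep colliding in lockstep (a cheap guard against livelock)."""
    attempt = 0
    while True:
        try:
            return index_op()
        except Restart:
            attempt += 1
            time.sleep(random.uniform(0, attempt * backoff_us) / 1_000_000)
```

Note that a genuine integrity-constraint violation would be a different exception and would propagate out of the loop rather than trigger a retry.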
So that was saying like this guy has the relatch Maybe I prefer read read or write and therefore I want to kill this guy Sure, you can do that. But how do you actually implement that in your code? Now you need a way to interrupt this guy In whatever it's doing to then go steal the latch That's super hard because again, we're doing this these small critical sections I don't want to check a global variable says did somebody hate me and I and I should die Right So kind of coordinate that it is just it's not worth it in the back. Yes Uh-huh Okay, so I think you said here if if this thing actually when I do the delete I have to I have to merge and if I have to modify the root Yeah, how would that work? Well, again, like I would have to have the right so when I landed here using optimistic latch crabbing I would recognize. Oh, I'm going to have to merge and modify my parent So I got to go back and take exclusive locks all the way exclusive latches all the way down So that that gets him. Yes We avoid this issue if we like Whenever we get a lock whenever we decide that we're going to not unlock the parent We just lock all the children not just the one we're going to modify If we're like right about the leaf or something like that I say to you if if say to you if we're in this situation here So like we're in this situation here. Let's say we're going to split or whatever. Yeah, right So we have we maintain our right latch on a yes, right? And we say okay We're going to do some operations here require us to modify a So we just obtain blocks on all of the children of a because we know that a is a parent of leaves And then we never actually get starts a read operation on v And so then we then wait for the finish and then the read latch Starts and then we never have to work about this like Yes, he's actually correct. 
So this is what he just said: say the blue thread came first; then A is trying to acquire all the children's latches while the blue thread already holds the latch on one of them. Right. So suppose we tried to do what you're proposing: if I know I need to do a split here, and I may have to spill over here, I acquire a write latch on this node and then write latches on all its children, and that would let me do any modification I want, including updating the sibling pointers, which is tricky. And you're saying that could cause a deadlock, because someone could be coming from a different direction: the blue thread already has a latch on B. He said: you hold latches on A and C and want to acquire the latch on B, and the blue thread wants the opposite. Right, so you kill yourself; there's no point in acquiring more latches. And no: if this node splits, I have to update the sibling pointer too, so I do need to acquire a latch on that node as well. But again, the simplest thing wins. Another concern: what if two threads abort at the exact same time and try to acquire the exact same latches? In practice, they're not going to run in absolute lockstep, meaning that if they both abort at the same time, they won't immediately come back and hit the exact same conflict; they'll be slightly offset from each other. But even then you could say: I tried this before and couldn't do it, let me back off a little, and that way we come in staggered and avoid the issue. The simplest thing is: I didn't get the latch I wanted, I kill myself immediately, and that avoids all deadlocks. That's going to be different from two-phase locking, which we'll talk about later for transactions, where something else can come along and resolve deadlocks for us; we don't have that here. Okay, so the last thing I want to finish up discussing is an additional
optimization for handling overflows. This also comes from the B-link tree, which first invented the sibling pointers that everybody now uses in B+ trees, at least in one direction. Normally, every time a node overflows and we have to split, we have to update three nodes: the node being split, the new node we overflow into, and at least one parent, or ancestor, to accommodate the new separator key for the node we added. The B-link tree adds an optimization: any time a leaf node overflows, you hold off on updating the parent node, so that you don't have to restart the traversal and do the pessimistic write latches all the way down. Instead, you record the change in a little global information table for the tree that says: the next time somebody comes through this part of the tree, here's the update I want you to apply. Let's look at an example. Say I want to insert key 25. Again, I do the optimistic latch crabbing on the way down with read latches. I get to C, and when I take the write latch on F, I recognize that I'm going to have to split. But rather than restarting and taking write latches, I just give up the read latch on C, still do my insert, and add the new node. Then, rather than having to update C, I record an entry in the global table for the tree that says: if you ever take the write latch on node C, here's the change I want you to apply. That way, the next time somebody comes through and takes the write latch, they'll do a little extra work and finish the update we wanted. And this is still correct.
This is still valid because if I now come along and do a lookup on 31, I follow the pointers down, and my pointer for all keys greater than 23 will put me here. At that point I know this leaf has an overflow pointer, so if I'm looking for 31, I scan along the leaf node, follow the pointer into the overflow node, and find what I'm looking for.
Yes? Right, so again there's a global table that anybody can see when they start a traversal, and it says: by the way, if you're doing a modification and you're going through C, take a write latch on it. So say this next thread wants to insert 33. It does read latches all the way down to get to B, but now for C it sees: I was told I should take a write latch on this node, so let me go ahead and do that. I take the write latch, finish propagating the pending change, and now the tree is considered valid again; then I complete my own operation. So rather than having to do a restart, you record the change, and the next thread that goes through takes care of performing it.
Yes? How would you identify C, what do you mean? It's a page ID, right, or a logical node ID: the entry says, for page 123, apply this change for me.
Yes? Do you update... sorry, the one back here? Yes, for simplicity reasons there are different ways to do this. In this case, if the nodes don't have the same parent, we may not have a sibling pointer going in the reverse direction. There are different variations, but if you want bidirectional sibling pointers, yes, you'd have to update those too, and that makes things way more complicated than what I can show in class. And when you follow the link from C over to the overflow node, do we need to update that link as well, like this one here?
Well, that's just a sibling pointer, but yes, you could update it and say the overflow node is not there anymore. Actually, it doesn't matter; you can keep it, because up above, if I'm looking for keys greater than or equal to 31, I'm never going to land on this node anyway. I'll always go directly to the new one, so you don't actually have to update it.
Yes? Correct: his statement is that, for this particular optimization, the first thread that takes the write latch on C applies the change, and because it holds the write latch, that's an atomic operation. Correct, yes; otherwise the node could be changed underneath you. Okay, awesome.
All right, so let's finish up. Hopefully I've convinced you that you want to do this latching stuff, but that it is notoriously hard to do right. I glossed over the sibling pointers and how to keep those in sync; that's a whole other bag of worms we don't want to get into, and it's super tricky.
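The reason the stale pointer is harmless can be seen from the reader's side: a lookup that lands on a leaf whose parent hasn't been updated yet simply chases the sibling pointer. This is a toy sketch of that behavior; the `Leaf` class and `lookup` function are illustrative, not from any real system.

```python
class Leaf:
    """Toy leaf node: sorted keys plus a right sibling / overflow pointer."""
    def __init__(self, keys, sibling=None):
        self.keys = keys
        self.sibling = sibling

def lookup(leaf, key):
    """Scan the leaf we landed on; if the key is past this leaf's largest
    key, a split may not be visible in the parent yet, so follow the
    sibling pointer instead of reporting a miss."""
    node = leaf
    while node is not None:
        if key in node.keys:
            return node
        if node.keys and key > node.keys[-1]:
            node = node.sibling        # chase into the overflow node
        else:
            return None                # key would live here, but doesn't
    return None
```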
But again, as I said, the good news is that because this is super hard, if you can do it, people will pay you a lot of money to do it. In practice there are actually a surprising number of concurrent data structure libraries out there: Intel Threading Building Blocks is one of them, Facebook's Folly is another. In general, for low-level things like internal hash tables that aren't being used as part of query processing or for storing data as an index, off-the-shelf stuff is probably good enough. But all the commercial, high-end systems roll their own data structures for these things, and for table indexes I think building a data structure specific to your database system is super important, because then you can tailor it to whatever your target operating environment is.
The other thing I'll point out is that although we talked a little bit about hash tables, we spent most of our time on B+ trees. But the core ideas I've covered, like making sure threads always traverse in one direction to avoid deadlocks, killing yourself right away if you do encounter a deadlock, and optimistically assuming you won't have to modify the structure so you can take a fast path first, are reused throughout computer science and systems in general. It's not just B+ trees; these techniques are applicable everywhere. Okay.
All right, any questions about what we've talked about so far today? All right, so the good news is that next class we can finally start talking about how to actually execute queries. We know how to store the data, we know how to index it; now let's talk about how you run queries on top of it and produce results. Okay? All right guys, have a good weekend. See you.
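To make the "kill yourself immediately" policy from the recap concrete, here is a minimal sketch: try every latch without blocking, and if any acquisition fails, release everything and retry after a staggered backoff. The `Node` class and `latch_all_no_wait` name are invented for illustration; this is a sketch of the policy, not a production implementation.

```python
import random
import threading
import time

class Node:
    """Toy tree node: just a name and a latch."""
    def __init__(self, name):
        self.name = name
        self.latch = threading.Lock()

def latch_all_no_wait(nodes, max_retries=10):
    """Try to write-latch every node needed for a split. If any latch
    cannot be taken immediately, release everything acquired so far and
    retry ("kill yourself" rather than wait) -- blocking while holding
    latches is what creates deadlocks."""
    for attempt in range(max_retries):
        acquired = []
        for node in nodes:
            if node.latch.acquire(blocking=False):   # no-wait acquire
                acquired.append(node)
            else:
                for held in acquired:                # back out completely
                    held.latch.release()
                acquired = None
                break
        if acquired is not None:
            return acquired   # caller performs the split, then releases
        # staggered backoff so two restarting threads don't collide again
        time.sleep(random.uniform(0, 0.001) * (attempt + 1))
    raise RuntimeError("could not latch the split path")
```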