Today's lecture is the start of the next three lectures on indexing. So today's class is going to be on how to do locking and latching in the context of indexes, and what this sets up for us is that on Monday next week we will talk about latch-free indexes. So today's lecture covers the traditional ways of doing locking and latching in an index. Then we're going to throw all of those away on Monday and say, well, maybe you actually want to try to use a latch-free index. And then we'll come back on Wednesday next week and say, oh, actually, no, you don't want to throw away the techniques we're talking about here — you want to add them back in a smart way, okay? So for today's agenda, we're going to talk about index locks and latches, and the difference between the two of them. Then we'll talk about how to actually implement a latch, and I will say for this conversation here, it's not specific to indexes: when we have latches in other data structures inside our database system, or other critical sections we want to protect with a latch, we can use these latch implementations. And then we'll talk about how to do index latching, and finish up with index locking, okay?
So I don't think I need to define what an index is for this class; certainly everyone here should have the requisite database background. But just to make sure that we're all on the same page: by an index we mean some kind of auxiliary data structure that the database system is going to maintain to allow you to speed up the retrieval or access of tuples that are in a table. And this is the classic, fundamental trade-off in computer science between the computational costs and the storage costs of your algorithms and techniques. So in exchange for having faster retrieval operations on our table, we're going to pay a penalty of higher overhead for doing writes to that table, because now we need to make sure that the index stays in sync with the table — we have to go apply our updates to the indexes. And then we're also going to pay a storage cost for maintaining the index, whether it's in memory or on disk. Right, indexes don't come for free; we have to store them somewhere. And this is the answer to the question people always ask: "Why can't I just add every possible index on my table, for every possible column, to speed up every possible query?" You can't, because it's really expensive to maintain them all, and it's going to cost a lot to store them. So again, the best analogy I can give for an index, in the context of what we'll talk about here, is a glossary in a textbook. Say I want to read the chapter on some topic.
I don't want to scan every single page to find the thing I'm looking for; I want to look in the glossary, look up the keyword, and that'll give me a page number that I can jump to to find the thing that I want. That's essentially what the index is going to do for us. So the tricky thing about indexes — and we'll see this today, and we saw a little bit of this in the last lecture — is that they're essentially a second copy of the data. And now we have this fundamental problem where any time we update a table, we have to make sure that update is propagated to the index if necessary; or likewise, if I update the index, I'd better make sure it's in sync with the table. Right, because I don't want false negatives or false positives: someone tries to go read the index looking for a key that is in the table but not in the index, and they would find nothing. Right? And so this relates to the phantom problem that we saw in the last class, which the different concurrency control protocols try to deal with in different ways. Essentially, in today's lecture we're going to see how to enforce serializability, or avoid phantoms, directly in the indexes themselves. Whereas in the Hekaton paper, we saw that they just re-execute the scans and check whether you get back the same result; in the HyPer case, they were using precision locks; and in the Cicada paper you guys read, they were storing the indexes directly inside the tables themselves.
So you get that protection essentially for free. So for today's lecture, we're discussing how to deal with this when you don't use those mechanisms — these are the more traditional ways to do it in databases. So there are essentially two classes of data structures that you can have for indexes: order-preserving indexes and hashing indexes. Order-preserving indexes are anything that's going to maintain the sort order of the key that you're indexing on — or multiple keys, if it's a multi-key index; it maintains them in some sort order. And this allows you to support all possible predicates for the lookups you may have: whether you want to do an index point query or a range query, your order-preserving index can handle it, and in general all your lookups are going to be O(log n). The other class of indexes are hash tables, or hashing indexes, and this is essentially a giant associative array that maps some key, or a hash of a key, to a particular record. For this you're going to get an O(1) lookup, because you just hash the key and jump, in theory, basically right to the thing you need. You may have to do some extra steps depending on how your hash table is designed — maybe do a scan to find the thing you actually want — but in practice it's O(1). So for this lecture, we're only going to focus on the order-preserving indexes. Hash indexes do exist — in Postgres, for example, you can say USING hash to get a hash table index — but in pretty much every single database system, when you call CREATE INDEX you get one of these order-preserving indexes, and the things we're going to talk about today are mostly applicable to these tree data structures. The latching stuff you care about for hashing as well, but things like gap locks you don't have so easily in a hash table.
So we're going to focus on these here. Okay, so I'm not going to talk about the basics of a B-tree or B+tree; again, you should have covered that in an introductory class. The only thing I want to say here, just to make it clear: when we say a B-tree or a B+tree, those are actually two distinct data structures that mean different things, and I'm totally guilty of this — I'll sometimes say B+tree or B-tree when I really mean a B+tree. As far as I can tell, if you go read the Postgres documentation, or go read the Postgres source code, they refer to it as a B-tree, but I'm pretty sure they mean a B+tree. So, just so we understand the distinction between the two: the original self-balancing tree — or one of the first ones — was the B-tree from 1972, and in that one, the keys and the values could be stored in any possible node in the index. Right, so you could maybe traverse down one level and find the key you're looking for, and the value would be right there, so you could jump to the record that you wanted. And so this is more memory efficient than the B+tree, because every key appears exactly once in the index. In a B+tree, the key/value pairs are always in the leaf nodes, and the inner nodes are essentially guideposts that allow you to figure out whether you need to go left or right as you traverse the tree. So you can have the case where a key is duplicated in the inner nodes, because you need it there to figure out exactly where you are; or, if you delete a key, it may be removed from the leaf node but still exist in the inner nodes, right? And so the B+tree is going to be less space efficient because of this, because you may have duplicate keys. But in practice everybody implements a B+tree and not the B-tree, because managing concurrent access to a B-tree is more difficult. So in a B+tree,
all the operations you perform will essentially emanate from the leaf nodes — from the bottom going up. In a B-tree, you could have to do a split or a merge at a higher level in the tree, but then you don't know whether somebody below you is also trying to do a split or a merge, and now you have to make sure you carefully balance or manage the two changes. So in practice, everyone implements the B+tree. And there are some variations of B-trees — there are B-link trees, B* trees — and bits and pieces of them exist now in the modern incarnations of the B+tree. But in general, when people say B+tree, they really mean sort of the classical one plus this extra stuff. Okay, so at this point in the semester, we already know about two-phase locking, and so we know how to protect the objects in our database. Right, we have shared locks and exclusive locks; for an in-memory database we can set them in the header of our tuple; we know how to do compare-and-swap to acquire them; and we know we need to do deadlock detection or deadlock prevention to ensure we don't have any problems. But with an index, we can't just use two-phase locking, because that would mean a transaction holds locks on nodes or keys in the index for the entire duration of the transaction, and that's going to limit the amount of concurrency you can have. So we're actually going to end up using something different than two-phase locking. And the reason why we can get away with this is that, from the transaction's point of view, it doesn't care whether the physical data structure of the index changes, as long as the logical contents are the same. What I mean by that is: if my transaction does a lookup and asks, "Do you have key 22?" and the index says yes, then if someone comes along and
inserts a billion more keys, and now the complete physical data structure has changed, and where key 22 was before is now a different location in memory — as long as I can come back and ask whether it has key 22 and it gives me the same answer, I'm fine with that from the transaction's standpoint, right? That's still serializable; that's still consistent. So because of this, we can do something different than we normally would in a regular database with two-phase locking. Let's look at a really simple example here. We have a B+tree with a single node, and it only has two keys: 20 and 22. If my transaction comes along and wants to read 22, I do my lookup in the root, I find it here, and I'm good, right? But now transaction two comes along and inserts 21, which goes in between these two guys. So now I'm going to have to split this node, move key 20 over here and key 22 over there, and then I can insert 21 — and that causes me to reshuffle what I'm storing in my index. So if I come back now, as transaction one, and do a lookup on 22, it's not in node A anymore — it's now in node C — but I don't care, because I got the same answer. But as we'll see when we start to talk about granular index locks, this means we're not going to be able to do the same thing that Hekaton did, where you could store pointers to the actual tuples that you hold locks for in your read set. Right, because I would have a pointer to this node here, but now it contains something completely different. So there's some logical information
we're going to have to maintain in order to protect our transactions when they access the index. And this is the fundamental issue, again: the index is a copy of the data that we need to keep in sync with the actual table, but it's a completely different beast. We'll see other issues too — for example, if you have to scan across nodes, that causes problems. It's more than just looking up single keys; for all possible accesses, we need to make sure the index is consistent and serializable. All right, so I went through the distinction between locks and latches at the beginning of the semester; I'm just going to do it again as a high-level overview, because this was in the paper you guys read. This was a survey paper on locking and latching techniques inside a B-tree, by Goetz Graefe. I think it's a really good paper because it summarizes basically the standard methods for doing all these things as of 2010. The indexes we'll read about next week are newer than that, but a lot of the basic ideas are still applicable. So in the paper, Graefe talks about the distinction between locks and latches. A lock protects the logical contents of the index from other transactions, and we need to hold locks for the entire duration of the transaction. We also need to be able to roll back any changes that we make to anything we have locked, because the transaction may end up aborting — because of a conflict, or because the application decided to abort it. A latch, on the other hand, is used to protect the critical sections of the index — essentially its internal physical data structure — from other threads that may be performing operations at the same time we are. And we hold these latches only for the duration of the operation: once we know it's safe to release our latch, from a concurrent-programming standpoint, for the physical data
structure, we can go ahead and release it. And we don't need to roll back any changes, because the modifications we made to the physical structure may still be correct even if the transaction aborts. Right, we may have updated pointers that need to stay there even though our transaction rolled back, because that's still correct. We saw this in the example I showed about the page table in a disk-based system: the transaction rolled back, but we don't want to roll back the update to the page table, because the page actually does exist there. So in the paper, Graefe makes this commentary about what it means when people say they have a lock-free data structure, and he basically says there are two things people could be talking about. The first is that they mean no locks in the actual database itself whenever transactions want to access or modify the database. To do this, we could use the optimistic concurrency control that we saw last week, because that didn't have to acquire locks as it was running — but we're still going to have to set latches any time we install updates. The alternative is that when you say you have a lock-free index, you really mean it has no latches. And for this, we're going to use compare-and-swap to flip pointers — to atomically install the updates we want to make to our index. We'll see this on Monday with the skip list and the Bw-Tree; these are examples of latch-free indexes, and they rely heavily on this technique. So in this case, the concept of "no latches" applies just to the physical data structure; in the context of actually executing transactions, we still have to use locks to validate whether transactions were actually allowed to do the thing they were supposed to do, right? So for our purposes in this lecture, we're going to focus on both locking and latching in indexes, and
then next class we'll see how to do a no-latch index. In terms of no locks, we sort of already covered that with the concurrency control lectures from before. All right, so now we need to talk about how you actually implement a latch. I've been throwing this word around, but we should actually talk about how you do this. There are essentially four major types that we care about in the context of a database. I'm not saying this is the exhaustive list — there could be other techniques — but for our purposes, these are the four major ones: the blocking OS mutex, the test-and-set spin lock, the queue-based spin lock, and the reader-writer latch. I'll go through examples of each of these. But the fundamental primitive they use to make these latches work is the compare-and-swap instruction that I talked about before. Remember, I said this is a single instruction that modern CPUs provide that allows you to check the contents of some memory location to see whether it matches the value that you expect — and if the expected value matches what's actually there, then you're allowed to atomically install a new value. Right, so in this case here, I checked whether the value was 20; it was, so I was able to swap in 30. Yes? Your question is whether this is generally guaranteed through virtualization — what do you mean by that? So your question is: if this runs in a VM, would it still be atomic? Absolutely — the hypervisor basically lets the guest execute CPU instructions directly. What if you had two cores execute this instruction on the same memory location? Okay, so his question is: if I had two cores execute this exact same instruction at the exact same time, what happens? Give me like two slides; we'll discuss this. Yes. Right, so the concept here is basically pretty easy to understand.
This is a CPU intrinsic, meaning the compiler will convert it to the single instruction you need in order to invoke it. If you're on a CPU that doesn't support this, then either it'll throw an error and say "I don't support this," or it tries to rewrite it with something else — if it doesn't have test-and-set, maybe it falls back to a mutex. All right, so the first latch is the easiest one to understand if you've taken a basic operating systems course: this is when they teach you about critical sections, and they teach you about mutexes. A blocking OS mutex is basically the primitive you get in either pthreads or, as of C++11, std::mutex. And std::mutex is essentially just a wrapper around pthread_mutex, and that in turn is just a wrapper around a futex. So does anyone know what a futex is? He says it's based on a spin lock, but it has some additional features. Yeah, he's right. Futex stands for "fast userspace mutex." What it does is use a spin lock — which I'll show on the next slide — where you do a test-and-set to try to acquire a flag that sits in userspace. If you can get it, then you're done; you hold the lock. If you try to get it and you can't, then you fall back and acquire an actual mutex in the operating system, which is super slow.
In practice, taking a lock that actually has to go inside the kernel is about 25 nanoseconds, which is a long, long time. And what happens is, if you try to acquire the mutex in the operating system and you can't, then the operating system recognizes that you can't proceed, because the lock you need is not available. So it informs the scheduler that you're waiting for something you don't have, and the operating system knows not to schedule your thread for now. Then, when the lock is freed, it notifies your thread, you get scheduled for your quantum, and you can execute. Again, this is the basic lock we teach you in operating systems courses. So basically it looks like this: I declare a std::mutex, I call lock(), I do whatever I want to do, and then I call unlock(). Right, and there's no deadlock detection here — this is all at the latch level — so it's up to us as the database programmers to write disciplined code to ensure that we can't have a deadlock. So again, std::mutex is just a wrapper around pthread_mutex, which is just a wrapper around a futex. You have no idea how much I hate mutexes. We should probably set up a script so that if you try to commit something and there's a mutex in it, it throws an error. I actually went and looked this morning, and I'm disappointed to say we have a shitload of mutexes in our own system. I don't know how many are actually being invoked, but this is an example of where we don't practice what we preach: I tell you that we shouldn't use mutexes, and my own database has mutexes. So that's bad, I know. There's one in the networking thread, though, right?
Yeah, so in that case you need it, because there's some other external package involved, but in other parts of the code it shouldn't be there. Okay. So if you don't want to use a mutex, what's the next best option? And that's the spin lock — the test-and-set spin lock. These are really, really efficient, because it's a single instruction to acquire the lock: do the test-and-set — a compare-and-swap — and if you catch the flag, you're good to go. So it essentially looks like this: the standard template library provides you with the std::atomic template, and you can put in whatever type you want to be atomic. They also have std::atomic_flag, which is essentially just a specialization of this with a boolean, but it's guaranteed to be lock-free. In the general case, you could put in something of arbitrary length — say, a 128-bit data type — but if the CPU only supports compare-and-swap up to 64 bits, then it won't be able to do that in hardware; it has to fall back to mutexes or some other primitive. With something like atomic_flag, or other small primitive types, the test-and-set operation is guaranteed to be atomic. So, yeah, this is just another way of saying an atomic boolean. So we do our test-and-set: if we get the latch, we're done; if not, then we drop down into this while loop, make some decision about what should happen, and then we spin back and try the exact same thing. Right, so the difference between this and the blocking mutex from the operating system is that, from the operating system's standpoint, it thinks you're actually doing work — you're spinning through here, executing instructions, but you're not actually acquiring the latch you need. Whereas in the mutex case, you're telling the operating system,
"I can't actually run, because I can't get that mutex," so the operating system is not going to schedule you. In this case here, you'll keep getting scheduled and keep trying to run until you can get the lock. So the thing we have to deal with now is that it's up to us as the database developers to decide what should happen if we can't get the lock. We could yield our thread back to the OS and wait for our next quantum to come back and try again. We could abort — maintain a counter and say, if we've tried so many times and still don't get it, we kill ourselves. Or we could immediately come back and try again right away. Typically with these latches, you're protecting critical sections that should only take a small number of instructions. Right, you don't want to set a latch around something that goes over the network and comes back, because that's going to take a really, really long time. So in this case, this is why this approach is okay.
It's fine to just spin and spin, because you'll be able to get the latch you need fairly quickly. So, related to his question from earlier: what happens if you have multiple cores trying to do this exact same thing at the same time? Well, this is why this thing is not scalable or cache friendly. What happens is, if I have a single latch and multiple cores trying to do a test-and-set at the exact same time, then because this is a modifying instruction, I have to send my test-and-set to wherever that memory location actually lives. So if these are running on three different sockets, and say the cache line with our latch is in this socket's local memory, then I have to send a message over the interconnect — like QPI, if you're on a Xeon system — to say "test-and-set over here," because I can't just touch my local memory. So this is going to be really, really slow, and now you're going to have all this traffic doing cache invalidations, trying to update this and acquire the latch. Everyone has to go to the same address, the same cache line, and send a message to do that, and the CPU does this underneath the covers. So this sucks, right? For simple things it's okay, but if you have a lot of threads, this approach is not scalable. The better approach is called the queue-based spin lock — or, sometimes, if you Google this, it's called the MCS spin lock. MCS stands for Mellor-Crummey and Scott, the two dudes that invented it. And so what's going to happen here:
This is much more efficient than a regular mutex, because we're not going to the operating system, but you also get better cache locality than the naive test-and-set spin lock, because you're going to maintain multiple latches and have each thread spin only on its local one. So it looks like this: we have a base latch, and it has a next pointer, and this next pointer is basically the location where we try to install a new latch — and if we can do that, then we know we've acquired the latch. So my first CPU comes along and tries to do a test-and-set on this; the next pointer points to nothing, so it's allowed to install a new latch in the queue — the chain — and now this guy has essentially acquired the latch for the entire thing. Now another CPU comes along; it first checks whether that next pointer is null. It isn't, so it follows the pointer to that node, tries to do a compare-and-swap on its next pointer to install the next latch, and then it just spins on that. Right, and that would be local to its memory — it's the only thing reading it and trying to test-and-set on it while it spins — so the CPU is going to put it on a cache line that's close to it. The third CPU comes along, same thing: check the first one — not null — not null — get to this one, install another one, and then spin over here. So there's no cross-socket traffic now for these spin checks, these test-and-sets, because they're always on a cache line that's close to you. Yes? So his question is: when CPU 3 came here and tried to do a test-and-set, you do have to send a message across the interconnect for that one? Yeah, you can't avoid that. Same here, right — the test-and-set itself is the cross-socket message, right?
Correct, yes. Right, so the CPU is smart enough to recognize that this guy keeps trying to read this thing, so it can move the memory location into a cache line close to it, and it does that for you. I mean, there are some hints you can give to say "move my memory closer to me" — we'll see this later with things like NUMA controls — but in general it does it on its own. And the big thing is that this was just one test-and-set; that's not a big deal. It's the spinning part that kills you, and now I'm spinning on something that's local to me. Now, the tricky thing with MCS spin locks is that when this guy finishes and releases the latch, we have to get a notification to the next one to say, "Hey, okay, now you're allowed to go." And the trickier part is, if somebody in the middle decides "I've spun enough, I'm done" and wants to stop trying, you need to remove them and make sure the chain is still valid. That part is non-trivial, but it is doable. In the example here, I'm showing a pointer to a latch — this is not a real implementation; I'm just showing it as an illustration that you could wrap a pointer to some other, more complex object using the atomic flag. I believe this is also what Linux actually uses internally for the latches on its internal data structures; I think they added it around 2014. All right, the last of the latches is called the reader-writer lock.
The reader-writer latch is more common when you allow for concurrent readers. The basic idea is that we have a single logical latch that can be in two different modes — a read mode or a write mode — and for each mode we maintain a counter of the number of threads that either hold the latch right now or are waiting to acquire it. So the first thread comes along wanting the read latch; it's not being held by anybody, so it does a compare-and-swap to increment the read counter, and now it knows it holds the read latch. A second thread comes along and does the same thing: it sees that nobody is waiting for a write, does its test-and-set, increments the counter to two, and now it also holds the latch. Now a thread comes along that wants to do a write. It checks and sees that the read counter is greater than zero, so it has to add itself to the wait queue, and it just spins and waits. Now, if yet another thread comes along wanting a read latch, what should happen here? What's that? He says: just go get the read lock while the writer waits. Well, you don't want to starve the writer. Right, so you're actually going to make that reader wait — add him to the wait queue — and that way, when the current readers finish up, you can hand write access to the writer. Now, you're technically correct: you could implement it as you said, where you just keep letting readers go one after another, but that can lead to starvation. So exactly how you want to implement this is up to you, but in practice, people try to have fairness by ping-ponging between the modes. It depends on what you want to emphasize, or what is more important for you.
All right, so again, these are the four basic types of latches you can have. The reader-writer one and the simple spin lock are probably the most common, but the MCS one definitely achieves better scalability. All right, so now, given these latch types, we can talk about how you actually want to implement latching in an index. The most common technique is called latch crabbing, and the basic idea is that our threads are going to acquire and release latches on nodes in the B+tree as they traverse the data structure. What happens is that as a thread moves its cursor from one node to the next, if it recognizes that its current node is considered safe — meaning whatever modification I'm making is not going to cause a structural change to the nodes above it in the tree — then it can go ahead and release its parent's latch, and any latches even farther up in the tree. We'll deem a node safe if we know that we're not going to split or merge it when we update it, and we can determine this as follows: for an insert, the node is safe if it isn't full, because we have space to insert something; for a delete, we check that the node is more than half full, because then we can delete something without having to merge with a sibling. So the basic latch crabbing protocol is pretty simple. To do a search, you always start at the root, and as you go down, you acquire a read latch on the next node and then release the read latch on its parent. Then you reach the bottom and you have your leaf node.
You read whatever it is you need. For an insert or delete, you start at the root and acquire write latches on the way down, and once you acquire the write latch on a child, if it is safe you can release the write latches on its ancestors. So let's look at some really simple examples. Say I want to search for 23. The first thing I do is acquire the read latch on A, jump down to C and acquire the read latch on that, and now it's safe for me to release the latch on A, because I'm doing a read-only operation and nothing I do will change the structure of the tree. So at this point I'm fine. Then I acquire the read latch on F, release the one on C, and now I'm down at the bottom and can read the thing I want to read. This protocol by itself guarantees that no other thread can modify the index in a way that would cause my pointer to now point to garbage memory. Now let's look at a delete. Again, you always start by taking the write latch on the root node. Then we get down to C, and here we're doing a delete. At this point I don't know what's below me, and I could delete something down there that forces me to delete a key in C, because I'm coalescing nodes below. And because C is only half full, if I have to delete 35 then my node would fall below the threshold. So C is not deemed safe, and we can't release the latch on A, because our change may percolate up to A. Then we get down to the bottom at G, and here we see that G is more than half full, so we can delete something without structural changes above. Now it's safe to release the latches on A and C, and down below we delete the key we want, and then we're done and release our latch. Inserts work basically the same way.
We take the write latch on A, then the write latch on C. At this point we have room to insert a new entry if need be, so we can release the latch on A. Then we jump down to G, and here we see that G is full, so we can't release the latch on C, because we're going to have to split G. So we go ahead and create our new node H, apply our insert, and then make our modifications to C and G as necessary, and then we're done. One thing I want to point out: I acquired the latch on C and the latch on G, but I didn't acquire a write latch on H. Why? Right, it's obvious: I hold an exclusive write latch on C, which means nobody can read C, so nobody can follow the pointer to H and see it. Implicitly it's already latched, so I don't have to do that. Then I release my latches and I'm done. I covered these examples in the intro class, and I've asked this question before, and it should be sort of obvious: in the case of doing an insert or a delete, what's the very first thing I always have to do? Take the latch on the root, right?
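Here's a rough sketch of that crabbing descent in Python. The `Node` class, the fixed capacity, and the plain `threading.Lock` standing in for a read/write latch are all simplifications I've invented for illustration; a real B+tree would use reader-writer latches and would actually perform the split or merge while still holding the retained latches.

```python
import threading

CAPACITY = 4  # max keys per node in this toy tree

class Node:
    def __init__(self, keys, children=None):
        self.latch = threading.Lock()   # stand-in for a RW latch
        self.keys = keys
        self.children = children or []  # empty list => leaf node

def is_safe(node, op):
    """A node is 'safe' if this operation cannot split or merge it."""
    if op == "insert":
        return len(node.keys) < CAPACITY        # room to insert
    if op == "delete":
        return len(node.keys) > CAPACITY // 2   # more than half full
    return True                                  # reads are always safe

def traverse_with_crabbing(root, key, op):
    """Descend root-to-leaf, latching each child and releasing all
    ancestor latches as soon as the current node is safe."""
    held = []
    node = root
    node.latch.acquire()
    held.append(node)
    while node.children:
        # pick the child to descend into
        idx = sum(1 for k in node.keys if key >= k)
        child = node.children[idx]
        child.latch.acquire()
        held.append(child)
        if is_safe(child, op):
            for ancestor in held[:-1]:
                ancestor.latch.release()
            held = [child]
        node = child
    return held  # caller applies the change, then releases these
```

If the leaf is safe, only its latch is still held at the bottom; if an ancestor might be restructured, its latch is retained all the way down, which is exactly the root-bottleneck problem discussed next.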
And so the root actually becomes a huge bottleneck in your system, because if I'm modifying the index, nobody can read anything else. Even if I'm modifying on the right side of the tree, you can't read anything on the left side, because I've taken the write latch on the root. So one way to improve this is an algorithm from 1977, where you optimistically assume that your path from the root to the leaf node will be safe, and therefore you just take read latches all the way down. Then when you get to the leaf node, you take a write latch on that and do whatever modification you need to do. But if during the traversal you realize that your path is actually not safe, you just abort the traversal, go back, and do it the regular way, taking write latches all the way down. This is also sometimes called optimistic lock coupling, and we'll see another version of this basic idea in the ART index from HyPer on Wednesday next week. All right, it basically works like this. Say we want to delete 44. I take a read latch on A, then a read latch on C, and immediately I can release the read latch on A because I know I'm safe. Then I get down to G, acquire the write latch, immediately release the read latch on C, and then go ahead and do my delete. So I didn't interfere with anybody else that might be reading at the same time; they were able to read the other parts of the tree without any problems. The insert basically works the same way. The key observation I want to make about latch crabbing, though, is that again, this is a technique to protect the physical data structure of the index from concurrent access. But because we're releasing the latches on our index nodes immediately after we do our operation, this does not protect us from phantoms, because we're not protecting
the logical contents of the index. So let's look at some problem scenarios. In the first one, say I have a transaction that wants to check whether key 25 exists. It takes a read latch on A, a read latch on C, and gets down to the point where 25 should be, sees that it's not there, and then goes off and does something else. Later it wants to come back and insert 25 as a separate operation. But in the meantime, transaction two comes along and inserts 25: it acquires write latches all the way down and inserts the key. Now when my first transaction comes back and tries to insert 25, it takes the proper latches on the way down, but when it gets to the bottom, 25 is there, and that's a conflict. The latches protected the physical data structure, but the logical contents of the index were not protected. This is a phantom: the thing we checked the first time wasn't there, and when we checked the second time it was. The second scenario is a range scan. Say I want to scan the range between 12 and 23. Again I take read latches all the way down, and I get to the leaf where the starting point of my scan is and find 12. I know I need to scan up to 23, and because I record in my index node what the min and max keys for the node are, I recognize that for this node E the max value is less than 23. So in order to complete my scan up to 23,
I have to jump over to the sibling leaf and look there. In this example, although I'm showing pointers that go in both directions, in practice you can only safely traverse in one direction along the leaf nodes, and usually everyone has them go from left to right. So I'm allowed to go acquire the latch on that sibling without having to come back down from the root, because if everyone always traverses in the same direction, I can't have a deadlock: nobody else can be coming toward me in the opposite direction at the same time. So in this case I complete my scan, I get all the values between 12 and 23, which here is just 12 and 23, and then I release my latches. Then transaction two comes along and inserts 21: write latches all the way down, and it inserts it into leaf F. If I come back and do that same scan again, I have the same problem: I come down, do my range scan, and the key 21 that wasn't there before is now there. So those are two examples of phantoms. What we need to do now with index locks is protect the logical contents of the index from other transactions that could be modifying it at the same time. As I said earlier in this class, in the case of HyPer, Hekaton, and Cicada, they have additional mechanisms in place to avoid these problems. The things we'll talk about here are the classic techniques from the 1990s that some commercial systems like DB2 implement, but not all of them. There's a whole other class of algorithms you read about in the MVCC survey paper: the serializable snapshot isolation graphs from Michael Cahill, used in Postgres. For our purposes, we'll just ignore those for now.
We can talk about those after class if you want, but that's another example of another way of doing this. That SSI work came out around 2008, whereas these mechanisms are from the 1990s. Okay, so the difference between these index locks and index latches is that we're going to hold the locks for the entire duration of the transaction. We're only going to acquire them on the leaf nodes of our index, because we don't care about the logical contents of the inner nodes: in the B+tree, we said those can just be copies of keys, and they don't actually have to exist in the table. Another key difference from the two-phase locking material we covered before is that we're not going to store these index locks inside the index itself. We can't do that, because we wouldn't be able to easily go find which locks we hold; the location of those locks may change as nodes split and merge. So we maintain a separate data structure on the outside, and this is the classic lock table you would see in a disk-based system. You have some notion of a lock, you keep track of which transactions hold the lock, usually the first ones in the queue, and what mode they're in, and then you have all the other transactions that are waiting to acquire those locks. It's just a hash table, and we'll see that this is going to be a problem. All the problems we had with regular two-phase locking in a disk-based system, which we talked about earlier in the semester, now all come back if you're using index locks, because I need to protect this data structure with latches. So as I traverse the index I acquire latches on the index, and then to acquire the locks I also have to acquire latches on the lock table, right?
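As a rough illustration, here's what that external lock table might look like in Python: a hash table from lock name to the transactions holding it plus a FIFO wait queue. The names and structure are invented for the sketch; a real lock table also latches each bucket (the very bottleneck being discussed) and does deadlock detection, both omitted here.

```python
from collections import deque

# Classic shared/exclusive compatibility: only S/S coexist.
COMPATIBLE = {
    ("S", "S"): True,  ("S", "X"): False,
    ("X", "S"): False, ("X", "X"): False,
}

class LockTable:
    """Toy lock table: each entry tracks the transactions that hold the
    lock (with their mode) and a FIFO queue of waiters."""

    def __init__(self):
        self.table = {}  # lock name -> {"granted": [(txn, mode)], "waiting": deque}

    def acquire(self, txn, name, mode):
        entry = self.table.setdefault(name, {"granted": [], "waiting": deque()})
        compatible = all(COMPATIBLE[(held, mode)] for _, held in entry["granted"])
        if compatible and not entry["waiting"]:
            entry["granted"].append((txn, mode))
            return True            # lock granted immediately
        entry["waiting"].append((txn, mode))
        return False               # caller must block until woken

    def release(self, txn, name):
        entry = self.table[name]
        entry["granted"] = [(t, m) for t, m in entry["granted"] if t != txn]
        # promote waiters in FIFO order while they remain compatible
        while entry["waiting"]:
            _, m = entry["waiting"][0]
            if all(COMPATIBLE[(held, m)] for _, held in entry["granted"]):
                entry["granted"].append(entry["waiting"].popleft())
            else:
                break
```

Every `acquire` and `release` touches this shared structure, which is why each call needs its own latching and why this becomes the concurrency bottleneck described next.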
So this all becomes a big concurrency bottleneck, and this is why the in-memory concurrency-control papers we read before don't do this. But it is one way to solve the problem. All right, so I'm going to talk about five different locking schemes for indexes. The first one will be predicate locks from System R, one of the early relational database systems. We'll see in a second that nobody actually does this, but it's good to know about, and then we'll talk about what people actually do. The way to think about these is that they are building blocks we can use to construct more complex locking protocols by combining the techniques as we go down the list, and as I said, the ones at the bottom are more common in real systems. As I've said many times in this class and in other classes, I love the System R project because it basically got eight people in the same room, said "build a relational database," and everyone figured out how to do one piece of it.
And Jim Gray and others wrote a paper in 1976 that basically laid out the fundamentals of transactions, consistency, and two-phase locking. They recognized that there was a phantom problem they had to deal with when they had indexes on their tables, so they proposed a locking scheme called predicate locks to deal with it. But as I said, this turns out to be too complex, and nobody actually implements it; we'll see why in the next slide. Basically, for every query your transaction executes, you extract the predicates from its WHERE clause, or for an insert, the values you're trying to insert. Then you construct a multi-dimensional space and determine whether the regions represented by different predicates along different dimensions intersect with each other. If they do, you have a conflict, and therefore you could have a phantom. For a SELECT you just extract the WHERE clause, and the same for DELETE and UPDATE; for an INSERT you extract the values. It works like regular two-phase locking, where you have shared locks for the selects and exclusive locks for the modifications: two intersecting shared locks are okay, but a shared lock intersecting a write lock, or a write lock intersecting a write lock, is a conflict. So let's look at a really simple example here.
Say I have one table called account, and two queries running in two different transactions. The first query computes the sum of the balances of all the bank accounts where the name of the account is Biggie, and the second transaction inserts a new record for Biggie into the account table. The way to think about these predicate locks is that this two-dimensional projection is the space of all possible records we could have in the account table, and if I take the WHERE clause, I can represent the region inside it that corresponds to all records with the name Biggie. Now the insert query is actually a subset of the Biggie region, because it is where the account's name equals Biggie and the balance equals 100. So in this case it's obvious that these regions overlap, and therefore I have an intersection. If I executed the first query, then did the insert, then executed the first query again, then based on these predicate locks represented in this space,
I would know that I would have a phantom, and therefore I have to make sure that one of these transactions aborts, because otherwise executing them both would violate serializable ordering. All right, so this example is really simple to reason about, but imagine you have a ton of attributes and very complex queries. Trying to figure this out on the fly as you execute queries, and determining whether transactions are allowed to finish, can be quite expensive. That's why System R never actually implemented it, and nobody else did either. Now, the precision locks from HyPer that we talked about last class are essentially a simplification of predicate locks. With predicate locks we're taking two predicates and seeing whether they overlap. In the case of precision locks, all they had to do was look at all the queries the transaction executed, generate a tree, and then they don't need to look at the predicates of other queries; they just look at the actual values of the attributes that were modified in the delta records and map them through the tree to see whether they evaluate to true or not. So it's a simplification of predicate locks. One advantage of predicate locks is that you can reason logically about the query itself, without actually having to run it, to determine whether you have a conflict.
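Here's a toy Python version of that intersection test for the account example. It only handles equality predicates, where each predicate is a set of allowed values per attribute, which is precisely why this is just a sketch: real predicate locking must handle arbitrary ranges and expressions, and that's where the cost explodes.

```python
# Each predicate: attribute -> set of allowed values (absent = any value).
def regions_intersect(p1, p2):
    """Two predicate regions intersect iff every attribute constrained
    by BOTH predicates has at least one value in common."""
    for attr in set(p1) | set(p2):
        c1, c2 = p1.get(attr), p2.get(attr)
        if c1 is not None and c2 is not None and not (c1 & c2):
            return False
    return True

def conflicts(lock1, lock2):
    """S/S never conflicts; any other pair conflicts if it intersects."""
    mode1, pred1 = lock1
    mode2, pred2 = lock2
    if mode1 == "S" and mode2 == "S":
        return False
    return regions_intersect(pred1, pred2)

# T1's scan:   WHERE name = 'Biggie'           (shared predicate lock)
t1 = ("S", {"name": {"Biggie"}})
# T2's insert: (name='Biggie', balance=100)    (exclusive, a sub-region)
t2 = ("X", {"name": {"Biggie"}, "balance": {100}})
```

Here `conflicts(t1, t2)` is true because the insert's region lies inside the region covered by T1's scan, so one of the two transactions would have to abort or wait.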
That's not true for all possible queries, though. If you don't have nested queries it's true, but nested queries get tricky, because the output of the inner query could be used as a predicate in the outer query, so you don't know what it actually is until you run the query. Handling that in predicate locks, I don't think, is possible, or at least not tractable. So that's another reason nobody actually does this. Instead, the techniques we're going to talk about acquire locks on the indexes as queries run. Whereas predicate locks can be computed offline, all the locking techniques we'll talk about next have to be computed online, as transactions are running. The most primitive type of lock you can have on your index is called a key-value lock, and essentially this is just a lock that covers a single key. So a transaction can say, I want to lock key 14, and that basically means nobody else can delete it, because you hold the lock for it. Likewise, if you want to be able to insert something later with a particular key, you can take a virtual key-value lock that says, I don't have 14 yet, but I will, and therefore I hold the lock on it. That's another example of why you can't store these locks directly inside the index: if 14 doesn't actually exist, where do you maintain that lock?
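To illustrate why these locks live outside the index, here's a toy key-value lock manager in Python that can hold a lock on a "virtual" key that doesn't exist in the tree yet. With it, the phantom from the earlier probe-then-insert scenario can't happen, because the probing transaction locks key 25 before anyone can insert it. The class and method names are invented for this sketch, and it models exclusive locks only.

```python
class KeyValueLocks:
    """Toy key-value lock manager kept outside the index, so locks can
    be held even on keys that are not (yet) present in the tree."""

    def __init__(self):
        self.locks = {}  # key -> txn holding an exclusive lock

    def lock(self, txn, key):
        if self.locks.get(key, txn) != txn:
            return False          # held by someone else: wait or abort
        self.locks[key] = txn     # grant (re-entrant for the holder)
        return True

    def unlock(self, txn, key):
        if self.locks.get(key) == txn:
            del self.locks[key]

# Phantom scenario from earlier: T1 probes for 25, finds nothing, and
# takes a virtual lock on 25 so nobody can insert it behind its back.
```

Because the lock lives in this external table rather than in an index node, it survives node splits and merges, and it can name a key that has no slot in the tree at all.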
Technically the space of all your keys runs from the lowest possible value to the highest, and if you had to maintain a slot for every possible key so that someone could acquire a virtual lock on it later, that would be tricky to do. So now we can start building on top of this. In addition to locking exact values that either exist or don't exist with a key-value lock, we can have what's called a gap lock, which allows a transaction to acquire a lock on the position between two keys that do exist. In this case I can acquire a gap lock between 14 and 16, so it's basically the range (14, 16) exclusive. If anybody tries to insert anything like 15, or any decimal variation of 15, they can't, because another transaction holds the gap lock. Again, the number of values inside these gaps is infinite, so you can't store the gap locks directly in the index node; there may be a split or a merge, and what used to be a gap in this node may now be a gap in another node. That's why you have to have a centralized data structure, the lock table, to maintain these things. All right, so now we can build on this further and combine the key-value locks and the gap locks into what are called key-range locks. The idea is basically that you take a key that does appear in your relation or your key space, and you acquire a lock on that key together with the gap that either precedes it or follows it. And with these we can define more complicated things, like specifying which lock mode we want, to increase the amount of concurrency we can get. In my earlier examples of gap locks and key-value locks I just assumed they were exclusive locks, but now we start doing things
like having shared locks, exclusive locks, and intention locks, to provide further hints to other transactions about what we're actually trying to do with the things we hold locks on. It looks like this: I can acquire a next-key lock on 14 together with the gap that comes after it, so I hold the lock from 14 up to anything less than 16. I can also go the other direction with a prior-key lock, where I take 14 plus anything less than it, down to 12 exclusive. When you actually implement this, you do it one way or the other: you either have prior-key locks or next-key locks; you can't mix them. In practice, as far as I know, everyone implements next-key locks. I know MySQL does this. I think that's just because, at least in Western Europe and America, we read from left to right, and so we scan from left to right, so that's why everybody implements it this way. But there's nothing that scientifically says you couldn't go in the other direction; you just have to pick one or the other. Again, same as before, if we have artificial values that don't exist yet but we want to acquire locks on them, we need virtual keys. All right, so now with key-range locks we can expand our modes even further to do more complex things. Instead of just acquiring a key-value lock or a next-key lock in shared or exclusive mode, I can combine them together and acquire locks on the actual pages themselves (locks on the pages, not latches). This hierarchical locking is exactly the same multi-granularity locking we talked about with two-phase locking in the intro class, where instead of just saying I have a shared or exclusive lock on individual elements in my key space,
I can now take intention locks on the higher-level elements. What I mean by that is that I can acquire a range lock in intention-exclusive mode for the range from 10 up to the gap right before 16, and then inside of it I can acquire an exclusive lock on a sub-range, and that's compatible, because if I hold the intention-exclusive lock above, I'm allowed to acquire the exclusive lock below it. Now another transaction's thread can come along and also acquire an intention-exclusive lock on this same range, and then acquire an exclusive lock on an individual key that doesn't conflict with mine, and that's perfectly fine. What the intention lock modes allow us to do is give a hint to other threads that come along. A thread that wants to read something in this range will see that its request isn't compatible with intention-exclusive, so it knows not to even bother trying to lock the individual keys, because it won't be able to get them. These intention locks minimize the number of times we have to go into the lock table and ask to acquire something. Say I wanted an exclusive lock on all of these values: I would have to make an individual request in my lock table for every single key and every single gap. But if I can acquire an intention lock on a higher-level range, it's one lookup in the lock table: am I allowed to acquire this? If yes, I'm done; if no, I know somebody else is doing something and there's a conflict. As I said, the lock table is not free, and we want to minimize the number of times we go into it, but there's a trade-off in making sure we don't acquire locks that are too coarse, which would prevent concurrency and parallelism. These intention locks are a way to provide hints to others about what we're doing without having to sacrifice
parallelism. So again, we can use these different types of index locks to protect the ranges we scan and the keys we read, including keys we looked for but did not find, so that when we go to validate our transaction, we know nobody else could have modified our range or caused a phantom for us, because they would not have been able to acquire the locks, and therefore I know I'm serializable. In the case of Cicada, they don't have to do this, because the index nodes are inside the tables: you're already guaranteeing serializable ordering of your transactions on the data tables themselves, and therefore you automatically get that protection for your index nodes. In the case of Hekaton, again, they would just run the exact same query, scan the exact same range a second time, and see whether they get back the same result; if they do, they know nobody else inserted or deleted anything, so they're serializable and good to go. Okay, any questions about this? As far as I know, no in-memory database does any of these things; this is only done in the commercial disk-based systems. All right, so to finish up: hierarchical locking basically provides predicate locks without the complication of dealing with that multi-dimensional space. It allows us to lock ranges in our index and in our key space to prevent somebody else from deleting or inserting something in them. And again, the difference between this explicit index locking and predicate locking is that predicate locking can be done before you run the query, while all these other techniques have to be done as you run the query, as you access the index. And the latching techniques we talked about before ensure that we always have a consistent data structure. In the case of Peloton,
we don't actually do any of this, so we're actually not serializable: if you run a range scan on us, you basically get snapshot isolation. In the MVCC survey paper you read, they talk about running everything at the serializable isolation level, but that just means the way the students set up the benchmarks, there were never any range scans, so you were never not serializable. All right, so next class, what are we going to talk about? More about indexes. I'll first talk about different ways to represent the actual values of data inside your index, and I'll spend some time on memory allocation and garbage collection, because you'll need that for the second project. Then we'll go through three different data structures for in-memory databases. T-trees are the original in-memory data structure that people developed in the 1980s; Oracle TimesTen used them in the 1990s, but nobody uses them now, and they're actually the exact opposite of what you want in an in-memory database, because the hardware has changed. Then we'll talk about the Bw-tree from Microsoft's Hekaton, and we'll finish up with the concurrent lock-free skip list, which is in the second project and is what's used in MemSQL. Okay, any questions? Yes. So the question is: is hierarchical locking in an index too difficult to implement, and is that why only commercial systems do it? A lot of these techniques came out of IBM; the key-value locks and the gap locks were from the ARIES project, so they made it into DB2. I actually don't know how MySQL enforces serializability. Postgres does the SSI graphs, which is something different, and Oracle doesn't provide serializability. I would say in general, it's a lot of work and it's hard to get this correct. And again, for some systems,
they just say, we're not even going to provide serializability, so they don't do it. But again, the in-memory systems we saw last class all provide serializable isolation; they just don't do it this way. All right guys, see you on Monday. Take care.