So last night we sent out the URL for the project page for project number two. The index skeleton files we give you for the skip list are all ready to go. I was looking at the test cases last night with Dana as well, and the test suite we give you is not really obvious — it's a lot of stuff — so I'm going to try to clean it up over the next two days, and then we'll send out an email on how to fetch and rebase to pull down our latest changes. That should make it easier for you to reason about what your index is actually doing, rather than it just spitting out an answer that says whether you're correct or not. The one thing we're not going to provide you with is integrity tests that check the internal structure of your skip list, because there's no standard API we could test that against — and if we gave you one, it would basically spell out how you should implement it. So this is something you'll have to write on your own: make sure you don't have dangling pointers, make sure things are ordered correctly. We'll have high-level tests that say things like "if I insert this thing, can I get it back?" or "if I do a scan, do I get things back in the right order?", but we're not going to check the inside of the data structure, only the index API level. So I encourage you to write those checks yourself.

All right, so today we're going to talk about index locking and latching. The last three classes have been about concurrency control, which deals with how to protect the data inside your database at a logical level — the tuple level — from concurrent transactions. The idea is that we want to provide the serializability guarantees of our database, the ACID guarantees, and the concurrency control scheme is what lets us do that. For today and the next two classes we're going down to a lower level and asking how you actually protect the index data structures themselves. Before now, the concurrency control discussion basically ignored the fact that you have indexes, but in reality there are indexes, and that means we have another point of contention — another thing that can become incorrect if we have concurrent operations. So we're going to see how we deal with that.

For today, we'll discuss the difference between locks and latches in the context of indexes, then I'll discuss the different ways you can implement a latch, then how to do latch crabbing — how you traverse a tree — and then we'll finish up talking about logical index locking, which is what you would need to prevent phantoms. As you saw in the Silo paper and the Hekaton work, they weren't actually doing index locking; they did extra work at the end, in the validation phase. Index locking is the technique you use if you want to do this check as you run, to make sure you're not getting phantoms.

All right, at this point I don't think we really need to define what an index is inside a database — everyone should know what it is. The main thing I'll say is that indexes let us speed up access to the specific items of data we need, but there's a trade-off: yes, the index makes our lookups go faster, but we pay the cost of higher overhead when we update the database, and higher overhead in terms of storing the indexes.
So again, if you build every single index you could possibly have on your database, it's going to run really slowly when you update it, because you have to go through and update every single index. We talked about this last time when we discussed having to update the physical addresses in secondary indexes: even though you may not actually update the values in those indexes, you still have to go update where they're pointing. And then there's storage space: obviously, if you're building an index on every single column, that's going to take up memory, and eventually you'll run out of space to put data in.

So the main thing the index provides for us is that we can quickly locate data without searching through every single row. The metaphor I always like to use for an index is the index at the back of a textbook: if I want to look up some topic, I look it up in the index and it tells me what page or pages cover that term. A database index is basically the same thing. We'll see examples later in the semester of certain database systems that don't let you have any indexes at all — Vertica does something like this — and there are two ways to think about that. You can pre-sort everything, as in the case of Vertica, and that's essentially the same thing as having an index, because you have a structured way to find the data you're looking for. Or there are other techniques, like having a cursor that just iterates through the entire table over and over again, reading from the beginning and doing a sequential scan; if you want to access something, you show up, jump on the cursor, scan along until you find the data you're looking for, and jump off — like a train going around in a loop. In that case you don't need indexes because you just jump on the cursor and go where it goes. But for our purposes here, for OLTP, you almost always want an index, because you're usually doing point lookups to find a small number of records. That's the context for this class and the next two: how to do this for OLTP applications.

There are basically two types of index data structures you can have — and again, this is a refresher from the introduction course. First, you have order-preserving indexes: things like the B+tree, the skip list, the Bw-tree. These are tree-like structures that maintain the keys in sorted order, and that allows us to evaluate any possible type of query predicate on them: range scans, point lookups, and, depending on how the data structure is implemented, scans in both directions, ascending and descending. In the last class I think I said lookups were O(n log n) — that was a mistake; it's O(log n) to traverse the tree and find what you're looking for, excluding any scanning you have to do along the leaf nodes. The other type of index is the hash index. This is just like a hash table you'd expect in Java or C: an associative array that maps the hash of a key to one particular record in the database. The nice thing about hash indexes is that they provide O(1) lookups, because it's a single hash and you jump right to the place that has the data you're looking for.
You may have to look at multiple entries if there are collisions or chains, but in general you're not traversing a tree — the cost of the lookup doesn't depend on the size of your key space the way it does with an order-preserving or tree index. So what's the obvious downside of a hash index? What's the one type of query you can't do on them? Range scans — exactly right, because the keys themselves aren't stored in order; it's just the hash of the key that tells you where to go. For the next three classes we're going to assume we're dealing with order-preserving indexes, because there are interesting properties — interesting things we have to deal with — in terms of locking and latching for these kinds of indexes. We'll talk more about hash indexes later on when we talk about joins and other things.

What I'll also say is that if you create a new database in MySQL or whatever your favorite database system is and you call CREATE INDEX, chances are you're always going to get an order-preserving index. Whether it's a B+tree or something else doesn't matter, but it will always be one of these. The reason is that if you call CREATE INDEX on your brand new database that you spent a lot of money on and it gives you a hash index, and the first query you write is SELECT COUNT(*) over some date range, it's going to be super slow, because it has to do a sequential scan — the hash index doesn't help you at all. That's usually not what people expect, which is why the database system will always give you an order-preserving index. That's not to say that database systems don't use hash indexes — as we'll see as we go along, they use them internally all the time — it's just not always exposed to you as the application developer.

A quick bit of background on these order-preserving indexes. The venerable B+tree — or, I should be careful, the venerable B-tree — came out in 1972, and this was the first definition and explanation of these self-balancing trees. In the original version of the B-tree from 1972, the keys and the values were stored across all the different nodes of the tree. Sometimes people in databases use the terms B-tree and B+tree interchangeably — I'm guilty of this — but there is a concrete difference between the two. The B-tree stores keys and values throughout the tree, and in the context of a main-memory database the values are always 64-bit pointers to the tuple being pointed at; this is more memory-efficient than a B+tree because each key appears only once in the tree structure. Contrast that with the B+tree, where the values are stored only in the leaf nodes, and the inner nodes — the non-leaf nodes inside the tree — essentially act as guideposts that let you traverse the tree and find the leaf nodes that hold the keys and values you're looking for. The reason everyone uses B+trees, as we'll see as we go along, is that it turns out it's easier to manage concurrent access to a B+tree than to a B-tree.
In a B+tree, the only modification you ever make to insert or delete a key/value pair in the index happens at the leaf nodes. Therefore, as we'll see when we talk about crabbing, you can release the locks or latches you hold as you go down, because you're not worried that someone else is going to change the internal structure of the tree above you — they're not allowed to do that without getting to the bottom first. So all the latches you acquire always move in the same direction: down. Now think about what could happen in a B-tree instead: say I'm inserting a new record, I'm at an inner node of the tree and I have to take latches on it, and then someone below me inserts something on the same path as me, and we both try to do splits or merges at the same time. That would be really difficult to manage, because you'd have to handle deadlocks and other issues like that. So in practice the B+tree is always the better choice, and it's what everyone implements. We'll also see next Tuesday that there were some database indexes from the 1980s that were thought to be main-memory optimized — these things called T-trees — and they were used for maybe about a decade, but everyone gave up on them because it turns out B+trees are not only really good for disk-based systems, they're also really good for in-memory systems. So almost every single in-memory database system implements a B+tree, even though the B+tree was designed back in the 1970s, when it was meant for disk.

So far, everything we've talked about has been how to lock logical objects in the database. We talked about two-phase locking, and we talked about how to acquire locks on tuples, pages, tables, and so on. The tricky thing now, when we bring in indexes, is that we can treat them differently from the logical objects in our database — the tuples and tables. The key point is that we are going to allow other threads to make modifications to the index that change its actual physical structure, and that's okay — it's still correct — as long as the logical contents of the index remain consistent. What I mean is that even if splits and merges happen all through my index, as long as I can still see the things I expect to be there, I'm fine, even though the structure has completely changed.

Here's a really simple example. Say I have a B+tree with a single node A containing the keys 20 and 22, and those are the only two keys in the index. The first transaction comes along and wants to read key 22; it just does a binary search in the node and finds what it's looking for. Now say another transaction comes along while transaction one is still active — transaction one has gone in, read 22, and popped out — and transaction two wants to insert 21. It wants to put 21 between 20 and 22, but the node only has room for two keys.
So we have to do a split: we move 21 and 22 over to a new node and modify the root to have 21 as the guidepost. Now when transaction one comes back and wants to read 22 again, it doesn't care that 22 now lives over in this different node; it only cares that the logical contents of the index are consistent with what it saw last time. It saw 22 before and it sees 22 again — it doesn't care where that key is actually stored. So this is different from the two-phase locking stuff we saw for tuples, because this is about the internals: how things are actually stored.

The survey paper I had you read — I actually like it a lot, because it lays out everything you need to know about index locking and latching, and it's well written; it's a pretty easy read in my opinion. In that work, the author makes a distinction between locks and latches. We have to do this because, in the context of databases, "lock" doesn't mean the same thing as it does in operating systems and other fields. We've covered this before: in our world, a latch is equivalent to what people would just call a lock in the context of an OS or some other system. For indexes, locks are used to protect the logical contents of the index from other transactions. You have to hold them for the entire duration of the transaction, and we need to be able to roll back any changes we make to the index in case the transaction aborts. Contrast that with the latch: that's a low-level protection primitive we use for the critical sections — the internals — of the index, and we're using it to protect the index from other threads. The distinction between other threads and other transactions matters, because a transaction can execute across multiple threads. If you submit a batch of two queries for the same transaction, the database system could decide that thread one executes the first query and thread two executes the second, so now they could both be going at the same index at the same time. The latches protect the data structure from concurrent access by different threads, even if those threads are working on behalf of the same higher-level transaction. We hold a latch only for the duration of the operation we're doing — you can think of an operation as one call in the API you're implementing for the project, like reading a record or writing a record. You hold whatever latches you need while you do that one thing, and when you pop out you release everything. And for latches we don't need to worry about rolling back changes, because we don't know anything about the transaction — we just know there are operations going on inside an index; whether they're in the context of a transaction or not is left to the higher-level parts of the system, essentially the concurrency control component.

He also has this other table that I really like that lays out the distinction between locks and latches
and describes how they differ along all the dimensions of how you'd use them. For an index lock: it separates user transactions, it protects the logical contents of the database, we hold it for the entire duration of a transaction, and there are multiple modes — shared, exclusive, update, and intention locks. If there's a deadlock, we rely on the concurrency control component of the database system to detect it and resolve it — this is the deadlock detection and prevention stuff we talked about before with two-phase locking, done either with waits-for graphs and timeouts or with aborts, like wound-wait and wait-die and all those schemes. And the information about these locks is kept in the lock manager, not in the index itself.

Contrast that with latches: we use those to protect the in-memory internal data structures — the critical sections — from other threads, and there are only two modes, read and write. Throughout this lecture I'll try to distinguish latches from locks by talking about read and write latches versus shared and exclusive locks, but the modes are essentially the same thing: a read latch is compatible with other read latches, and a write latch gives you exclusive access to whatever the item is. The key difference is how we avoid deadlocks: with latches it's through avoidance and coding discipline. What I mean by that is that you, as the database developer — the software engineer working on this index — have to be smart, clever, and mindful enough to write your code in such a way that deadlocks can't occur when different threads try to acquire latches. You can't rely on some higher-level mechanism, like a background deadlock detection thread, to come in and rescue you. You have to be careful about the order in which threads acquire latches in your data structure, so that you never end up with two threads each waiting on something the other one holds. Finally, the latch state is kept in the protected data structure itself — the latch information is stored inside the index, in its various components — whereas index locks have to be kept somewhere else, and we'll see why as we go along.
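To make that "coding discipline" point concrete, here is a minimal sketch (not from the paper or the slides) of one common convention: any thread that needs latches on two nodes at once always acquires them in a single global order, so a cycle of waiting threads can never form. The Node struct and the use of std::mutex as a placeholder latch are my own illustration.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Stand-in for a B+tree node; std::mutex is just a placeholder latch here
// (real latch implementations are discussed below).
struct Node {
  std::mutex latch;
  // ... keys, values, child pointers ...
};

// Always acquire the two latches in a canonical order (here: by address).
// If every code path follows this rule, no deadlock cycle can form, so there
// is no need for a background deadlock detector.
void LatchPair(Node *a, Node *b) {
  if (std::less<Node *>()(b, a)) std::swap(a, b);
  a->latch.lock();
  b->latch.lock();
}

void UnlatchPair(Node *a, Node *b) {
  a->latch.unlock();
  b->latch.unlock();
}
```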
Another nice bit of commentary he brings up is about what it actually means to have a "lock-free" index. This is an area where the language is imprecise — there's no single canonical definition we can point to. There are two possible meanings. The first is that a lock-free index has no locks as we defined them on the last slide: transactions don't acquire locks when they access or modify the database through the index, but you still maintain low-level latches that allow you to install updates in the index. The second is that the index has no latches: you use techniques like shadow paging or copy-on-write — sort of like multi-versioning — so you can swap pointers with an atomic compare-and-swap to install changes without acquiring latches. But at the end of the day, you're still going to use some kind of locks to validate transactions if you care about running at a serializable isolation level. The main thing to take away is that he's basically saying there's no free lunch: either you have an index that doesn't use any locks but still needs latches to protect it, or you have an index that doesn't have any latches but needs to acquire locks when it validates transactions to make sure things are correct. So when someone says they have a lock-free index, it takes a while to figure out what they actually mean. For our purposes, the skip list you're building is essentially the second one — there are no latches — the Bw-tree we'll cover is latch-free, and I think the ART index we'll cover will also be latch-free.

So before we can talk about how you use latches inside an index, we need to talk about how you actually build one. How do you actually implement a latch?
We're going to go through the four main types. There may be other latch implementations beyond this list — there might be more exotic ones I don't know about — but these four are definitely the ones used most often; for the most part, everyone implements one of these.

A quick reminder from last class: remember that I defined this idea of a compare-and-swap operation. I said it's a single instruction the CPU provides that allows you to examine the contents of some location in memory, check whether it has the value you expect it to have, and if so, install your new value. In this case here, I'm checking memory location M to see whether it has the value 20 — it does, so I can flip it and make it 30. Remember that this is provided to you by the compiler as an intrinsic or a built-in: it looks like a function call, just like something out of libc, but in actuality the compiler replaces it with a single assembly instruction that performs the operation. That's way faster than writing the C code yourself with an if-clause followed by a store of the new value into that memory location. I'm pretty sure every modern CPU supports compare-and-swap — I know ARM does, POWER does, and obviously x86. If you're running on some CPU that doesn't have it, the compiler will recognize that it can't emit the single instruction and will replace it with the assembly to do it by hand, installing its own mutex or whatever it needs to make sure the operation is still atomic. For compare-and-swap I don't think that's a big issue; when we talk more about SIMD you'll see there are cases where, if you don't have the more modern SIMD instructions, the compiler replaces them with assembly that emulates them, and you don't get the same performance benefit you would with the real instructions. Okay — so this is the low-level primitive that lets us implement all of the latches I'm about to talk about.
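To make that recap concrete, here's a minimal sketch of that compare-and-swap using the C++11 atomics library; the 20-to-30 values just mirror the example on the slide.

```cpp
#include <atomic>
#include <cstdio>

int main() {
  std::atomic<int> m{20};   // the memory location "M" from the example
  int expected = 20;        // the value we believe M currently holds

  // Atomically: if M still equals `expected`, store 30 and return true;
  // otherwise leave M alone, copy its actual value into `expected`, and
  // return false.  On x86 this compiles down to a single LOCK CMPXCHG.
  bool swapped = m.compare_exchange_strong(expected, 30);

  std::printf("swap %s, M = %d\n", swapped ? "succeeded" : "failed", m.load());
  return 0;
}
```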
The first way to implement a latch is to use an operating system blocking mutex. When you took your operating systems course as an undergrad, this is basically the thing they teach you to start off with, because mutexes are simple to use and they rely on the OS to do a lot of the hard work that, as we'll see later, the database system otherwise has to do itself. The problem with blocking mutexes is that they are super slow and non-scalable — roughly 25 nanoseconds per lock/unlock invocation, which is a lot — so you should basically never use one of these as a latch. In terms of how you get one: C++11 added the std::mutex type to the standard template library. When C++11 came out, they codified the multi-threading primitives people had been using for years from various libraries into the standard itself, which is nice. But std::mutex on Linux is just a wrapper around pthread_mutex — the pthread library has been around for decades — and that in turn is a wrapper around the OS futex call. If you don't know what a futex is, it stands for "fast userspace mutex". At a high level, it's basically a compare-and-swap flag: your thread does a compare-and-swap to grab the boolean flag and acquire the lock, and if no one else holds it, you grab it right away and you're done. If there's contention — somebody already holds it — your thread makes a futex system call down into the operating system to add itself to the queue of threads waiting for this mutex, and that notifies the scheduler not to schedule your thread anymore, because you can't do anything while you're waiting. That's why it's slow: if I try to acquire the mutex and can't, I have to go down into the OS and block myself, and any time you make a syscall into the kernel, that's super slow. Then, when the thread that owns the lock releases it, that's another syscall back into the OS to say "I released this mutex, go wake up any threads blocked on it and start scheduling them again." That's why you never want to do this. But for your own understanding, the way you'd use it is you just define the mutex and then call its lock and unlock methods; underneath the covers, that gets translated into the proper syscalls. We haven't done this yet, but at some point in Peloton I'm going to add flags to the source-code validator to throw an error if anybody tries to use a mutex — you should not be using a mutex in your skip list; that's a terrible idea.

Okay, so if people aren't writing mutexes, what's the better thing? The better thing to use is what's called a test-and-set spin lock, or just a spin lock. The idea is that we have that same boolean flag in memory and we try to acquire it with a compare-and-swap. If we get it, we're done — we have it; if we don't, we spin back around and try to acquire it over and over again. The way you'd write this is with the templated std::atomic type that C++11 defines — you can instantiate it with whatever type you're trying to protect — and in this case I'm using std::atomic_flag, which is a specialization of atomic for a boolean. If you read the C++11 spec, it says that std::atomic_flag is guaranteed to always be lock-free, which is not necessarily true for other types. So we define our latch and then we have a while loop that calls test_and_set, which is essentially the same thing as compare-and-swap; if we acquire it, we're done, and if not, we loop back around and try again. That's why it's called a spin lock — you're just spinning, trying to get it over and over again.
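Here's a rough sketch of that kind of spin latch using std::atomic_flag. The class name and the yield-every-so-often policy are just one possible choice for what to do inside the loop, not the project's API.

```cpp
#include <atomic>
#include <thread>

// Test-and-set spin latch.  Lock() loops on an atomic test_and_set, which the
// compiler lowers to a single atomic exchange / compare-and-swap instruction.
class SpinLatch {
 public:
  void Lock() {
    int attempts = 0;
    // test_and_set() returns the previous value, so "true" means someone else
    // already holds the latch and we have to keep spinning.
    while (flag_.test_and_set(std::memory_order_acquire)) {
      if (++attempts % 1024 == 0) {
        std::this_thread::yield();  // one option: give the latch holder a chance to run
      }
      // A real system might also count attempts and back off or abort after too many.
    }
  }

  void Unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```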
So the question is: what do we actually put inside that while loop? Remember that with the mutex, if you can't acquire the lock you drop down into the kernel, your thread gets blocked, and the scheduler knows not to run you. Here the scheduler has no idea you're trying and failing to acquire a lock over and over — it just sees you running instructions, burning cycles, so it thinks you're doing something useful and keeps giving you quanta to run, when in fact you're not accomplishing anything. So inside our database system we have to decide what to do in that loop. We could yield our thread back to the OS and let some other thread run — maybe that's the one holding the latch we need, and then it can release it and we can pick up and get going. We could retry over and over again, potentially forever, but that's usually a bad idea, so at some point we maintain a counter — we've tried this many times, or we've waited this long — and eventually we just abort because we can't acquire it.

These spin locks are super efficient, because the test-and-set comes down to a single compare-and-swap-style instruction at the CPU, but they're not scalable and they're not cache-friendly. Here's why. Say I have a single latch and three CPUs all doing test-and-set on it, trying to acquire it over and over, and say those threads are running on different sockets. Every time I want to invoke test-and-set, I have to go over the interconnect to the other socket to reach that location in memory and do the test-and-set, or bring the line over to my local CPU and do it there, depending on what's going on at the motherboard level. So if I'm spinning as fast as I can, I'm generating all this traffic to that one memory location, over and over. Contrast that with a regular read: if I read a remote memory location, its contents get pulled into my local CPU caches and my subsequent reads are fast because I'm reading the local copy. But because the latch is global state being modified from other sockets, I can't really use my cache; I have to keep going over the bus to get the contents and attempt the test-and-set. We'll talk more about NUMA regions, how memory is placed across sockets, and what the implications are — it becomes a bigger deal when you start doing OLAP queries that scan large chunks of memory — but in the case of a spin lock, this simple thing turns out to be a big problem.

The way to fix this is to use what are called queue-based spin locks. If you Google them, they're sometimes called MCS spin locks, named after the two guys who invented them, Mellor-Crummey and Scott. If you take a spin lock in Linux, this is essentially what you're getting.
The benefit of a queue-based spin lock is that it can be more efficient than a mutex and has better cache locality than a plain spin lock. It works like this: we create a chain, or queue, of these spin locks, and each thread trying to acquire the high-level latch spins on a latch that is unique to it, or at least close to it, rather than everyone spinning on one global latch. I have my base latch here, and it essentially holds a pointer to the next latch in the chain or queue — there's no real type called "latch" here; I'm just saying there's an address of some memory location that holds the next latch in the chain. When the first CPU comes along, it does a compare-and-swap on the base location; nobody holds it, so it acquires it and replaces the location with a pointer to the next latch in the chain. Now when the next CPU comes along and tries a compare-and-swap on the base latch, it won't get it, because the first thread holds it; instead it sees the pointer to the other latch, tries a compare-and-swap on that, and now it's waiting on that one. The third thread does the same thing: tries the base latch, can't get it, tries the second latch, can't get that, and ends up waiting on the next one in the chain, and so forth. So now the first thread isn't spinning at all, because it holds the lock, the second thread is spinning on a latch that's local to it, and the third is spinning on another one that's local to it. This sounds awesome, because we're no longer generating cache-coherence traffic by hammering one location from every CPU. The tricky part is that this is actually complicated to implement, because, remember, these are still spin locks in a while loop — a thread could decide it has waited too long and wants to abort. So if the thread in the middle of the chain decides to abort, I need to make sure the chain stays connected, so that the thread behind it knows it is now the next one waiting. That's not trivial, and with a really long chain you could have two threads decide to abort at the same time, and you have to make sure you don't end up with dangling pointers in that case.
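To make the mechanism concrete, here's a stripped-down sketch of an MCS-style queue latch, with no timeout or abort support — which is exactly the part that makes real implementations hard. The QNode and tail names are mine; this is an illustration of the technique, not what Linux or any particular system ships.

```cpp
#include <atomic>

// Each waiter spins on its own queue node, so the cache line it spins on stays
// local to it; only the tail pointer is ever touched with an atomic exchange.
struct QNode {
  std::atomic<QNode *> next{nullptr};
  std::atomic<bool> locked{false};
};

class MCSLatch {
 public:
  void Lock(QNode *me) {
    me->next.store(nullptr);
    me->locked.store(true);
    // Atomically append ourselves to the end of the queue.
    QNode *prev = tail_.exchange(me);
    if (prev != nullptr) {
      prev->next.store(me);
      while (me->locked.load()) { /* spin on our own node */ }
    }
    // prev == nullptr means the latch was free and we now hold it.
  }

  void Unlock(QNode *me) {
    QNode *succ = me->next.load();
    if (succ == nullptr) {
      // Nobody appears to be waiting; try to swing the tail back to empty.
      QNode *expected = me;
      if (tail_.compare_exchange_strong(expected, nullptr)) return;
      // A waiter is mid-enqueue; wait until it links itself behind us.
      while ((succ = me->next.load()) == nullptr) { /* spin */ }
    }
    succ->locked.store(false);  // hand the latch to the next waiter
  }

 private:
  std::atomic<QNode *> tail_{nullptr};
};
```

Each thread passes in its own QNode (typically stack- or thread-local), which is what makes the spinning local to that thread's cache.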
Performance-wise these are better, but they're hard to implement and hard to get correct, and we don't support them in our database system. As I said, though, I believe this is the type of spin lock the Linux kernel uses.

Okay — as I go through these different latch implementations, I'm not saying one is strictly better than another, except that the mutex one is total garbage and you should basically never use it. There may be cases where a plain spin lock is the right choice, for example if you don't have a lot of cores. And the last type of latch isn't "better" than MCS either — it solves a different problem. The last type is the reader-writer lock, which allows us to have concurrent readers: multiple threads can hold the same read lock at the same time. The basic idea is that the latch internally has two latches, one for the readers and one for the writers, along with counters that track the number of threads currently holding the read lock, the number holding the write lock, and likewise the number of threads waiting to acquire each one. When the first thread comes along and wants the read lock, nobody holds it and the write lock isn't held, so it acquires it and we bump the reader counter by one — using an atomic add, so we do it in a single instruction without having to take a mutex just to update the internals of the latch itself. Same thing when the next thread comes along wanting the read lock: it's already held by the first reader, but we can allow this thread to acquire it as well, and we just update the counter again. Now somebody comes along who wants the write lock; here we see that the reader counter is greater than zero, so it has to stall and wait, and we record that a writer is waiting. Then one last thread comes along and says it wants the read lock — what should happen here? Think about it: does it get the lock or not? Right — no, because it sees that a writer is waiting, so it also has to stall, and we update the counter to say here's another thread that's stalled. Eventually the readers finish and the writer can acquire the write lock. The example here uses a really simple heuristic: a new reader recognizes that there's at least one writer waiting and backs off; you could make the same kind of decision in the other direction as well. And again, the way you implement one of these is with the same primitive spin locks I showed before.
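Here's a simplified sketch of that kind of reader-writer latch, built from a spin latch guarding a few counters, roughly matching the walkthrough above (readers share, writers are exclusive, and a new reader waits if a writer is already queued). Real implementations are fancier about fairness, and all the names here are made up for illustration.

```cpp
#include <atomic>
#include <thread>

class RWLatch {
 public:
  void LockRead() {
    while (true) {
      Spin();  // grab the internal latch protecting the counters
      if (!writer_active_ && writers_waiting_ == 0) {
        ++readers_active_;
        Release();
        return;
      }
      Release();
      std::this_thread::yield();  // a writer holds it or is waiting for it
    }
  }

  void UnlockRead() {
    Spin();
    --readers_active_;
    Release();
  }

  void LockWrite() {
    Spin();
    ++writers_waiting_;  // announce ourselves so new readers back off
    Release();
    while (true) {
      Spin();
      if (!writer_active_ && readers_active_ == 0) {
        --writers_waiting_;
        writer_active_ = true;
        Release();
        return;
      }
      Release();
      std::this_thread::yield();
    }
  }

  void UnlockWrite() {
    Spin();
    writer_active_ = false;
    Release();
  }

 private:
  void Spin()    { while (guard_.test_and_set(std::memory_order_acquire)) {} }
  void Release() { guard_.clear(std::memory_order_release); }

  std::atomic_flag guard_ = ATOMIC_FLAG_INIT;
  int readers_active_ = 0;
  int writers_waiting_ = 0;
  bool writer_active_ = false;
};
```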
The tricky thing is making sure you don't starve out one side of the lock or the other. Say I'm holding the write lock and there's another writer queued behind me: when I release, I can decide whether to let the waiting readers go or the waiting writer go, and which way you go depends on how you're trying to use the latch. You may prefer to let readers go, or writers go — it really depends on the application and on the data structure you're trying to protect. These are also very common, but we don't implement them yet in our system.

All right, any questions about how you actually implement a latch? The spin lock is the easiest one to understand; the MCS one and the reader-writer one are more complicated, but those are very common.

So now that we know how to implement a latch, we want to use them — specifically, to protect the internal structure of a B+tree when multiple threads are traversing it. The method, or protocol, for safely traversing a B+tree is latch crabbing. Since B+trees are always taught in the intro class, I'll assume everyone already knows what a B+tree is; next class we'll talk about the other kinds of data structures, but for our purposes we can understand latch crabbing in the context of B+trees without having to describe some other data structure over and over. With latch crabbing, the basic idea is that we acquire latches as we go down from one node to the next, descending the tree to get to the leaf nodes — that's where the values, the pointers we want, are. As we go down, we're allowed to release the latch we hold on the parent node once we're at a child node that is considered "safe". Safe means we know that child node will not have to be split or merged due to any change that happens below it in the index. I'll describe how you do search, insert, and delete, and then show examples. For a search, you start at the root — that's always the starting point for a B+tree — and you go down acquiring read latches all the way. In that case every node is safe, because you're not modifying the index, so once you've acquired the read latch on a node you can release the latch you hold on its parent. For an insert or delete, you take write latches all the way down, but once you acquire the write latch on the child — your current position in the tree — if you know it's not going to be split or merged, you can release the write latch you hold on the parent. A split happens when the node is full, meaning we can't put any more keys in it; a merge has to happen when the node drops to less than half full. In practice — in an intro class we always say the merge algorithm gets invoked when the node is half full or less — real systems relax that constraint a little. It doesn't affect the correctness of the index, and they do it for performance reasons, because it means less churn in the index.
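As a sketch of what that protocol looks like in code, here is the read path plus the "safe node" check, reusing the RWLatch sketch from earlier. The BPlusNode layout, ChildFor, and IsSafe are placeholders I made up — they are not the project's API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical B+tree node layout; RWLatch is the reader-writer latch sketched above.
struct BPlusNode {
  RWLatch latch;
  bool is_leaf = false;
  std::vector<int> keys;               // separators (inner) or stored keys (leaf)
  std::vector<BPlusNode *> children;   // children.size() == keys.size() + 1 for inner nodes
  size_t max_keys = 128;
};

// "Safe" for crabbing: an insert won't split this node, a delete won't merge it.
bool IsSafe(const BPlusNode *node, bool for_insert) {
  return for_insert ? node->keys.size() < node->max_keys
                    : node->keys.size() > node->max_keys / 2;
}

// Pick the child whose key range contains `key`.
BPlusNode *ChildFor(const BPlusNode *node, int key) {
  size_t i = 0;
  while (i < node->keys.size() && key >= node->keys[i]) ++i;
  return node->children[i];
}

// Read-only traversal with latch crabbing: latch the child, then release the
// parent, all the way down.  The caller reads the leaf and then UnlockRead()s it.
BPlusNode *FindLeafForRead(BPlusNode *root, int key) {
  BPlusNode *node = root;
  node->latch.LockRead();
  while (!node->is_leaf) {
    BPlusNode *child = ChildFor(node, key);
    child->latch.LockRead();    // crab: grab the child first...
    node->latch.UnlockRead();   // ...then let go of the parent
    node = child;
  }
  return node;
}
```

The write path has the same shape, except you take LockWrite() latches going down and only release the latched ancestors once IsSafe(child, ...) tells you the child cannot split or merge.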
All right, let's look at a simple example. We have our B+tree with the nodes labeled A through G. The first thing we want to do is a lookup on 23. We always start at the root, so we acquire the read latch on A, and then we traverse down to C. Once we've acquired the read latch on C, we know it's safe to release the latch on A, so we do that; then we get down to F, release the latch on C, and read the entry we want. This is why it's called latch crabbing — it's supposed to be like how a crab walks: you're acquiring latches one after another, hand over hand. It's a stretch, but whatever.

Now let's look at modifying the index. In this case we're going to delete 44, which lives down in G. We always start at the root, taking a write latch, and then we get down to C. Here we recognize that, since we're doing a delete and this node is half full or less, we don't know what's going to happen below us — we may have to merge or coalesce this node — so it is not safe to release the latch on A. Visually, looking at the picture, you can see this particular delete won't cause a problem, but when you're actually executing inside the data structure, at this point in the tree you don't know what's below you, so you have to be careful. When we finally get down to G, we see that we can delete 44 and the node will still be at least half full, so we know we won't have to merge anything. At that point it's safe to release the write latches on C and A, and then we go ahead and delete the record. That should be fairly clear — you're releasing latches as you go down, as soon as it's safe.

One more example: an insert of 44. Same thing at the top: we take a write latch on A, we get to C, and here we recognize that, since we're doing an insert rather than a delete and this node is not full, even if there's a split below us we can still accommodate one more key here. At that point we know it's safe to release the latch on A, because A can never be affected by whatever happens below C — C always has free space. Then we get to G, and here, since we want to insert 44 and the node already has two keys in use (38 and 40) with no room for a third, we're going to have to split G. Therefore we still have to hold the write latch on C. So we go ahead and create our new node, update our pointers, insert the new entry, and then update the inner node above to say that if you're looking for 44 or more, you go over there. So to do this insert with a split, I had to keep holding the write latch on G and the write latch on C. Why don't I have to acquire a write latch on the new node H?
Because I just made it, and no one else can see it: I hold an exclusive latch on C, so if anybody comes along and tries to traverse down here, they can't get past C, and therefore they can't read my new node. At that point I can release all my latches. And with crabbing, unlike hierarchical, multi-granularity two-phase locking, where you have to release locks from the bottom up, you don't have to do that — you just want to release latches as soon as possible, and the order in which you release them doesn't actually matter. What I'm saying is that it's fine to release the latch on G first and then C, or C first and then G; that doesn't affect correctness at all, because I've made sure all my pointers are installed, the structure of the index is still valid and sound, and there's no pointer to some memory location that leads to nothing and would give us a seg fault.

Now, in these two examples, the insert and the delete, what was the first step I always did as I started latch crabbing down? Exactly — take a write latch on the root. And that's bad, because it means that any time anybody wants to update the index, they have to go through this one main bottleneck. People observed in the early days of B+trees that this was a bad idea, because you end up updating the index almost serially, in a data structure that was meant to be concurrent. So there's a better latch crabbing algorithm that came out in the late 1970s — 1977; the first B+tree work came out around 1972, so this was the 1977 update on how to do latch crabbing better. The idea is that we optimistically assume the leaf node we're going to insert into or delete from is safe, meaning we won't have to split or merge it. So we take read latches all the way down, doing the same crabbing thing where we release the parent once we've acquired the child, and when we reach the leaf node we acquire the exclusive latch on it. Then we verify that our assumption — that we would not have to split or merge — was correct. If it was, we do what we need to do and we're done. If not, we fall back and rerun the algorithm I showed before, acquiring write latches starting at the root and going down. It's pretty simple to understand, so let's do one example: the delete of 44 again. I take a read latch on A, a read latch on C, and at that point I can release the read latch on A; then I acquire the exclusive latch on G, and at that point I know that to do this delete I don't have to merge, so I can release the latch on C. My optimistic assumption turned out to be correct.
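Here is a sketch of that optimistic descent, reusing the hypothetical BPlusNode, ChildFor, and IsSafe pieces from the earlier sketch. The restart signal (returning nullptr) is just one way to tell the caller to fall back to the pessimistic, write-latches-from-the-root protocol.

```cpp
// Optimistic descent for an insert or delete: read latches on the way down, a
// write latch only on the leaf, then verify the leaf really was "safe".
// Returns the write-latched leaf, or nullptr if the assumption failed and the
// caller must restart with the pessimistic protocol.
BPlusNode *FindLeafOptimistic(BPlusNode *root, int key, bool for_insert) {
  BPlusNode *node = root;
  if (node->is_leaf) {
    node->latch.LockWrite();          // degenerate case: the root is the leaf
  } else {
    node->latch.LockRead();
  }
  while (!node->is_leaf) {
    BPlusNode *child = ChildFor(node, key);
    if (child->is_leaf) {
      child->latch.LockWrite();       // only the leaf gets the exclusive latch
    } else {
      child->latch.LockRead();
    }
    node->latch.UnlockRead();
    node = child;
  }
  if (IsSafe(node, for_insert)) {
    return node;                      // assumption held: no split or merge needed
  }
  node->latch.UnlockWrite();          // assumption failed: give up and retry
  return nullptr;
}
```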
All right, so this is way better, because you get better parallelism and better concurrency — you're not bottlenecked on everyone trying to acquire the write latch on A. This looks a lot like optimistic concurrency control, and this idea of being optimistic about how you acquire a lock shows up in a lot of other areas of computer science as well. It's essentially the same way transactional memory works: transactional memory lets you do whatever writes you need to do into memory, assuming there won't be any problems, and at the end you validate that everything is okay; if not, you go back and take heavyweight locks to do the thing that failed the first time. So we go fast assuming everything will be okay, and if that fails we come back and do the more heavyweight approach. And this makes sense, because most of the time your inserts and deletes aren't going to require splits or merges — in our case, the Bw-tree supports 128 keys per node, so you're not really churning through new nodes or reorganizing everything all the time.

Crabbing ensures that we don't corrupt the internal data structure of our index during modifications, and that's good — that's the problem we wanted to solve when multiple threads go into the same data structure at the same time. But remember I said that with latches, threads release the latches they hold as soon as they finish the operation they're doing: in my delete, I took latches all the way down, I took the exclusive write latch at the leaf node, and as soon as I did my delete I released it. The problem is that this guarantees the data structure will be correct, but it does not guarantee that the logical contents will be correct — in practice, we can still incur phantoms.

Here's one scenario, using the same tree as before, with two transactions running. The first transaction wants to do a conditional insert: if the key 25 doesn't exist, insert it. So it starts off by doing a check on 25 — it does latch crabbing all the way down to the leaf node and sees that 25 should be here but isn't. So now it's going to go ahead and insert it. But before it can come back and do the insert, transaction two comes along, also wanting to insert 25, and it does its crabbing down to the bottom and inserts 25.
Now transaction two is done, and when transaction one comes back and tries to insert 25, it comes down and hits a collision, because the other transaction inserted it before it could. Between the first check and the insert, it didn't hold the latch on that leaf — that's not how the protocol works. You could imagine providing a conditional-insert operation, where the index does the search and only inserts if the key doesn't exist, but there are other scenarios where you hit the same problem.

Here's another one. Transaction one wants to scan the range 22 to 23. In an intro class this isn't something we usually teach, but it's actually very common: sometimes you can scan along the leaf nodes to find the data you're looking for, rather than scanning the tuples in the heap. So we do latch crabbing down to node E, we start scanning at 12, and we recognize that we need to look over in the next node, so we need to acquire the read latch on it to complete our scan. Now, in this example I just jumped across the sibling pointer and acquired the read latch on the neighbor. In practice you can't do that, because someone could be coming the other way and you'd have a deadlock, and your pointer to that location may have changed — something might have happened over there that caused that node to be deleted and replaced with something else. For simplicity I'm saying yes, you can hop across; in practice you can't, at least not easily. So we complete our scan and see all the values from 12 to 23 — in this case, just 12 and 23. Then transaction two comes along and wants to insert 21: it takes latches all the way down and inserts its new value into that node. Then the first transaction runs the same scan again, and now it sees that 21 is there. That's an unrepeatable read — a phantom, really: it's seeing something that did not exist in the range when it scanned the first time. Latch crabbing doesn't help us here; we need something else.

And this is what index locks are going to do for us. Index locks let us protect the logical contents of the index and avoid phantoms, or interference with other transactions running at the same time. We do this concurrently with latch crabbing — with taking latches in the data structure. It's a higher-level construct to make sure the logical contents are correct, but when we go inside the index we still have to take all our latches. So this is not meant to replace latches; it's used in conjunction with them.
The key difference we talked about at the beginning between index locks and latches is that index locks are held for the entire duration of the transaction, and we only acquire them on the leaf nodes of the index in our B+tree. Remember that latches can be acquired all the way up and down the tree; these locks are only taken at the bottom. The reason is that the keys in the upper levels of the tree may not actually be real values: you can delete a key from the index and it gets removed from the leaf node, but it may still live on in the upper levels of the tree as a guidepost or search marker.

The last key distinction is that the index locks we'll talk about are not physically stored inside the data structure itself. The latches I was showing you for latch crabbing are usually stored in memory inside each node; these index locks cannot be stored inside the index, because the physical organization of the index in memory can change. Think about it: say I take an index lock on key 5, and then some other transaction does a bunch of work — key 5 is still valid, but the data structure has changed because of splits and merges. If my transaction's context handle holds a pointer to that lock so it knows it has it, the lock may have been moved, because the B+tree got reorganized. And if you say, fine, I won't store pointers to my locks in the transaction context handle, I'll only store them inside the data structure — then when a transaction commits, it doesn't know which locks it holds, so to validate and release them you would have to scan the leaf nodes of every single index to find the locks you hold. Is that clear? This goes against what I said before about how, in an in-memory system, we can keep the locks we hold on tuples directly in the tuple header. Now I'm saying that for an in-memory index you can't hold these locks inside the data structure; they have to be stored in an auxiliary space, because the physical organization of the index can change, whereas the tuples themselves don't move.

So the lock table looks something like this, and it's pretty basic: usually it's a latch-free hash table, and you have some way to map the logical key to a location in the hash table, which points to a queue of transactions that either hold the lock or are waiting to acquire it. Typically the first element in the queue is the one holding the lock, and everything behind it is waiting. So in this picture, transaction one holds an exclusive lock on whatever this key is, and all the other transactions are waiting behind it; and since a shared lock is compatible with other shared locks, these three transactions all hold the same lock at the same time, with possibly other transactions behind them waiting for an exclusive lock. The nice thing is that you can implement this without latching everything: there are latch-free hash tables and latch-free queues, so you don't necessarily acquire a latch every time you go into this structure, depending on how it's implemented. But it's still not free: it's a lookup to find the entry you want, then you have to traverse the queue and check whether you're the only one there or whether you need to add yourself to the back of the queue and block and wait.
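A very rough sketch of what such a lock table could look like is below. This version just uses a latched std::unordered_map rather than a real latch-free table, it omits blocking and wakeup entirely, and all the type and function names are invented for illustration.

```cpp
#include <cstdint>
#include <deque>
#include <mutex>
#include <unordered_map>

enum class LockMode { kShared, kExclusive };

struct LockRequest {
  uint64_t txn_id;
  LockMode mode;
  bool granted;   // head of the queue (plus compatible sharers) are granted
};

// Maps an index key to the queue of transactions holding or waiting on its lock.
// The locks live out here, not in the index, because the index's physical
// layout (which node a key sits in) can change underneath us.
class IndexLockTable {
 public:
  // Returns true if the lock is granted immediately; otherwise the request is
  // queued and the caller would have to wait (blocking/wakeup is omitted here).
  bool TryLock(uint64_t key, uint64_t txn_id, LockMode mode) {
    std::lock_guard<std::mutex> guard(latch_);
    std::deque<LockRequest> &queue = table_[key];
    bool grantable = true;
    for (const LockRequest &req : queue) {
      // Only compatible case: everyone already queued holds a granted shared
      // lock and we are also asking for shared.
      if (!(req.granted && req.mode == LockMode::kShared &&
            mode == LockMode::kShared)) {
        grantable = false;
        break;
      }
    }
    queue.push_back({txn_id, mode, grantable});
    return grantable;
  }

  void Unlock(uint64_t key, uint64_t txn_id) {
    std::lock_guard<std::mutex> guard(latch_);
    std::deque<LockRequest> &queue = table_[key];
    for (auto it = queue.begin(); it != queue.end(); ++it) {
      if (it->txn_id == txn_id) {
        queue.erase(it);
        break;
      }
    }
    // A real table would now grant the lock to the next waiter(s) and wake them.
  }

 private:
  std::mutex latch_;   // a production table would be latch-free or at least striped
  std::unordered_map<uint64_t, std::deque<LockRequest>> table_;
};
```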
So typically what happens is always the the first element in the queue Is the one that's holding the lock and everything comes behind it is waiting to acquire it so this guy here is a transaction one holds a exclusive lock whenever this lock is and All these other guys are waiting for this one here since the shared lock is compatible with other shared locks All these three transactions also all hold the same lock at the same time Right and there may be other transaction behind this trying to acquire, you know an exclusive lock So the the nice thing about this is though you can actually implement this There are lock free Hash tables and you can have sorry latch free hash tables and latch free queues So it's not it's not like it's a choir a latch to go into this depending on how it's implemented every single time But again, this is not free right because it's a look up to go find the thing you're looking for then you have to traverse the queue and Check to see whether you're the only one here or whether where you need to block and block and wait yourself wait Block and add yourself to the back of the queue All right, so I'm going to talk about five different types of locks And these are all sort of described in the paper you guys read So we'll start off with predicate locks and these are sort of like a thought experiment a nice thing But nobody actually implements them and then we'll have the real ones that will build up to the final one the higher Hierarchal locking which is what everyone actually implements in their indexes if you if you're doing latch crowding and locking inside of index So predicate locks were the first sort of locking scheme for transactions and nexus That came out of the system our group at IBM Again when I really love about the system our project It's just sort of the way it sort of it came about like they took eight people that had they were super smart and had PhDs they put them in a room and said make me a relational database and they all sort of broke off and in every person You know one of those eight people worked on you know some different part of the system They have sort of all combined it together like one woman worked on the cost-based query optimization somebody else worked on the sequel language and Jim Gray and his people worked on Coming up with locking scheme and concurrency control and you know the early days of all this So in the in the one original papers from system are from 1976 they describe They're the predicate locking scheme in it But as I said, this is actually difficult to implement in practice. 
So nobody actually does this. There is something called precision locks, which HyPer uses, that comes pretty close, but the scheme as described in the original paper isn't used by anyone.

The basic idea is that you take shared locks on the predicates in the WHERE clauses of SELECT statements, and exclusive locks on the predicates of UPDATE and DELETE queries. For an INSERT there is no WHERE clause, so you take a conjunction of the values you're trying to insert, and that's equivalent to having a WHERE clause.

It's easier to visualize this. Say I have two queries. The first computes a sum aggregate on the balance of the accounts table where the name of the account holder is Biggie. The second wants to insert a new record into accounts for Biggie with a balance of 100. Think of the diagram as a 2D projection of the key space for the accounts table. For the first query we look at the WHERE clause, name = 'Biggie', and that region represents all the keys that exist in the accounts table where the account name equals Biggie. For the INSERT, since there is no WHERE clause, we take the conjunction name = 'Biggie' AND balance = 100, and we see that this region overlaps with the larger region where name = 'Biggie': the set of rows where balance = 100 and name = 'Biggie' has to be a subset of all possible rows where name = 'Biggie'. If you see such an overlap, you know the two predicates conflict and you have to be mindful of that. If the first query comes first, the second one can't acquire the lock on its region, because it overlaps with the region already held.

Intuitively this is easy to understand, but in practice it's really hard to implement. Part of the reason is that I'm showing a table with only two attributes; a table can have many attributes, and then the space is not 2D but multi-dimensional, and you have to test whether these regions intersect. The other issue is nested queries: if the output of an inner query is used as the input to the outer query, it's hard to derive what the predicate locks should look like. What's nice in this example is that I can derive the locks without actually running the queries, because I'm just looking at the logical semantics of the predicate in the WHERE clause. With all the other schemes we'll talk about next, you only know which elements the query touches once you actually execute it. A small sketch of this kind of overlap check follows.
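As a toy illustration of the overlap test, here is a hedged sketch that only handles conjunctions of equality predicates, which sidesteps exactly the hard multi-dimensional and nested-query cases described above. The Predicate type and the PredicatesOverlap function are made-up names for this example only.

```cpp
// Hedged sketch of a predicate-lock conflict check, restricted to
// conjunctions of equality predicates (column = value).
#include <map>
#include <string>

// A predicate is a conjunction of column = value terms, e.g.
// {{"name", "Biggie"}, {"balance", "100"}}.
using Predicate = std::map<std::string, std::string>;

// Two conjunctive equality predicates can both be satisfied by some row
// (their regions of the key space overlap) unless they pin the same column
// to different values.
bool PredicatesOverlap(const Predicate &a, const Predicate &b) {
  for (const auto &[column, value] : a) {
    auto it = b.find(column);
    if (it != b.end() && it->second != value) {
      return false;  // same column constrained to different values: disjoint
    }
  }
  return true;  // no contradicting constraint found: the regions overlap
}

// Lecture example:
//   SELECT predicate: name = 'Biggie'
//   INSERT "predicate": name = 'Biggie' AND balance = 100
// PredicatesOverlap returns true, so the two queries conflict.
```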
With predicate locks you can determine the conflicts prior to executing the query; with all the other schemes, everything has to be determined during execution. Like I said, nobody actually implements this, but it's a nice thing to understand.

All right, so now we'll build up the types of locks we can actually take in our indexes to see how we handle phantoms and the harder cases. The simplest type of index lock is the key-value lock, and it's exactly what you'd think: a lock on a single key inside a leaf node. In this example a transaction acquires the lock on key 14, that is, the range from 14 to 14, in exclusive mode. That transaction holds the lock on this key, and nobody else can delete it.

The trick is that we also need to handle non-existent keys, and this is another reason why everything is stored in a separate lock table. There is no element 15 here, so I can't store a lock for it inside the index, because it doesn't exist. And if the key space runs from 14 up to a billion, I don't want to pre-allocate a billion slots for keys that may never exist just in case somebody wants to lock them. Storing the locks outside the index gives you a more compact representation: you have less data to store for the non-existent keys.

The way you handle non-existent keys is through gap locks. It's essentially the same thing as a key-value lock, except you're locking the space where there are no keys. Between 10 and 12 there's a gap, between 12 and 14 there's a gap, and so forth. So a transaction can acquire the gap lock on the space between 14 and 16 in exclusive mode, and now if someone else tries to insert 15, they can't, because they would have to acquire that gap lock first.

Now we can combine key-value locks and gap locks to get key-range locks, which let us lock a wider range of the key space inside the leaf node. The basic idea is that we jump to some location in the key space and acquire the lock on that key value as well as the gap next to it. In this example a next-key lock (a key-range lock) covers key 14 inclusive plus the gap after it, but not 16. We can also go the other direction with prior-key locks: from the gap after 12 up to and including 14. You can take next-key locks or prior-key locks, and there's no real difference between the two, but the key thing is that you can only go in one direction: you can't mix next-key locks with prior-key locks, because that causes problems. In practice most systems implement next-key locks. That's probably a byproduct of how we read English and how we think about scanning the index: most of the time you're scanning in ascending order, so it makes more sense to start at a key and take the gap after it. But from a scientific standpoint I don't think there's any major difference; it's just the way everyone does it.
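To make the next-key convention concrete, here is a small hedged sketch that picks which next-key lock covers a given key, assuming the lecture's convention that a next-key lock on key k covers k itself plus the gap up to (but not including) the next existing key. The function name and the use of std::set are illustrative only.

```cpp
// Hedged sketch: which existing key's next-key lock covers `target`?
#include <cstdint>
#include <optional>
#include <set>

// Returns the existing key whose next-key lock covers `target`, i.e. the
// greatest existing key <= target. Returns nullopt if target falls before
// every existing key (a real system would use a "minus infinity" sentinel).
std::optional<uint64_t> NextKeyLockFor(const std::set<uint64_t> &leaf_keys,
                                       uint64_t target) {
  auto it = leaf_keys.upper_bound(target);  // first key strictly greater
  if (it == leaf_keys.begin()) return std::nullopt;
  return *std::prev(it);  // greatest key <= target
}

// Example with leaf keys {10, 12, 14, 16}:
//   NextKeyLockFor(..., 14) == 14  -> lock covers key 14 plus the gap (14,16)
//   NextKeyLockFor(..., 15) == 14  -> inserting 15 needs that same lock, so
//                                     it conflicts with a transaction that
//                                     already holds the next-key lock on 14
```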
All right, so to recap: we have key-value locks, we have gap locks, and we combine them into key-range locks. The last piece we need is hierarchical locking. What I showed before was acquiring a lock on just one key value and the gap next to it. But if we need to lock multiple gaps and multiple key values, we have to acquire multiple locks, and every single acquisition means going to the lock manager, traversing the queue, figuring out whether we can acquire the lock, and then adding ourselves to it.

With hierarchical locking we apply the same idea as the multi-granularity locking with two-phase locking that we discussed in the intro course: we can acquire a higher-level lock covering multiple key values and gaps, and then take more refined, more restrictive locks inside that range as we go down. Say a transaction acquires an intention-exclusive lock on this entire range; then it can take the exclusive lock on just this one key and the gap next to it.

Can someone remind me what intention exclusive means from last semester? Correct: an intention-exclusive lock at a higher level says that someone at a lower level has acquired an exclusive lock, so be mindful of that. The intention-exclusive lock is not compatible with a shared lock, so no one can come in and read this whole range while we hold it, and that helps us prevent phantoms when someone might try to insert into the range we're operating on. But intention exclusive is compatible with intention exclusive, so another transaction could come along and take a key-value exclusive lock on a different element in this range, and that's okay because we're not stepping on each other. If that other transaction also tried to insert, say, 15 here, it couldn't: even though it holds the intention-exclusive lock above, it can't acquire the exclusive lock on 15.
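As a sketch of the machinery underneath this, here is the standard S/X/IS/IX compatibility check from multi-granularity locking, written as a small C++ function. The Mode enum and Compatible helper are hypothetical names; a real lock manager evaluates this per entry in each lock queue.

```cpp
// Hedged sketch of the lock-mode compatibility check behind hierarchical
// (multi-granularity) locking.
enum class Mode { IS, IX, S, X };

// Standard compatibility matrix: IX is compatible with IS and IX but not
// with S or X, which is what blocks a reader of the whole range while an
// insert is pending underneath it.
bool Compatible(Mode held, Mode requested) {
  if (held == Mode::X || requested == Mode::X) return false;
  if (held == Mode::S) return requested == Mode::S || requested == Mode::IS;
  if (requested == Mode::S) return held == Mode::IS;  // held is IS or IX here
  return true;  // IS/IX vs IS/IX are all compatible
}

// Lecture example: T1 takes IX on the range and then X on the next-key lock
// for 14. T2 can also take IX on the range (Compatible(IX, IX) == true) and
// X on a different key, but if T2 tries to insert 15 it needs the same X
// lock T1 holds, and Compatible(X, X) == false, so it must wait.
```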
So with hierarchical locking we get better concurrency while also taking fewer locks, which means fewer round trips into the lock manager. The same argument you make for multi-granularity locking with two-phase locking at the tuple level applies here inside our indexes.

Now, given that I just spent half an hour discussing locking schemes inside the index, I'll say that many modern systems don't actually do what I just described, and part of the reason is that they want to avoid taking heavyweight locks inside their indexes. We saw this in Silo: Silo basically revalidates the scan set of nodes inside the index to check whether it has been modified since the query ran. Hekaton does take locks on the hash table that organizes the buckets of tuples, but it doesn't take index locks in the internals of the BW-tree that I'm showing here.

Hierarchical locking is nice, and it's essential, because it essentially gives us predicate locking without having to figure out which predicates overlap with which other predicates. We know exactly what the predicate is trying to do, because in order to grab the tuples or the keys it needs for a scan, or to insert or update a leaf, it has to acquire the locks on the low-level elements, the keys, inside our index. And as a reminder, we only take the locks at the leaf nodes of the tree; everywhere else we take latches to ensure that we have a consistent data structure.

Now, currently our system doesn't actually support serializable isolation. We have range scans, so you can do range scans on our BW-tree, and you'll be required to support them in your skip list, but there's no high-level mechanism in the concurrency control protocol to do index locking or revalidation to make sure there are no phantoms. That's something we simply don't support at this point, but there are standard techniques we could use to make it work.

Any questions about index latching or index locking? Okay. Next class is one I get really excited about, because we start talking about different types of indexes. We spent today talking about the B+ tree, and in the intro course that's usually all we ever teach you, but there are other trees used in database systems that nobody ever teaches, so we're going to cover a lot of those. We'll talk about T-Trees, which came out of the 1980s and were used in Oracle TimesTen in the 1990s, though nobody implements them now. Then we'll spend some time talking about how to do concurrent skip lists, which is what project number two is all about, and as far as I know the only database system that uses a skip list as its primary index is MemSQL.
So the assigned reading for next Tuesday is a blog post by someone whose real name I actually don't know, because it isn't published anywhere, but he has a nice description of how to make a skip list high-performance. The original skip list, as you'll see on Tuesday, is actually really slow on a multi-core architecture because it's not very cache-friendly. The assigned reading will show you how to make it fast, using a lot of the same techniques the MemSQL guys used.

We'll also spend time at the beginning (we're almost done anyway) talking about how you represent the keys inside the index. In the examples I showed here I was dealing with single integers, but what if you have multiple integers, or a mix of integers and strings? How do you actually do that, and how do you do lookups and comparisons on those keys and make them really fast? Then we'll talk about how you do memory allocation and garbage collection inside the index when you have a latch-free index, which is also something you're going to have to deal with in project number two. For the index key representation, as you'll see in the project specification, we do all of this for you: you just give it a key type, it's stored as a blob, and you're given a class that takes care of the comparisons, but I'll show you how to make that work really fast.

Any questions? All right, awesome. Thanks, guys.