All right, today's lecture is going to be about index concurrency control. Before we get into the details of that, let's start off with administrative stuff. Project number one was due last night at 11:59 p.m. We can't finalize the leaderboard just yet because people can still use late days, so technically submissions can come in up to four days late. We'll be able to finalize the leaderboard by the end of the week. Homework number two is due Sunday, October 3rd, also at 11:59 p.m. And project number two, which will be about hash indexes, is going to be released today and will be due on Sunday, October 17th. So before we get into the concurrency control stuff, I wanted to start off by answering a few questions that were brought up last class. The first is about non-prefix lookups in multi-attribute B plus trees, and the second is about efficiently merging B plus trees. So in the last class I showed this slide that talked about selection conditions on a multi-attribute index. In this case, the index is built on three attributes: A, B, and C. And the two types of operations I said were supported were a prefix lookup, A equals five and B equals three, as well as the second one, a non-prefix lookup where only B equals three. So the question was specifically about that second case: how could we support B equals three on its own? This was the original tree that I showed in the lecture, and I showed these two operations but didn't show the search on the wildcard-B case, because this tree was too simple. So we'll throw out this tree and hopefully give a better example that illustrates that second point I wanted to make. Instead of naming the columns A, B, and C, we're going to call them col1, col2, and col3. And the values they can have are A, B, C, and D. That's the entire alphabet we're going to allow, just these four characters, so each column can hold either an A, a B, a C, or a D. And the operation that we're specifically focusing on in this case is: is col2 equal to the value B? So what would this look like? Well, as I said, we need a more complicated tree. Sorry if that's a little small, but the slides are also available online, so you can take a look at them there. The tree is substantially more complicated. It's also fully packed, so we didn't leave any extra space in the nodes, just so we could fit it all on the slide for the example. The way this is going to work, as I sort of alluded to last class, involves starting at the root node. So we're going to start here, and as we descend the tree, we try to exclude subranges that the key, col2 equals B, can't possibly be in. So we start in the first key range and descend the tree, trying to rule out any subtrees or subranges where we know that key can't possibly appear. Again, we start there and work down the tree. We get to this node, which covers the range starting at AAA, so the keys in it are greater than or equal to AAA and strictly less than ABA. In this case, we know that we don't need to look at this node here, because there can't possibly be a key in that range that has B in the second position. So going back up to this inner node here, we look at the second range that's covered, which is from ABA, inclusive, up to ABD, exclusive.
So of course, since B appears in the second position here, we do need to check that leaf node, so we're going to grab that node there. Again, if we look at this next range, it goes from ABD to ACC, and of course there could be a value B in the second position there, so we need to grab that leaf as well. Now we get to the last one, which we know covers between ACC and ADB, so there can't possibly be a B in the second position and we don't need to check that leaf node. You can continue with the same intuition across the full span of the tree, ruling out the subranges or leaf nodes where we know the value B can't possibly exist in the second column position. I think Oracle, and maybe some other DBMSs, use the term skip scan for this, because the idea is that you can skip along, only looking at the pages you're interested in, because you know something about the subrange they cover. So I hope this picture makes a little more sense than the explanation I was trying to give verbally last class. And again, the slides are online so you can take a look. I know the letters are a little small, but we needed a sufficiently complicated tree to illustrate this technique. So are there any questions about this before we move on to the other question? Yes. So the question is whether you can extend this to any subsequence. Do you mean any column value that's supported, or any subrange? Sorry, I didn't catch that. Right, so the question is whether you can only do the skipping on columns other than the last column you're ordered by. Yeah, that's correct. Because you don't know what's going to be in the last position there, you can only do this sort of filtering on columns that come earlier in the guaranteed sort order. This index is built on three columns, but you could generalize to an arbitrary number of columns. In this example all of the filtering takes place at the leaf nodes, so you're going to traverse all the way down the tree, but you may be able to stop earlier during your traversal in some cases. Does that make sense? Are there any other questions about this? Okay, the second question was about efficiently merging B plus trees. I said in the last class that I wasn't personally aware of any algorithms, but I was sure that there had been some work done on it. I think the most basic case, which I'll talk about here, is that since the trees give you sorted order, you can perform, I was going to say the merge phase of a sort-merge join, but we haven't talked about that yet. Since both of the inputs are in sorted order, you can just pull the smallest value from the head of each input to merge them together. So that's what approach number one here is. And the paper I referenced was published in a database conference in 2005, and it enumerates the different techniques you could use for merging B plus trees. They talk about the first three and then present the fourth, lazy approach in the paper, which I'll explain in a second. So the first approach is to do things in an offline fashion where you're blocking: you block all operations on both of the trees. You don't want to allow any modifications or reads to the trees.
Basically, you do that by putting an exclusive latch on the trees, which we'll talk about in today's lecture; you're going to lock the trees and prevent any concurrent access from happening. Then you're going to perform the merging, pretty much what I described: merging the leaf nodes and then building up the tree in an offline fashion. When you're done, you have a fully merged B plus tree and you can let other threads start using it again. The second approach is called the eager approach. It's basically an incremental approach: if you assume you have two B plus trees you want to merge, the threads that are querying them need to access both trees. You access the first B plus tree and look for whatever values you need there, then you access the second B plus tree. And what you're going to do is move batches eagerly between them. So you're going to leverage the accesses of other threads in order to merge the trees: every time a thread accesses the tree, it merges in values from one of the trees to the other, to get them synchronized together. Approach number three is the background approach. It's sort of similar to the first offline approach, except you're not blocking the trees. Basically you're creating a copy or a snapshot of each of the trees and doing the merge offline in the background. So you build up this merged tree, and then you go back, because there may have been modifications to each of the individual trees since you took your snapshot or copy, so you'll need to apply any updates that you may have missed. Then you get this final merged tree at the end, and there's some kind of quick switch-out where you swap in the new tree you've constructed for the other two. You just need a short-term latch to prevent modifications while you're replacing the old two B plus trees with the new merged one. And the final approach is approach number four, which is presented in this paper. Their idea is basically to designate one of the B plus tree indexes as the main index and one as what they call the secondary index. Usually they choose it so that the main index is the larger one and the secondary index is the smaller one for efficiency reasons, but either way would work. And what the algorithm is going to do is ensure you only have to access one of the trees. So unlike the eager approach, where you have to access both, you're going to access one index, the main one, and when you get to a leaf, you're going to ask: has this leaf been merged with the corresponding leaf from the secondary tree? Have we merged in those values from the secondary B plus tree into this leaf node? If it has, then you're done; you don't have to do anything. If it hasn't yet been merged, then you'll have to go over to the secondary B plus tree, grab the corresponding range that's covered by the leaf, and merge it into the primary. So it's done in this lazy way: you only do the merging when you access something, and you only have to access the main index, not both. So this is, as far as I know, a pretty good summary. The paper is called Online B Tree Merging. If you're interested, you should take a look.
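Just to make the sorted-merge idea behind approach number one concrete, here's a minimal sketch in C++, under the assumption that each tree's keys arrive as an already-sorted stream from a leaf-level scan. The function name and the use of plain vectors are my own illustration, not the paper's algorithm:

```cpp
#include <cstdint>
#include <vector>

// Sketch of approach #1's core step: since a leaf-level scan of each B+ tree
// yields its keys in sorted order, combining them is just the merge step of
// merge sort. We repeatedly pull the smaller value from the head of each
// input; the merged output would then be bulk-loaded bottom-up into a new tree.
std::vector<int64_t> MergeSortedKeyStreams(const std::vector<int64_t>& a,
                                           const std::vector<int64_t>& b) {
  std::vector<int64_t> out;
  out.reserve(a.size() + b.size());
  size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    out.push_back(a[i] <= b[j] ? a[i++] : b[j++]);
  }
  // One input is exhausted; copy whatever remains of the other.
  while (i < a.size()) out.push_back(a[i++]);
  while (j < b.size()) out.push_back(b[j++]);
  return out;
}
```

In the offline approach you'd run this over the whole key space while holding exclusive latches on both trees; the other approaches interleave this same merge step with regular traffic.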
There may be, as I said, more advanced techniques that you can apply here, but I think this is a pretty good high-level summary of some of the different trade-offs you have. So are there any questions about this? Okay, cool. Now we can talk about concurrent data structures. In the last two classes, we've talked about the different data structures that we use as indexes in the DBMS: primarily hash tables and trees in the B tree family, mainly B plus trees. And for simplicity in all of our conversations, we've mostly been assuming that all of the data structures are accessed only by a single thread. So there's no concurrent access, no concurrent reads or writes, and no changes or modifications going on in the tree at the same time. Everything happens in one single thread, and for that reason we didn't need to worry about the layout of the data structure changing underneath a traversal or a read, or about any types of anomalies coming up from concurrent access. Now we're going to allow multiple threads to run in our DBMS, which is important for two reasons: to leverage parallelism in the CPU, since with a multi-core CPU you can have a lot of concurrent threads running, and to hide the latency incurred by disk stalls, I/O stalls. If I have a thread that's blocked waiting to read something from disk, I don't want to wait around for that to complete; I can schedule another thread while that I/O operation is proceeding. So if we're going to allow multiple threads to operate concurrently in our DBMS, we need some way of designing the DBMS, including the data structures and everything in the internals, so that it protects itself from concurrency errors. Now, there are a few notable examples of systems that take a different route, and we won't really talk about them in this class. They're primarily in-memory systems. There was some research, probably about 10 or 15 years ago, that looked at the time spent in DBMS internals, what the DBMS software itself was executing, and a lot of overhead was spent in latching and concurrency control and that sort of code path. So the key insight behind some of these approaches is that if you're in memory, then you don't really need to worry about disk I/O stalls, so the idea is that we can get rid of concurrency control, and the way we'll avoid any kind of concurrency errors is by partitioning the database into disjoint subranges. For example, imagine you have keys: you could split them up into the keys from A to D and then E to J or something; you can partition up the range in this way and let each partition operate with only a single thread. This is the idea behind shared-nothing systems, where you partition up the data such that you don't need concurrent threads running on the same data partition. Again, these are more advanced systems and we're not going to talk about them in this course, but if you're wondering about any of them, you should check out the papers; they're pretty interesting. So just at a high level, to explain what we mean by concurrency control: the high-level idea is that the DBMS is going to ensure "correctness", and the term correct is in quotes because that's the key word there, which we'll explain in a lot of detail. The DBMS needs to guarantee correctness by enforcing how the different concurrent threads are going to access it.
So it needs to make choices in the data structure and algorithm design in order to make sure that we don't introduce any concurrency errors by having threads operate concurrently on some shared object. When we're talking about correctness, we can be referring to two different levels, and it depends on the context: a protocol's correctness criteria can be about logical correctness or physical correctness. The first type, logical correctness, basically means: imagine I have a data structure like an index, and I want to insert a key, say key five, so I perform the write. If I then go back to read key five from the index, I should see it. I should be able to read my own writes. So this is, at a logical level, what we expect the requirements of the correctness protocol to be. At the physical level, we're talking at a somewhat lower level, and we really mean protecting the internal representation of the object. So things like the pointers in the DBMS, or the layout of a tuple and the keys and values in it, that kind of stuff. We want to make sure all of that is protected and doesn't get corrupted by concurrent modifications. If you wind up with a pointer to an empty or invalid page, that's bad, because you're going to wind up with a segfault or reading wrong data or something like that. So what we're going to be talking about today is physical correctness, and how we can ensure it in the index data structures that we've talked about in the previous lectures. We'll cover this logical correctness idea later in the semester, I think after the midterm. So at a high level, today's agenda is: we're going to start with a review of latches, an overview of what we mean by that and how to implement them at a low level. Then we're going to talk about how to use latches in a hash table to make it a concurrent hash table, and then some more advanced usage in a B plus tree. Hash tables are a little easier to turn into a concurrent data structure; B plus trees are a little more difficult. And then finally we'll end with an example of some of the problems you can run into in B plus trees with concurrent access. So I think I had this slide in an earlier lecture; I just want to go over the difference here again. As I mentioned in the previous lecture, there's this difference between locks versus latches. If you're from an OS or systems background, what we in the database world refer to as a latch, they call a lock, so you may run into some confusion there. But there's a reason the database world has come up with this naming convention. What we in the database world call a lock is actually a higher-level construct that's protecting the database's logical contents from concurrent transactions. So what does that mean? Well, it's protecting things like tuples or pages or tables, those sorts of abstractions that we have in our system, but not low-level physical details. So the locks are held for a longer duration, typically the full duration of a transaction. Again, we haven't really talked about what a transaction is yet, but basically we want our transactions to perform some kind of consistent and atomic update of the database. One way to ensure that is through locking: we can hold locks for the duration of all of the changes that we're making to the database.
And finally, as I said, since we want these changes to occur atomically, we need to be able to roll back changes if we get partway through and have to abort, or we crash for some reason. So we need to be able to roll back the changes that we made. That's locks, operating at the logical level. At a lower, physical level is what we call latches. These protect the critical sections of the DBMS's internal data structures and algorithms, whatever code it is you're executing, from concurrent modification by other threads. And they're held usually just for the duration of the operation, for as short a time as possible: we just want to acquire the latch, do whatever our critical section is, and then release the latch so that we don't block other threads from making progress. And we don't need any notion of rollback in this case. If we make a change, the scope of the change is only valid for the critical section; we're not going to go back and undo changes or modifications that we made at other times when latches were held. So latches are held for a much shorter duration, to allow concurrent threads to make progress without blocking them. From the book that I mentioned last time, Modern B-Tree Techniques, there's a nice table that breaks down the high-level differences between locks and latches. The way to read it is to look at the action in the left column and then see what applies to locks versus what applies to latches. As I said, we're going to be focusing on latches. Latches separate concurrent threads. They protect our in-memory data structures: we're only acquiring latches on data structures that exist in memory, things like indexes. They're only held during the critical section, so we want to acquire a latch for the critical section and release it as soon as possible. We have only two modes, and this is going to be important for some of the algorithms we're going to talk about: a read mode and a write mode. There's no notion of deadlock detection or deadlock resolution for latches; the way we avoid deadlock is strictly through coding discipline. We need to very carefully write the code, whether it's the index traversal code or whatever it is inside our DBMS that needs to use latching, in order to avoid deadlocks, because that discipline is the only thing preventing them from showing up. And then the final piece is where locks versus latches are stored: latches are usually embedded directly in the data structure that you're doing the latching on. Like I said, we're going to cover this idea of locks in a later lecture, after the midterm. So are there any questions about the differences here before we move on? Okay, so I mentioned there are these two different latch modes. In read mode, multiple threads can read the same object at the same time; if we're all just reading, there's no concurrency problem that can come up, because the values aren't going to change. And a thread is therefore free to acquire the read latch even if another concurrent thread already holds it in read mode. On the other hand, in write mode, only one thread can hold the write latch at a time. So only one thread can be accessing the object in write mode, and other threads that want to acquire a write latch or a read latch have to block.
So they can't acquire a latch if another thread holds it in write mode; it's an exclusive latch. You can think about it in terms of this compatibility matrix here, which shows when two threads can concurrently hold locks, or sorry, concurrently hold latches; I'll probably mix that up a few more times in the lecture. The only case where two threads can concurrently hold a latch is when they're both read latches; otherwise it's not valid. So as I said, we're going to start with a very high-level overview of the basic concepts behind how you implement latches. You can go much deeper, and there's much more than what we're going to talk about here, but this is the quick summary of the key concepts you need to understand. We're going to look at three basic implementations: the blocking OS mutex, the test-and-set spin latch, and the reader-writer latch. So the first one, as I said, is the blocking OS mutex. This is the most common; it's what you get if you declare std::mutex in C++. They're really simple to use. The overhead, maybe 25 nanoseconds, doesn't sound that bad, but if you're doing a ton of latching operations on a really large data structure, this overhead adds up as you try to scale. So you certainly don't want to be making frivolous latching calls here. Just imagine we have this program here where we're defining a mutex and we want our critical section protected by it. Does anyone know how this mutex is implemented in Linux? You can shout it out if you know. It's a typedef for a pthread mutex, which is built on something called a futex, a fast userspace mutex. Basically, a futex has two latches inside: there's a fast userspace spin latch, which we'll talk about in a second, something you can access very efficiently in user space, and then there's a more heavyweight blocking OS latch as a fallback. So I'll explain how this looks and why it's set up this way. Imagine we have two concurrent threads running and they both want to call lock on the mutex here. They're first both going to go to the userspace latch, and as I said, that has a lot lower overhead than the OS latch. So they're both going to go there, and let's say the one on the left wins. That thread gets the latch, and the other thread has to block; what it's going to do is drop down into the heavyweight OS latch. So what happens now is, when the first thread is done, it'll call unlock, and the other thread will get woken up when it's able to take the latch. So I mentioned there's this userspace latch, and that's the second one we're going to talk about. Basically, this is implemented as what's called a test-and-set spin latch. Again, as I said, it's very efficient: it's a single instruction. You can use atomic instructions to do this; we'll talk about atomic instructions in one second. Test-and-set is technically two things, first you test the value and then you set it, and the hardware makes sure that happens atomically. This is going to exist entirely in user-space code; we're not going to have to go to the OS at all. That's why it's more efficient.
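As a hedged illustration of what such a test-and-set spin latch could look like, here's a minimal C++ sketch built on std::atomic_flag. The class name is made up; this is the textbook pattern, not any particular system's implementation:

```cpp
#include <atomic>

// Minimal test-and-set spin latch. test_and_set() atomically reads the old
// value and writes true in one hardware operation; if the old value was
// already true, another thread holds the latch and we spin.
class SpinLatch {
 public:
  void Lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // Busy-wait. The OS can't tell we're just burning cycles here,
      // which is exactly the contention problem described next.
    }
  }
  void Unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```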
But there are a lot of problems that come up when you're using this approach, so it's not free. You have cache coherence problems that can come up, because you have multiple threads that need to cross memory boundaries in order to acquire the same latch. And you can also run into a contention problem. If you have a lot of threads all trying to acquire the same latch, then what they end up doing is looping forever, burning cycles. Basically, every time we call test-and-set on the latch, if I don't get the latch, I drop into this loop and just keep going and going and going. And since there's no visibility into what instructions I'm actually executing, in this case just looping, waiting for something, the OS doesn't know what my code is doing. So there's no way for the OS to put my thread to sleep like it could if I were using the OS-level latch from approach number one. That's the danger here: if there's a lot of contention, you just have a bunch of threads spinning, spinning, spinning, trying to acquire the latch, running fruitlessly and wasting cycles, when you could put them to sleep and be doing something else. So that's why this is a favorite topic of Linus Torvalds, the creator of Linux, who, you might also know, has some particularly creative ways of voicing his opinions. Basically, he is very against the use of spin locks in user space unless you actually know what you're doing. I don't know who's the judge of whether you actually know what you're doing; I like to think that in the database world we know what we're doing when building a DBMS, but the concern here is about the dangers I mentioned in terms of scalability and contention. So the third latch implementation we're going to talk about is what's called a reader-writer latch. You can think of the first two approaches as primitives: they give us individual latching primitives, and a reader-writer latch is a higher-level construct that we can build from these lower-level latch primitives. Remember we said that you could have concurrent readers without introducing concurrency errors in your code. A reader-writer latch is going to allow us to have those concurrent readers. And the thing we need to do to enable this is manage the read and write queues ourselves: we need to keep track of whether threads are requesting latches for reads versus writes, and that can lead to some problems, which we'll see here in a second. Basically, you can imagine it at a high level like this, where we have a separate read and write latch inside our big reader-writer latch, and for each we have two counts: the number of threads that have successfully acquired either a read or write latch, and the number of threads waiting on either a read or write latch. So imagine we have a first thread show up and it wants to acquire a read latch. We're going to give it out; there's no reason not to. No one has any existing write latch, so there's no problem there.
So we give the latch out to that thread, and then we increment our count of threads holding read latches by one. Again, imagine another thread shows up here and also requests a read latch. There's not going to be a problem here, because we can have multiple threads with read latches out at the same time, so we can give that out and again increment the count by one. And now let's say another thread shows up and requests a write latch. Well, we have these two read latches outstanding, and we know we can't have a concurrent writer, so we need to block that thread and put it in the waiting queue. So what's the problem with the approach that I show here? Yes, the answer is starvation, which I guess I also mentioned on the slide there. The idea is: imagine this other reader shows up and wants to acquire a read latch now. We could have a situation where these reads just keep building up and we never get the read latch count down to zero, so that the writer can never take over. What we could end up with is the writer being starved while these readers just keep going forever. Yes? Sorry, could you repeat? So let me try to rephrase it. The question is about why this makes the state inconsistent if you have these readers and writers coming in at the same time. It's not the state of the latch that we're worried about becoming inconsistent; it's whatever the latch is protecting. The latch is just an object to prevent concurrent accesses to some other thing. It could be, for example, an index page or a leaf node in an index. And if we have readers running around in the index reading stuff at the same time as we're making changes to it, we could end up with concurrency errors, and we'll go through cases where you can run into those. So it's not about the state of this latch here; it's about whatever the latch is protecting. The next question is: if the two reads arrive, then the writer arrives after them, and then this other reader arrives, shouldn't the latest reader be reading whatever modifications the writer makes? So there's an entire other class's worth of material, I don't know how many lectures it is, on concurrency control theory and what guarantees are provided by the DBMS. The short answer is that we don't actually care about the real-time order in which transactions arrive. Imagine a timeline where the first reader shows up at time step zero, the second one at time step one, the writer at time step two, and then another reader at time step three. As long as we can guarantee that each of the readers sees a consistent snapshot, a consistent view of the data stored in the database, we don't actually care whether they execute in the exact serial timestamp order in which they showed up. We can execute them out of order for lots of reasons, mostly to increase throughput and performance, as long as they see a consistent view of the data stored in the database. So we're not trying to ensure that everything executes in the exact chronological order it showed up in. Does that make sense? Are there any other questions? Okay. So like I said, to prevent starvation here, basically we would want to build in some logic that's going to put that newly arrived reader to sleep.
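To make that bookkeeping concrete, here's a minimal sketch of a reader-writer latch built from lower-level primitives, with the counts managed explicitly. The starvation fix shown, making new readers wait whenever a writer is queued, is just one illustrative policy, and all the names are mine (in practice, C++17's std::shared_mutex gives you a production-quality reader-writer latch):

```cpp
#include <condition_variable>
#include <mutex>

// Sketch of a reader-writer latch with explicit read/write bookkeeping.
// New readers are blocked while any writer is active *or* waiting, so a
// continuous stream of arriving readers can't starve the writers.
class ReaderWriterLatch {
 public:
  void LockRead() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [&] { return !writer_active_ && writers_waiting_ == 0; });
    ++readers_active_;
  }
  void UnlockRead() {
    std::unique_lock<std::mutex> lk(mu_);
    if (--readers_active_ == 0) cv_.notify_all();  // wake a waiting writer
  }
  void LockWrite() {
    std::unique_lock<std::mutex> lk(mu_);
    ++writers_waiting_;
    cv_.wait(lk, [&] { return !writer_active_ && readers_active_ == 0; });
    --writers_waiting_;
    writer_active_ = true;
  }
  void UnlockWrite() {
    std::unique_lock<std::mutex> lk(mu_);
    writer_active_ = false;
    cv_.notify_all();  // wake waiting readers and writers
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int readers_active_ = 0;
  int writers_waiting_ = 0;
  bool writer_active_ = false;
};
```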
There are all sorts of different ways to do it. Maybe you look at the number of currently outstanding readers versus writers, and decide you want to let some writers through: you put the reader to sleep and let some of your writers execute now. So that's how to prevent starvation; we're not going to go too much into it here. So as I said, we want to take the data structures that we talked about in the previous lectures and figure out how to use the latches we've been discussing in order to protect them and make them concurrent. We're going to start with a really simple linear-probing hash table, and it's easy to do in this case because we're restricting the direction in which threads move through the data structure. Recall that when you're doing linear probing, you're always scanning forward from the place where you hash into the table, until you wrap around. So all of the threads accessing the hash table scan forward in this same way. The important consequence here is that deadlocks aren't going to be possible. And as I said, there's no mechanism to detect or resolve deadlocks with latches, so we have to be really careful about how we implement the algorithms and data structures in order to prevent them ourselves. The way we do that in this case is by enforcing this ordering where all threads scan forward through the hash table. And then if you want to resize the hash table, we'll do something really simple: we'll take a global write latch on the entire table, and that will prevent concurrency problems there. We're not going to talk about resizing, but with the different hash table implementations that we talked about, extendible hashing and that kind of stuff, there are ways you can do it more incrementally. So basically there are two different ways you can implement latching on the hash table: either latch pages or latch individual slots. This is a good example of a trade-off between compute and storage. On the one hand, if I use slot latches, I have more latches to store, because they're finer-grained than page-level latches; I'm trading off storage for more fine-grained access. On the other hand, if I latch at the page level, I might be latching whole pages and blocking out concurrent threads from accessing them. So there's a trade-off we need to consider in how we design our latching algorithms to balance these competing interests. Just as an example, in the page-latching setting, two threads might need to access different slots on the same page, but only one would be able to proceed at a time if there's a write latch on the page. So, to illustrate these two approaches, first the page-latch version of the linear-probing hash table. Imagine we have some transaction T1 that wants to find the key D, so we're going to hash D, and let's say D hashes to there. Now we have to scan forward, but before we do that, we need to acquire a read latch on that page, saying that we're reading it, to prevent another thread from coming in and concurrently modifying it. So we request the read latch, let's say we're granted it, and then we can start our scan of the page.
So now imagine that while T1 is in here looking at this page, some other transaction, T2, comes along and wants to insert the key E. Again we're going to hash E, and let's say E hashes to this slot here; but of course T1 already has a latch on the page, so we're going to need to block T2 while T1 does whatever it needs to do. T1 reads along, and it doesn't find D in that page, so now it wants to move down to the next page. We've already checked page one and we know D is not in there, so it's safe for us to release the latch on page one; that way T2 can become unblocked and start executing its insert operation. So we release our latch, and now we request a new read latch on page two for T1. We go through the scanning process, and T1 finds what it was looking for, so we're done there. Now, since the latch on the previous page one is released, T2 can acquire the write latch; it gets the write latch, and then it can start doing its insert operation. It tries to do its insert, sees that there's no room in that page, so it has to scan forward. It comes down here, and since T1 is finished, it's not blocked: T2 can acquire the write latch on page two and do what it needs to do to insert E. So are there any questions about this page-based latching? Yes. The question is: if you're performing a deletion and your algorithm involves shifting, do you need to acquire latches on multiple pages? Yes. You may even need to acquire latches on the whole table, because depending on the order you might wrap around. If you're doing shifting, you're going to need to acquire latches, in order, as you do the shifting. If you're just installing a tombstone, then you don't need to acquire latches on multiple pages, because you're not moving anything between the pages. But any time you're doing any kind of compaction or reorganization across multiple pages, you need latches on all of them. And again, as I said, this is important: if you're doing a deletion with a shift or a compaction, your latches always have to be acquired in the scan-forward order. You can't acquire them out of order, because then you can wind up with deadlocks. Are there any other questions? Okay, so the slot-based alternative is pretty straightforward. Again, say T1 wants to find D, so we're going to hash D to this slot here and acquire the read latch, this time on the individual slot instead of on the whole page. So this is much finer-grained, and then we can do our read. Now look at what this allows: a transaction T2 wants to insert E, and E hashes to a slot that's also in page one, but because we're doing the latching at the slot level, there's not going to be a problem here. So T2 can acquire a write latch on its slot in order to perform its insert. Now when T1 wants to go read the next slot, it's going to block waiting for T2 to release its latch. We've already checked the value in the first slot for T1, so it's safe for us to release that first latch, since we don't need to go back to what we already looked at; it's not key D, so we want to move on.
But while T1 is blocked waiting for T2, we don't want to block other concurrent transactions. Imagine there's some transaction T3 that wants to come read A; we don't want to prevent that from making progress, so we give up that latch as soon as we can. So again, as T2 proceeds, it gives up its latch on the previous slot and gets a latch on the next slot, so now T1 can become unblocked and continue its scan forward. Again, T1 has to wait for T2 to finish whatever it's doing. T2 gives up the latch, moves on to the next slot, which is empty, so it acquires the write latch there and can write the key E into that slot. And of course T1 can then proceed and scan down to find D. So does this make sense? Are there any questions? Okay, so in the hash table lecture, which was lecture six, I'm sure all of you remember that at exactly 22 minutes and 48 seconds I said that you could implement this linear-probing hash table using no latches. We just saw two examples of how to do it with latches; I mentioned that you could do it without them. I think my exact quote was: you don't need latching for this, you can use atomic compare-and-swap operations. So what is a compare-and-swap operation? It's a hardware instruction, an atomic instruction, that compares the contents of some memory location M to a given value V. There are two possible things that can happen: if the values are equal, if they match, then we install the new given value V' into M; otherwise the operation fails. This is also going to be the basis of our test-and-set latch, which spins in a loop trying to test the value and then set the value at the memory location atomically until it succeeds; if it keeps failing, it just keeps spinning. You can do the same thing to update the slots in the hash table. So just as an example, say we have the memory address we're going to update, and we want to compare it to the value 20: if it currently holds the value 20, we want to set the new value to 30. In this case that's true, we have 20 stored in the memory location, so we're going to succeed and update the value to 30. The way this would work for an insert is that we could perform a compare-and-swap atomic instruction that asks: is this slot equal to some special empty value? So let's designate a special empty value, maybe the max possible integer key, and we ask: is this slot empty? If that succeeds, then we've atomically replaced the empty value with our new value; if it fails, we just continue scanning on to the next slot. There's a quick sketch of this below. So that's compare-and-swap, and that's the end of hash tables. Are there any questions about that before we move on to the B plus tree? We might run a little over time here, so if we don't get through everything, we can push whatever we don't get to into the beginning of the next lecture. So, B plus tree concurrency control. I said that we started with hash tables because they were pretty straightforward relative to B plus trees; now let's see why B plus trees are trickier. The key idea, just like with the hash table, is that we want to allow multiple threads to read and update the B plus tree at the same time.
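Here's the sketch I promised for the hash table: a minimal, hedged illustration of that latch-free linear-probing insert using C++ atomics. The EMPTY sentinel (the max integer key, as suggested above) and the function shape are my own illustrative assumptions:

```cpp
#include <atomic>
#include <cstdint>
#include <limits>

// A slot is claimed by atomically swapping the EMPTY sentinel for our key.
// compare_exchange_strong(expected, desired) checks whether the slot still
// holds `expected`; if so it installs `desired` and returns true, otherwise
// it loads the slot's current value into `expected` and returns false.
constexpr int64_t EMPTY = std::numeric_limits<int64_t>::max();

bool InsertLatchFree(std::atomic<int64_t>* slots, size_t num_slots,
                     size_t start, int64_t key) {
  for (size_t i = 0; i < num_slots; i++) {
    size_t idx = (start + i) % num_slots;  // linear probing, wrapping around
    int64_t expected = EMPTY;
    if (slots[idx].compare_exchange_strong(expected, key)) {
      return true;  // the CAS succeeded: we claimed this slot
    }
    // CAS failed: the slot was occupied, so continue scanning forward.
  }
  return false;  // the table is full
}
```

Note this only handles inserts of distinct keys; deletes, duplicates, and resizing would need more care.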
Updating the B plus tree becomes a little trickier than updating the hash table, because we need to worry about the different types of reorganization that can occur in the tree. Specifically, we have to protect against two types of problems that can come up. One is threads trying to modify the contents of a node at the same time: imagine we want to insert into a particular node and we have two threads with conflicting writes to that node. That's one issue that can come up. Another problem, and this is trickier to deal with, is that one thread can be traversing the tree while some other thread is doing some type of reorganization, like splitting or merging nodes in the tree. And during your traversal, if you're not careful, you can wind up going to wrong or incorrect locations. So we'll see an example of what this might look like. Imagine we have a transaction T1 that wants to delete key 44; key 44 is down in the corner there. Basically, we just go through the usual B plus tree search algorithm. That's fine: we search down, looking at each of the division keys to see which slot we go into, and we get down to this leaf node I at the bottom and delete the value 44 from that node. So far we don't have any problems, except we notice that after transaction one has deleted that key, the leaf node is now empty. So what do we have to do? Rebalance the tree, right? This is going to trigger a rebalance, where, just as a simple example here, we're going to borrow a key from our sibling H and move it over so that we're not empty. So we want to take this value 41 from H, move it over to our leaf node there, and then we'll have to update the key in D so that we know which node to go to to find 41. So this is the operation we need to do: we've done the deletion, and we now need to do this rebalancing. But let's say that right before we do the rebalance, thread one gets put to sleep. So thread one is sleeping, and now this other thread, T2, comes in and wants to find the key 41, the key that thread one was going to move over. So this is fine: it starts its traversal, no problem, gets down the tree, gets here. It says, okay, great: 41 is greater than or equal to 38 and it's less than 44, so I know I need to go to leaf node H. Now let's say that thread two gets put to sleep. Thread two goes to sleep while it's at node D. It already knows it needs to go to leaf node H; it has the pointer right there for where to go. But now it's asleep. So thread one wakes back up and continues what it was working on, which was moving key 41 from the sibling node and updating the key in D. And now when thread two wakes up again, it's going to come down here and say: hey, I thought key 41 should have been in here; it's not here, so it must not be in the index. But thread two doesn't realize that in the meantime, thread one has moved it over. So this is one example of a concurrency problem that can come up if you don't have any latches or protections around the nodes in your tree and you allow concurrent reads and writes to go on without any kind of protection mechanism. So this is the one case: T2 gets a false negative. It thinks key 41 isn't in the tree, so it's going to give back a wrong answer.
Another thing that could happen is that node H could have been moved or rebalanced away by concurrent writers, and then we end up with a segfault, because now we have a pointer to a bad location; node H doesn't exist anymore, maybe. So the way we're going to get around this in the B plus tree is through a technique called latch coupling, or latch crabbing. I think latch crabbing is the old-timer term for it; on Wikipedia you'll find latch coupling. I think it's called latch crabbing because of the way a crab walks on the beach, moving forward in this alternating fashion. But basically, the high-level idea is that it's a protocol, an algorithm, that allows multiple threads to access and modify the B plus tree at the same time. The basic idea of the algorithm is that as we traverse, we get a latch for the parent, then get a latch for the child, and release the latch on the parent only if the parent is safe. So what does safe mean? Specifically, a safe node is one that we know is guaranteed not to split or merge based on the update that we're going to make. If we're going to do an insertion or a deletion of a particular key in a node, we can call that node safe if we know our insertion or deletion can't possibly trigger some kind of rebalancing or reorganization. For an insertion, if the node isn't full, then we know there's guaranteed to be room to do the insert once we get the latch. For a deletion, we need to make sure the node won't have to merge because it gets too empty. So the high-level idea is that we release a latch as soon as the thread performing the operation knows it's safe to do so. This sounds really complicated; why are we doing it? The answer is that it improves concurrency substantially for B plus trees. Imagine we only ever allowed one thread to read from or write to the B plus tree; that would really bottleneck our system. So we want to allow these concurrent reads and writes to happen, but we still need to make sure they happen in a safe, error-free way. So here's the more concrete algorithm with the specific latches that we need. We always start at the root node and descend from there, and this is important: just as in the hash table case, where we were always scanning forward, here we're always descending the tree from the root node, so we can ensure that all threads access the data structure in the same way. For searches, we start at the root node and, at each step, acquire a read latch on the child; as soon as we have the read latch on the child, we know it's safe to unlatch the parent, because we don't need it anymore. For inserts and deletes, we still start at the root node and descend the tree, obtaining write latches, W latches, as needed, but as soon as we latch the child, we check whether it's safe. If the child we just latched is safe for our insert or delete, in the sense of the previous slide, then we release the write latch on all of the ancestors that we're still holding, as soon as it's safely possible, so that we can let other threads make progress in the system.
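As a rough sketch of what this coupling might look like in code, here's a hedged C++ version of the read path plus the safe-node check. The Node layout, FindChild, and the half-full threshold are all illustrative assumptions, not any real system's implementation:

```cpp
#include <cstdint>
#include <shared_mutex>
#include <vector>

// Illustrative node: each node carries its own reader-writer latch.
struct Node {
  std::shared_mutex latch;
  bool is_leaf = false;
  std::vector<int64_t> keys;    // separator keys (inner) or stored keys (leaf)
  std::vector<Node*> children;  // children[i] covers keys below keys[i], etc.
};

// Pick the subtree that could contain `key`.
Node* FindChild(Node* n, int64_t key) {
  size_t i = 0;
  while (i < n->keys.size() && key >= n->keys[i]) i++;
  return n->children[i];
}

// Search path: hold at most two read latches at a time. Latch the child
// first, and only then release the parent.
Node* DescendForRead(Node* root, int64_t key) {
  Node* node = root;
  node->latch.lock_shared();
  while (!node->is_leaf) {
    Node* child = FindChild(node, key);
    child->latch.lock_shared();   // grab the child...
    node->latch.unlock_shared();  // ...then the parent is safe to release
    node = child;
  }
  return node;  // caller searches the leaf, then calls unlock_shared()
}

// "Safe" check for the write path: an insert can't split a non-full node,
// and a delete can't force a merge of a more-than-half-full node.
bool IsSafe(const Node* n, bool is_insert, size_t max_keys) {
  return is_insert ? n->keys.size() < max_keys
                   : n->keys.size() > max_keys / 2;
}
```

On the write path, the same descent loop would take lock() instead of lock_shared(), and would only release the ancestors' latches once IsSafe reports that the newly latched child can absorb the update.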
So just as an example, if we have this tree here and we have write latches all the way down the tree starting at the root node, we want to get rid of our latches as soon as possible so that other threads can access the data structure, because otherwise we're holding all the latches for ourselves. We'll go through a few examples quickly, and if you have any questions during them, just stop me and we can talk through them. The first one is really simple: we want to find the key 38 in the tree. We start, as I said, at the root node and acquire a read latch on it, then figure out which direction we need to go. We know that 38 is greater than 20, so we have to go right, and we acquire a read latch here on the B node. Because we're now at the B node and we have the latch we need, we can release the latch we're holding on A. No one can come in and mess up the B node, because we already have the latch on it, so we can safely release our read latch on A, and other threads can now do the same latching on A that we just did. Again, 38 is greater than 35, so we come down here, and since we have the latch on node D, we can release our read latch on node B. Now we get down to this leaf node, and again we release our latches in this coupling fashion, and down here we can read the value 38 that we were looking for. So we're done. Are there any questions about this? Okay, let's do something a little more exciting: a delete. We're going to delete key 38. Again, we start at the root node, take the write latch, and move down. We figure out 38 is to the right, so we have to go down the tree on that side, and we get the write latch on B. The question here is whether we might need to coalesce B, whether we might need to change the structure of the subtree. Since we're deleting 38, we don't know what's going to happen below it; we may need to reorganize at this level, so we can't release the latch on A yet. We need to hold on to it for now. So again, we move down, and now that we're at node D, we can see that even if we delete 38, D is plenty full; we're not going to need to rebalance. We know that D won't have to merge at all and we don't have to change any of the structure; we can just remove the key 38 from node D. So it's safe for us to release the latches on A and B. Now the question is: how should we release the latches? What order do we want to release them in? By a show of hands, how many people think it should be in reverse order, popping back up the tree like a stack? A few, okay. How many people think it should be in the other order, first-in, like a queue? Okay. How many people think it doesn't matter? Just one, okay. The answer is, well, there are actually two answers. One answer is that it doesn't matter: we can release them in whatever order we want, and it'll be logically correct. The other answer is that, for performance reasons, we want to release them in the order in which we acquired them, starting at the top of the tree. And the reason is, as I sort of alluded to earlier, that the latches at the higher levels of the tree prevent concurrent access to those larger subtrees or subranges of the tree.
So you want to release the earlier-acquired latches, the ones towards the top of the tree, as soon as possible. So, just finishing out the example: we move down here, perform the delete, remove the key, release our latch, and now we're fully done. Okay, so that was a delete. Now let's do an insert. Let's say we want to insert key 45. Again, it's the usual procedure: we start at the top and start acquiring these write latches. In this case, we know that if D needs to split, B has enough room to accommodate it, so it's safe for us to release the latch on A. We keep moving down the tree, acquiring our latches, and we see here that the node's not going to split, so we can release the latches on B and D that we were holding, and then finish our operation. So again, even before you perform the insert, as soon as you get the latch, you want to check whether it's safe to release any of your previously acquired latches, because as much as possible we want to maximize the number of concurrent threads that are able to access the data structure. Okay, I think we'll do maybe one more, and then we'll roll the rest of the lecture over into the next class. This example is another insert: we're going to insert key 25. Again, it's the same procedure: we work down the tree here until we get to the slot we need to go into, but now we have a problem, because we're going to need to split F. So we need to hold the latch on the parent node, to prevent someone else from coming in and accessing the parent node until we're done doing our reorganization. So we're keeping that latch around: we have the latch on the parent, we get the latch on the leaf, and now we do the split here to put in the key. Ignore the sibling pointers for now, but this is the way we do it. Now that we've done the reorganization, we can release our latches. So are there any questions about this? Okay, so I think we'll leave off with this observation. The observation is: what is the first step for all of the updates that we did in the tree? How does our algorithm work? What's the first thing we need to do? The answer is: take the latch on the root node; that's exactly correct. In all of these cases, we're always taking a write latch, which, again, is more exclusionary than a read latch, on the root node. So every single time, we're taking this write latch on the root node and preventing other concurrent threads from coming in and accessing it. This can become a bottleneck if you have a lot of concurrent threads trying to access the tree. So this algorithm, while it keeps the tree safe and prevents concurrency errors from coming up, prevents us from getting really high concurrency. So at the beginning of next class, we will talk about a better latching algorithm that can help get around these problems. So I will see you next time.