All right, that's DJ TPL. How's life? Good? Did you know that the GZA, before the age of 27, already had Wu-Tang and his solo album under his belt? I mean, yeah, the Wu-Tang Clan guys were all pretty young, but the GZA's the oldest, I think. And his first album was as Prince Rakeem. That was kind of goofy, but then he got s*** with Wu-Tang. There are no Wu-Tang questions on the exam this year. Previous years there have been. So don't feel like you need to know these things. All right, again, awesome. Thank you, DJ TPL.

For you guys, a lot to go over — the list is getting longer. Project one is due this Sunday at midnight. Again, we're having special office hours on Saturday; those are in person. We announced it on Piazza — I think it's on the 5th floor, in one of the carrels. Homework three will be out this week; it's been bumped out to October 4th — that's a Wednesday — so it doesn't land on the same due date as project one. And again, it kind of sucks that it's crammed in so quickly, but the midterm exam covers the material in homework three, and we want to get it graded and back to you before the exam. The midterm is going to be in class here on October 11th — a Wednesday — during the regular class time. If you need accommodations, please email us so we can start organizing and taking care of the logistics. Don't post on Piazza. Yes — we're trying to get homework three out now. It should be out today; I don't know if it's out yet, but yes, that's the plan. The trouble is that it covers things like sorting and joins, which we'll cover next week. All right, any questions about any of these things? Yes — whatever it says on the website now is the current plan. Yes. At the end of fall break. All right, we'll double-check that; I thought we moved it so it didn't land during fall break. We will take care of that. Other questions? All right, cool.

So, the last two classes we've talked about data structures: hash tables, and then B+ trees. And I prefaced that discussion by saying that, to simplify the explanation of these data structures and the algorithms used to manipulate them, we were going to assume everything is single-threaded, because that makes life easier. But of course, in any modern system on today's hardware, you need to support multiple threads — multiple workers — running at the same time. Again, I'm going to try to use the word "workers," because that can mean either threads or processes: Postgres is not multi-threaded, it's multi-process; most modern systems are multi-threaded, but the idea is the same. We want multiple workers running at the same time to be able to access these data structures, so that if one of them has to stall because it's going to disk, other workers can run and do useful things. A system would look very unresponsive otherwise: if I only have a single worker — again, assume it's a thread — and I go run some query, then as soon as I look in my page table and the page I need isn't there, I have to stall, because I've got to go to disk and get it.
Well, while that thread is stalled — the CPU is essentially stalled — we can have other threads, other workers, do useful things. So that's the goal of today: how do you actually make these data structures thread-safe?

What I'll say is that this is how most systems are implemented — most systems will try to take advantage of multiple threads. There is a category of systems that don't do any of the things we're talking about today; the most famous one is probably Redis, which is a single process with a single thread, so all the latching stuff we're going to cover today, it doesn't have to do, because it knows no other thread is running at the same time. There are other systems that are still multi-threaded but have only one writer thread with multiple reader threads running at the same time; that simplifies a bunch of things, but you still need the latching protections we're going to talk about today.

So the thing we're going to use to force the threads or workers to behave in a certain way, so we don't end up with corrupted data and invalid data structures, is called a concurrency control protocol. For today's class, we're going to see how we do this for workers; after the midterm, we'll discuss how we use concurrency control to coordinate transactions. You can think of this concurrency control protocol as the traffic cop of the system: it tells the different workers who's allowed to do what at any given time. The idea is that workers are operating on some shared object — some critical section — and we don't want them to interfere with each other and cause problems.

The two types of problems you can have are logical correctness and physical correctness — I think I mentioned this last week as well. Logical correctness means that if I insert key 5 into my B+ tree and then come back and look for key 5, I should see it. Or, the other example I gave: if I delete key 5 and come back and look for key 5 again, I should not see it. At a logical level, we want to make sure we're seeing the things we should see in our data structures. The thing we care about in today's class is physical correctness: how do we ensure that if we're walking through a hash table or traversing a B+ tree, and at some point we have to follow a pointer — like a page ID that takes us somewhere else — that page ID is valid? It shouldn't take us to, say, a table page full of garbage. Because if you follow a pointer and start looking at data that doesn't look like what you expect, you're going to fault — you'll try to read past the end of some buffer, things will break, or you'll get corrupted data. So again: logical correctness we'll worry about after the midterm; today's class is really about physical correctness.

So I first want to quickly go over what latches are, and how you actually implement them inside a database system — the takeaway being that, ideally, we don't want to rely on what the operating system gives us in terms of latches. Then we'll see a simplified example of how to do hash table latching.
We'll spend most of our time on B+ tree latching — we'll see a basic version and an optimized version — and then we'll finish by talking about how to handle leaf node scans. Okay?

All right, so I think I showed this slide before, and I want to revisit it: the distinction between locks and latches. If you're coming from the OS world or the distributed systems world, when I say latch, you might think lock. But in databases — in this course, and in my life — we need to be clear about what we mean when we say lock versus latch.

A lock is going to be this high-level protection primitive that allows us to protect the logical contents of the database: a tuple, a table, a database. When a transaction acquires one of these locks, it holds the lock for the duration of that transaction. That's not always true — we'll see examples where locks can be released early — but for our purposes today, assume that's the case. There'll be some higher-level mechanism within our concurrency control protocol that ensures we don't have any deadlocks, and if a deadlock does arise, the database has a mechanism to roll back the changes the transaction made, to make it look as if it never made any changes — so we don't have any partial updates.

Today we're focused on latches. Latches are the low-level primitives we use to protect critical sections in our data structures, one worker against another. The duration we hold a latch is going to be very short — think of a critical section: I take a latch on a page, make some change, and release the latch immediately. And because it needs to be simple and fast, we minimize the amount of bookkeeping we keep for these latches. The database is not going to automatically roll back any changes for us; we avoid deadlocks by not making changes unless we've acquired the latch for something, so we never have to roll things back. There's minimal coordination between the different workers running at the same time. Whereas in the lock case — jumping ahead — there'll literally be a table internally, the lock table, and if you go look in there you can see who holds locks on which objects. For latches we don't maintain any of that, because it's too expensive relative to the small amount of work we do within a critical section in our data structures.

There's a table I also like from the book I recommended last time — Goetz Graefe's B-tree book — where he shows this distinction between locks and latches. The way to read the table is: within a column, you read down to see what the thing protects and the different ways it protects it. So, for example, a lock separates transactions from each other, and it protects the logical database contents — tuples, tables, databases. We can hold locks for the entire length of a transaction. We'll talk about modes in a second, but we can take a lock on an object in different modes — exclusive, shared, intention, update — which we'll get to later.
And the database provides built-in deadlock detection or deadlock prevention mechanisms to avoid these problems. And the information about what locks are being held is kept in a lock manager — a centralized data structure.

Today, again, we're focused on latches. Latches protect workers from each other. They're only for in-memory data structures — this is literally for, say, the B+ tree pages in memory. If a page within our B+ tree gets flushed out to disk, we wouldn't hold a latch for it while it's out on disk, because that's meaningless. Latches protect critical sections. There are only two modes we can hold a latch in: read and write. And the way we handle deadlocks is through coding discipline: we, as the systems developers, have to write good code to make sure there are no deadlocks. Easier said than done, sure — but there's no other part of the system that's going to bail us out. And the information about these latches is embedded in the data structure itself, so there's no centralized thing. Again, this makes more sense as we walk through the different data structure types. The lock stuff we'll cover after the midterm, in lecture 15.

All right. So our latches have only two modes: read mode and write mode. Read-mode operations are commutative: you can have multiple workers take a latch in read mode at the same time, because whatever they're doing isn't going to break the data structure or cause any conflicts. If two workers need to read the same page and they both take the latch in read mode, they're not doing any writes — nothing breaks — so I can go ahead and have them both run at the same time. Write mode, or exclusive mode, is when one thread wants to access the object and actually make changes to it, and doesn't want any other thread operating on the object at the same time. Only one worker can hold the latch in write mode, and that blocks everyone else out.

A really simple compatibility matrix looks like this: if I hold a latch in read mode and someone else wants to get the latch in read mode, that's allowed. But any other combination — where at least one side either holds the latch in write mode or wants to get it in write mode — has to be denied. So, going back to what I said before about coding discipline: the dumbest thing you could do is take every latch in write mode even when you're only going to read. It would protect all your data structures, but you've basically relegated yourself to a single-threaded system. And likewise, if I take my latch in read mode but start making changes to whatever it's protecting, that's the programmer's fault — if the system crashes, that's on us. And without getting into verifiable languages, there isn't really any mechanism in C++ or Rust that protects us from these things, because the compiler can't know.
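Spelled out, that compatibility matrix from a moment ago is tiny — read-read is the only combination that's allowed:

                  Holder: Read    Holder: Write
    Want Read         yes              no
    Want Write        no               no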
So let's talk about how you actually implement latches. Ideally, we want a latch with a small memory footprint, because we don't want to store a lot of additional metadata per latch — again, these are embedded in the data structure itself. And ideally, when there's no contention in the system — no two threads or workers trying to acquire the latch at the same time — we want to go as fast as possible, with minimal overhead: acquire the latch and do my thing right away. If we can't get the latch we need, then we have to decide how long we should wait and how we want to wait, and we'll see different scenarios for that. And ideally we don't want a bunch of metadata per latch about who's waiting for it, because that's basically a wait queue for every single latch you could have in your data structure. Think of a giant B+ tree with a billion entries — how many pages are in there? Each of those could have its own wait queue.

So again, coming from the database world, we say we don't want to rely on the OS for any of these things. But then the OS people say the database people don't know what they're doing, and they should not be implementing their own latches. You can see this on the Linux mailing lists — here's a post from Linus saying, basically: you should not use spin locks in user space (that's us — the database system runs in user space), you should not roll them yourself unless you know what you're doing, and the chances that you know what you're doing are low. He's wrong, despite being Linus, right?

All right, so I'm going to go through three basic implementations of latches. There are more advanced ones, like the parking-lot stuff from Apple — that's probably the best one to use right now — and there are MCS locks, which are a queuing thing. We cover those in the advanced class; for our purposes here, we don't need them. But we do need to understand what the basic latch implementations are actually doing, so that when you start sprinkling them in your code, you understand their ramifications. Yes — you mean pthread mutexes? We'll get to that in a second. Her question is: what's wrong with C++ mutexes? If you call std::mutex in C++, what do you actually get? A pthread mutex. But how does a pthread mutex work? Two more slides. Okay. All right — I'll take that one; this is the database-world one, and we'll talk about the OS one next.

So the most basic latch you can implement is called a test-and-set spin lock. And I realize I'm calling these spin locks when they're really latches — but this is the simplest way to implement one, because it's literally a single word in memory that you do an atomic compare-and-swap on to see whether you can set it. If you can't set it, you spin and keep trying to set it over and over again. The code would literally look like this: I declare an atomic Boolean — that's just a syntactic shortcut for declaring something atomic — that's my latch, and I call test-and-set on it. It checks whether the current value is zero, and if it is, sets it to one — and it does that in a single atomic instruction. It's not "if this, then that," where somebody else could sneak in and change the value before I do; it's literally one instruction that applies the change. If I can't get it, I just spin. And because we're doing this at the database level, in user space, we can decide how many times we want to retry, whether we want to yield the thread to the OS, or whether we want to abort and restart.
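As a rough sketch of what that looks like in C++ — an illustration of the idea, not the exact code from the slide:

```cpp
#include <atomic>

// A minimal test-and-set spin latch. std::atomic_flag::test_and_set()
// compiles down to a single atomic instruction (an atomic exchange on x86).
class SpinLatch {
 public:
  void Lock() {
    // Keep trying until the flag was previously clear (i.e., we set it).
    while (latch_.test_and_set(std::memory_order_acquire)) {
      // Busy-wait: this is the "spin". A real implementation gets to
      // decide here whether to back off, yield to the OS, or give up.
    }
  }
  void Unlock() { latch_.clear(std::memory_order_release); }

 private:
  std::atomic_flag latch_ = ATOMIC_FLAG_INIT;
};
```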
So why is this bad? What's that — waste? You're busy-waiting. Right: you're basically spinning on the CPU — check, check, check — over and over again. I could put in an exponential backoff: I tried to get it, I couldn't, so wait one millisecond, then two milliseconds, then four. But the problem remains that you're just spinning over and over again.

Another problem is going to be cache coherence traffic. So again, assume I have a two-socket CPU, and the latch I'm trying to acquire is over here in this other NUMA region. Does everyone know what NUMA is? Non-uniform memory access. Basically, if I have two or more sockets on my motherboard, each CPU socket has DRAM that's close to it, and it's really fast to talk to that local memory. A socket can also talk to memory that's over on another socket — another NUMA region — but that traffic is much slower, because it has to go over the interconnect between the sockets. Intel and others do a lot of work so that when you write programs, you don't technically have to know where memory is physically located. But of course your program can end up accessing something that's on another socket, and it gets really, really slow; the hardware plays games and moves things around to try to speed you up. We can ignore all that — for this class, we don't have to worry about NUMA. I just want to point out that if a worker running on this CPU over here wants to acquire a latch sitting in the other socket's memory, it's going to keep spinning on it, and all that traffic over the interconnect slows the entire database system down. So this is inefficient because I'm spinning, and also because the traffic on the actual hardware is expensive.

So now, her question: what's wrong with — or rather, how does — the C++ mutex actually work? This is called a blocking mutex. It's the easiest thing to use, because it's built into C++: you acquire and release, and there aren't a lot of knobs on it. And the way you use it is sort of like this: you lock it, you unlock it, and you do whatever you want in the middle.
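A minimal sketch of that usage pattern:

```cpp
#include <mutex>

std::mutex latch;  // on Linux, std::mutex wraps a pthread mutex (a futex)

void DoWork() {
  latch.lock();
  // ... do whatever you want in the middle ...
  latch.unlock();
}
```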
So I asked her: how is this thing actually implemented? Does anybody know? If you call std::mutex, what do you get in C++? A pthread mutex. Well, how is a pthread mutex implemented? It's called a futex — it's in Linux. Have you ever heard of a futex before? Fast userspace mutex. The way it works is that it has the spin lock I just showed on the last slide: in user space, there's a little test-and-set you can do on your own, but if you try to acquire it and can't, you fall back to a heavyweight mutex inside the kernel. So with a futex, if no one holds the latch, I just do a compare-and-swap real fast in user space and I'm done — my program, my thread, keeps running. If I can't get it, then the OS takes control: we go down into the kernel and get descheduled, because the OS knows I can't run until the thing I'm waiting for becomes available. But what's down in the kernel — how does it keep track of threads? Sorry, say again? A blocking queue — a wait queue, yes, and the scheduler also has its own hash table to keep track of which threads are running, and it uses its own latches to protect that data structure. So if I can't acquire the futex, I go down into the kernel and get descheduled, and that's very, very expensive. Syscalls are expensive; we want to avoid them. The diagram looks like this: I have two workers running on different sockets, and they both try to acquire at the same time. One of them gets the user-space latch; the other guy goes down to get the OS latch and gets descheduled. Again, this is slow, and at the same time you've involved the OS. This is bad.

The first two latches I showed you didn't really have modes — it was all or nothing. The way you implement a reader-writer latch in C++ is with std::shared_mutex; I think we do the reader-writer lock in BusTub, which is just a pthread rwlock. The way this basically works is that the latch itself has its own wait queues and its own counters to keep track of how many threads are waiting, and you can actually configure the scheduling policy for the latch itself. So the idea is: a reader thread comes along and wants to acquire the latch; it checks whether anybody holds the latch in read mode or write mode. If it's available, it increments the counter to say somebody's now holding the read latch, and goes ahead and does whatever it wants. If anybody else comes along who also wants to acquire it in read mode, the latch knows it's in read mode right now, so it can let that thread run as well. But now a write worker comes along and tries to acquire the latch; we have two readers already holding it in read mode, so the writer has to stall, and the latch maintains an internal wait queue to keep track of which threads are waiting for it. Then, depending on the policy you configure in the latch, if another thread comes along wanting read mode — in theory, it could acquire the latch right away, since read mode is commutative with the other readers holding it — you can set the policy to say: I know another thread is waiting in write mode, so put this new reader to sleep instead. In C++, I think they do all of this in user space rather than down in the kernel; but when you do have to block and wait to acquire the latch you want, that's an OS mutex underneath, which we don't want.

So I've shown you a high-level overview of the test-and-set operation. Compare-and-swap is the basic building block you use to build more complicated latch primitives, and it depends on whether you want the OS to do it or not. Most of the bigger database systems — the enterprise ones — will not rely on the OS for anything. Partly that's for portability, and partly it's just faster to avoid the OS and roll it yourself.
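Going back to the reader-writer latch for a second, the std::shared_mutex usage in C++ is just this (a minimal sketch):

```cpp
#include <shared_mutex>

std::shared_mutex rw_latch;  // reader-writer latch (a pthread rwlock on Linux)

void Reader() {
  rw_latch.lock_shared();    // read mode: multiple readers can hold this
  // ... read the protected object ...
  rw_latch.unlock_shared();
}

void Writer() {
  rw_latch.lock();           // write mode: exclusive, blocks everyone else
  // ... modify the protected object ...
  rw_latch.unlock();
}
```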
Okay? All right, so now let's — oh, sorry, yes. The question is: the parking-lot mutex, the Apple one — is that built on test-and-set? All of them are, yes. Okay. Does everyone know what compare-and-swap is? No? Okay, one slide.

Compare-and-swap is an atomic instruction that modern CPUs provide that allows you to check a memory location to see whether the current value at that address is what you expect it to be — and if it is, go ahead and overwrite it with your new value, again in a single instruction. If you wrote this as plain C++ code — "if the value equals this, then set it to that" — that would be multiple instructions, and between the time you check whether the flag is set and the time you go to update it, somebody else might have snuck in and updated it before you. On modern CPUs, you can do it in a single atomic instruction, which guarantees that between the check and the set, no one else can get in before you. And that's the basic primitive that allows us to build more complicated things.

There are a bunch of intrinsics in C and C++ you can use for this. They come in different flavors: if the test-and-set or the compare-and-swap succeeds, they'll return the old value, or the new value, or true and false. So say in this case, with this intrinsic, I'm saying: here's the address I want to check — assume it's a 64-bit integer — here's the value I expect it to currently hold, and here's the value I want you to set it to if so. We jump to this memory address, it holds 20; does 20 equal 20? Yes, so I can go ahead and overwrite it with 30. Pretty simple. But again, that's the building block we need to build all our more complicated latches. This compare-and-swap stuff has been in x86 since around the early '90s. So, we good? Okay, cool.
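As a concrete sketch of that intrinsic — this uses the GCC/Clang builtin; the exact intrinsic on the slide may differ:

```cpp
#include <cstdint>

int64_t value = 20;

int main() {
  // Atomically: if value is currently 20, overwrite it with 30.
  // Returns true if the swap happened -- all in one instruction.
  bool swapped = __sync_bool_compare_and_swap(&value, 20, 30);
  return swapped ? 0 : 1;  // swapped == true, value == 30
}
```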
All right, so let's see how we can use latches with hash tables. A hash table is going to be easy to support because — assuming we're doing linear probe hashing — there are only so many ways you can actually access the hash table. I hash to some location in my hash table, and then I scan down from there looking for the entries I need. And because the threads all move in the same direction — top down, even though they may start at different locations in the table — I can't have any deadlocks: there's never one thread going top-down while another goes bottom-up. So the question is going to be what granularity we want our latches to protect the data structure at, because that determines the amount of parallelism we'll be able to support. For this lecture, we're going to ignore how to handle resizing the table. The simplest way to handle that is to have one latch that protects access to the data structure itself: if the table gets full and I need to double its size, I switch that latch into write mode and do my resizing, and that prevents everybody else from coming in. That's the easiest way to do it.

So the scope of our latches can be either a page or a slot, and again, this determines the amount of parallelism we have. A page latch protects the entire page: no matter whether you want to read one entry or all the entries in the page, you hold a latch on the whole thing. The alternative is a latch for every single slot in a page. That allows more fine-grained access, but the challenge is it takes more space, because every single slot needs its own latch — and as I'm scanning through my hash table, I've got to acquire the latch for every single slot as I go along. So again, there's no free lunch in systems, or in computer science: either I have a single latch per page, which I acquire once per page and which doesn't take a lot of space, but which blocks everybody else out of the entire page — or I have one for every single slot.

So say we have a real simple hash table like this. Thread 1 comes along and wants to find D. If we hash D, we land at this location here. We latch the entire page in read mode and scan down looking for the entry we want. But at the same time, another thread comes along and wants to insert E. Same thing — it hashes to this page — but the page is already latched in read mode, and that's not compatible with the write latch it needs to do the insert. So thread 2 has to stall — and whether it spins in user space or gets descheduled by the kernel depends on your latch implementation. Now, when thread 1 is done scanning this page, it can jump to the next page. It still holds the latch on the page it started at, because it needs to know where to look next and make sure nobody moves things around underneath it. Then it releases the latch on page 1, acquires the latch on page 2, and now thread 2 can start running and figure out where it wants to do its insert. Same thing again: it comes down here to do the write, the read latch isn't compatible with the write latch, so it has to wait; and once thread 1 is done, thread 2 can acquire the latch and do the update to insert its entry.
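Here's a minimal sketch of that page-level latch crabbing for a hash table scan, with made-up types — this is an illustration, not BusTub's actual code:

```cpp
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical page type for a linear-probe hash table.
struct Page {
  std::shared_mutex latch;
  std::vector<std::pair<int, std::string>> slots;
};

// Scan top-down in read mode. Crabbing: latch the *next* page before
// releasing the current one, so nothing can shift underneath us.
std::optional<std::string> Find(std::vector<Page> &pages, size_t start, int key) {
  pages[start].latch.lock_shared();
  for (size_t i = start; i < pages.size(); i++) {
    for (auto &[k, v] : pages[i].slots) {
      if (k == key) {
        pages[i].latch.unlock_shared();
        return v;  // found it
      }
    }
    if (i + 1 < pages.size()) {
      pages[i + 1].latch.lock_shared();  // grab the next page first...
    }
    pages[i].latch.unlock_shared();      // ...then release the current one
  }
  return std::nullopt;
}
```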
Yes, question? The question is — with this rescheduling — isn't that sort of a race condition? Let me restate the scenario: a race condition in terms of what the output is, if we have something we're trying to read at the same time we're trying to write. So, say in this scenario T2 shows up before T1 — what's the race condition you're worried about? When you say read, write — you read, then write, then read — is that second read another find on the key? Our scheduling policy puts the find before the write — but is that wrong?

Yeah, so this is the logical correctness thing. Say I run a literal SELECT query that does a lookup in this hash table and finds D, and I get back the answer. Now some other transaction — some other thread — comes along and deletes D, removing it from the hash table. Then the first thread runs another SELECT that looks for D, and it doesn't come back. From the data structure's perspective, that's fine for what we're talking about today. In the terms we'll use after the midterm, that's an anomaly — an inconsistent read — and that's something the concurrency control mechanism of the system will handle at the transaction level. At the low level of the data structure, we don't care: it's correct. Who decides which writes we should or shouldn't see is a higher-level thing that comes later. Yes — and the great thing about databases, about these transactions, is that there can be multiple answers that are all potentially correct. But we'll get to that later. Well — in the SQL specification there is a description of what's considered correct or not correct at different isolation levels, which we haven't covered yet. The strictest isolation level is called strong serializability: basically, whoever comes first should see the system as if it were running by itself, its changes get applied first, and everything else comes after that. But we're getting way, way ahead of ourselves. From the data structure's perspective, I don't care that your thread looked for D, then D got deleted, then you looked for it again and it's missing. That's not the data structure's problem — it's somebody else's problem. It is our problem, just not today's lecture. It's the transaction's problem, and it's really hard — it's why all the NoSQL guys didn't do transactions at the beginning: because this shit's hard, right? There are a lot of things they skipped because they're hard, and then they actually had to do them later. All right, we'll come back to that.

So let me show you how to do slot latching. Now I have a latch on every single slot in my hash table. So when I do find(D), I get the read latch on the slot itself, and I go ahead and read what I'm looking for. And say this other guy wants to jump to this slot — he wants the write latch on C (ignore how he got there; he hashed there). Now when the first thread tries to get the read latch on the next slot, it can't, because the other thread holds that latch, so it has to stall. But at this point, even though it's going to stall and wait, it's safe for it to release the latch on the previous slot: there's no reason to keep holding it, because it's just going to spin and wait for this one, and everything stays correct. Now, I mentioned reorganization of the hash table itself. If you have to resize it, assume there's some other latch protecting the entire thing that's held in read mode; we only take that global latch in write mode when we have to resize the whole thing. So T2 here would hold the global latch in read mode. Yes — is this a slot within a page? It's just some offset in some page of our hash table — so it's all fixed-length, yeah. Is it basically just dividing the page into more parts? Correct. Basically: do you want fine-grained latches or coarse-grained latches? That's all it is. So this should be pretty straightforward.

Okay, let's get to B+ trees, because this is harder and more fun. So again, just like with the hash table, in a B+ tree we want to have multiple threads reading and writing at the same time.
The challenge is that in the hash table — at least with linear probing — the number of pages was fixed and the organization of the data structure was fixed. Meaning: if I create my hash table with, say, a million slots, it doesn't matter whether I have a thousand threads going at it or one thread, or whether I have half a million keys in it or none — the data structure is always the same. A B+ tree is self-organizing — self-balancing — so as I insert things into it or delete things from it, it starts reorganizing itself. So I need to make sure that as it reorganizes — as I'm doing splits or merges — the data structure stays correct.

So let's see how things can go bad. Say we have a thread that wants to delete key 44 at the bottom. What do I do? I traverse down, looking at the guide-post keys to figure out whether to go left or right, I reach my leaf node, and I go ahead and delete the key. Now I have to rebalance — I have to do a merge, because this leaf node is less than half full. Maybe I'll steal an entry from my sibling and move 41 over. But before I can do that, my thread gets descheduled for whatever reason — the OS has to do something, gamma rays came down, it doesn't matter; my thread's not running anymore. While thread 1 is asleep, thread 2 comes along and wants to find 41. It starts traversing down just like before, looks at the guide posts, realizes it needs to go down to the right, to node H — and then it gets descheduled, for whatever reason. Then thread 1 wakes up and moves 41; thread 2 wakes up, goes down to the node it thought it needed to go to, and now the key's not there. Best case scenario, you get a false negative. Worst case scenario, you crash, the system fails, and you corrupt your data. So we need latches to protect this thing.

The technique we're going to use is called latch crabbing, or latch coupling — I think the textbook calls it latch coupling, and I think Wikipedia does too. It's the protocol we use to decide, as we traverse down the tree, which latches to take and when we can release the latches above us. Because the easiest way to protect this entire data structure is to put one giant latch on top of the whole thing and make everyone get gatekept through it — but that becomes a bottleneck. We want to be more clever and selectively release our latches as we go down, whenever we know it's okay.

The basic protocol is: to get into the B+ tree, you always latch the root first. Once I'm in the root and I've figured out whether I'm going left or right, I acquire the latch on the child I'm going to descend into from the node I'm currently at, and once I know I'm okay to go there, I can release the latch on the current node — if I know it's safe. The definition of "safe" depends on the operation we're trying to do. For an insert: we know the child isn't full, and therefore inserting my key is not going to cause a split at that child node that might get propagated up to the parent. For a delete: we know the child is more than half full, so removing a key is not going to cause a merge that propagates changes up to the parent. So, in summary: for reads, I start at the root and acquire read latches on each child as I go down — I'm doing a read operation, not making any updates — and I just unlatch the parent as I go. For an insert or delete, I start at the root taking write latches on the way down, and once I have the child node I'm moving to latched in write mode, I check whether it's safe; if it is, I release any latches I'm holding above me.
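Here's a minimal sketch of that pessimistic crabbing protocol for writes, with made-up node types and helper names — an illustration of the idea, not the lecture's or BusTub's actual code:

```cpp
#include <cstddef>
#include <shared_mutex>
#include <vector>

// Hypothetical B+ tree node.
struct Node {
  std::shared_mutex latch;
  bool is_leaf = false;
  std::vector<int> keys;
  std::vector<Node *> children;  // keys.size() + 1 children for interior nodes
};

constexpr std::size_t kMaxKeys = 4;

// "Safe": an insert won't split this node; a delete won't merge it.
bool IsSafe(const Node *n, bool is_insert) {
  return is_insert ? n->keys.size() < kMaxKeys : n->keys.size() > kMaxKeys / 2;
}

// Follow the guide-post keys to pick which child to descend into.
Node *PickChild(Node *n, int key) {
  std::size_t i = 0;
  while (i < n->keys.size() && key >= n->keys[i]) i++;
  return n->children[i];
}

// Pessimistic crabbing: write latches all the way down, releasing every
// held ancestor (top-down) as soon as the current node is known to be safe.
void WriteTraverse(Node *root, int key, bool is_insert) {
  std::vector<Node *> held;  // ancestors whose write latches we still hold
  root->latch.lock();
  held.push_back(root);
  Node *node = root;
  while (!node->is_leaf) {
    Node *child = PickChild(node, key);
    child->latch.lock();
    if (IsSafe(child, is_insert)) {
      for (Node *n : held) n->latch.unlock();  // release top-down
      held.clear();
    }
    held.push_back(child);
    node = child;
  }
  // ... apply the insert/delete at the leaf; any split or merge can only
  // touch the ancestors still latched in `held` ...
  for (Node *n : held) n->latch.unlock();
}
```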
So let's go back to our example. I want to find key 38. I start at the root and take the root node's latch in read mode, then I acquire the read latch on B. At that point it's safe to release the latch on A: I'm doing a read operation, and I've already arrived at B, where I need to go. Then I get the read latch on D and release the latch on B, and so on, down to H — I finally get the key I'm looking for, and I'm done.

Now let's delete 38. Again, I start at the root, take the root node's latch in write mode, get a write latch on B, and move down. At this point, I don't know what's going to happen below B in the tree: if I end up having to delete a key from the node I'm currently at, B — because of a merge below me — I'm going to have to make changes to A up above. So I can't release the latch on my parent A; I hold the write latches on both A and B and go down to D. Now at D, I can see that no matter what happens below me in the tree, if I have to remove a key from D, D is not going to merge, and therefore it's not going to make any changes above it. So it's safe to go ahead and release the latches on A and B.

Sorry, question — which node is the safe node? Whatever node you're at now. If I'm at B and I need to go to D, I get D's latch in write mode, and once I have it in write mode, then I check: am I safe? Because you can't check whether you're safe until you hold the latch on it. If it's safe, I go ahead and release everybody above me. Yes — the question is: is being at least half full the only check for safety? What else would cause a merge in the case of a delete? That's the thing — we only care about splits and merges; we're making sure we don't screw ourselves with those.

All right, so at this point here at D, we release the latches on A and B, because D is safe. Does the order in which we release those latches matter? He says it makes more sense to release from the bottom to the top — B followed by A. Why? Because if you release B, another thread waiting can follow the same path? Okay — but if I release B first, that thread is still waiting on A. Also, what if I have a thread that's waiting on A but wants to go down the other side of the tree? I released B, but it's still blocked. So from a correctness standpoint, it doesn't actually matter.
The data structure will still be correct whether you release bottom-to-top or top-to-bottom. But for performance reasons, we want to go top-down — because the way to think about it is that a latch protects everything below it. If I hold a write latch on the root, I'm protecting the entire data structure in write mode. Now, there may be a bunch of reader threads and other threads doing stuff over here, and that's okay — because if their modifications could cause a split or merge that reached all the way up to the root, they would still have to be holding a write latch on it. So the main takeaway is that we want to release latches as soon as possible, and we want to release the latches that free up the most workers in our data structure. So we always release from the top going down. All right — then we get down here, take the write latch on H, do our delete, and we're done.

Let's look at another example: insert 45. Same idea — I get the write latch on the root, A, and go down to B. At this point I know that whatever happens below me in the tree, if there has to be a split, I have room in B to accommodate another key. So it's okay for me to release the latch on A, and I get down to D. Now D is completely full, and I don't know what's below me in the tree, because I haven't gone there yet. So it's not safe to release the latch on D until I go down and see that inserting into I would not cause a split on node I. Then I can release the latches on B and D, going top-down, and insert my key.

All right, let's look at one where there is a split. For simplicity, we'll ignore sibling pointers. I'm inserting 25. B is safe, so I release the latch on A. I get down to C; C is safe, so I release the latch on B. Then I get down to F. F is not safe, so I can't release the latch on C. Now I need to do a split, and that split is going to cause me to insert a new entry up into node C. So I go ahead and do that: add my new node J — I'm running out of space because it's PowerPoint; again, we're ignoring sibling pointers for now — then update C to include the pointer to it. And once I've applied all these changes, I release the latches going top-down.

So, I've already given away the answer, but: what was the very first thing I did in every one of these scenarios — the inserts and the deletes? The very first thing: get a latch on the root node. Right — and that's correct, but it's a bottleneck now, because it basically turns this into an almost single-threaded data structure. Everybody who comes into the data structure has to first acquire the latch on the root — and if you're doing writes, that's in write mode, which isn't compatible with anything, so it blocks all the readers too. Again, it's correct, but for performance reasons, it's not ideal.

So the common technique everyone uses is an optimistic latching scheme, which I'll talk about now. I don't think the algorithm itself has a name — it's from this paper from, I think, the '70s. Is there a date on that? It says '77, yeah. These guys, Bayer and Schkolnick — sometimes it's called the Bayer-Schkolnick algorithm, which is kind of cumbersome to say.
But it's based on this observation: most of your threads — most of your workers — are not going to cause a split or merge in your B+ tree nodes with their operations. In my examples I'm showing nodes with two keys, because that's what fits in the PowerPoint; but in a real system, the size of a node is the page size of your database — 8 kilobytes, 16 kilobytes — and you can store a lot of keys. So most of the time you can do a bunch of inserts without causing any splits, and likewise for deletes.

So, if you assume splits and merges are rare, then instead of taking write latches all the way down — even with the latch coupling scheme — you take read latches all the way down until you get right above the leaf node, and then you check whether that assumption, that you won't need a split or merge, is correct. If it is, you acquire the leaf node in write mode and make your change. If you're wrong, you just restart and do the pessimistic approach of taking write latches all the way down.

This is a common theme you'll see, not just in databases but in a bunch of systems in general: an optimistic scheme where you assume there won't be any issues, do the fast path, and if you're wrong, roll it back and take care of it. Intel actually had this in the CPU itself — it was called TSX — this optimistic transactional memory stuff where you could run a critical section assuming no conflicts, and when you went to apply the change, it checked whether that assumption was correct; if not, it rolled you back automatically. (I think there was a bug in it, and I think they turned it off; it might have gotten turned back on.) Again, we'll see this when we talk about transactions — it's a very common technique. You do the fast thing, because most of the time there won't be any issues, and if you're wrong, you roll back and try again.

All right, so with this better latching scheme: for lookups — finds — it's the same as before. For inserts and deletes, we basically do the search taking read latches all the way down until we're one level above the leaf node. We know where we are in the data structure because we can keep track of how many levels down we are — either stored in the page, or a simple counter would work too. From the level right above the leaf, you acquire the leaf node in write mode. Then you check whether it's safe. If it is, you release all the read latches you took before, apply your change, and you're done. If you're wrong, you release all your latches and go back, taking write latches all the way down. (You could also try the optimistic scheme again, assuming things will be safe the next time around — it depends on the implementation.) This works really well in low-contention environments, because you assume there won't be any conflicts, most of the time you're correct, and so things run faster.
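A minimal sketch of that optimistic descent, reusing the hypothetical Node, IsSafe, PickChild, and WriteTraverse from the crabbing sketch above — again an illustration, not the paper's or BusTub's actual code:

```cpp
// Optimistic descent: read latches down the tree, write latch only the
// leaf, and fall back to pessimistic crabbing if the leaf isn't safe.
void OptimisticWrite(Node *root, int key, bool is_insert) {
  Node *node = root;
  if (node->is_leaf) {
    node->latch.lock();            // degenerate case: the root is the leaf
  } else {
    node->latch.lock_shared();     // interior nodes only in read mode
  }
  while (!node->is_leaf) {
    Node *child = PickChild(node, key);
    if (child->is_leaf) {
      child->latch.lock();         // only the leaf is taken in write mode
    } else {
      child->latch.lock_shared();
    }
    node->latch.unlock_shared();   // release the parent (latch coupling)
    node = child;
  }
  if (IsSafe(node, is_insert)) {
    // ... apply the insert/delete; no split or merge can propagate up ...
    node->latch.unlock();
  } else {
    node->latch.unlock();          // guessed wrong: restart pessimistically
    WriteTraverse(root, key, is_insert);
  }
}
```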
All right, back to the example from before: let's delete key 38. This time, instead of taking the root node in write mode, I take it in read mode, and keep going down until I get to D. At D, I recognize that I'm one level above the leaf. I want to delete key 38 from node H, so I take H in write mode and check whether it's safe. It is — I can go delete the key, and I know I'm not going to do any merges. So, best case scenario: I traverse the data structure almost as if I were doing a read, and therefore I get maximum parallelism; only at the bottom do I check whether that assumption was correct.

Let's see how to do insert 25. Again, take the root in read mode, take B in read mode, doing the coupling and releasing latches as I go down. Now I get down to F — and because we're trying to do an insert and F doesn't have any more room, it's not safe. So we restart the whole operation and just take write latches on the way down. Neat trick, right?

Now, in all the examples I've shown so far, we were only going in one direction: top to bottom. And as I said, there weren't any deadlocks, because everyone starts at the same point — the root — and goes down. As I said last class, there are no pointers back to your parent; you can't go back up, because that's where you could have conflicts. But because this is a B+ tree, we have sibling pointers. And now we have a challenge: we can have one thread going one way and another thread going the other way, each holding latches the other one wants. So now we've got to deal with that scenario. In the original B+ tree paper this wasn't an issue, but the B-link tree stuff that came out of CMU is where they added sibling pointers — and that's where you can have deadlocks.

So let's look at a simple example. Thread 1 wants to find all keys less than 4. It takes the root in read mode, then C in read mode. Then it wants to scan across, following the sibling pointers. Just like before: I hold whatever node I'm at now, in the current latch mode I have it in, and then I try to acquire the latch on the node I want to go to. In this case, it wants to scan from C over to B: I hold the latch on C, get the latch on B, move over, release the latch on C, and do whatever it is I need to do. So the protocol is basically the same, even though we're now moving horizontally instead of vertically. Yes — this might be the same question from earlier — we'll get to that in a second.

All right. Read modes are commutative, so I can have two threads reading at the same time. The first thread goes down to C, the second goes down to B, and they want to scan across past each other. The two latches they're holding are compatible, so they can both do whatever they need to do. That's fine. All right, so now let's do one where one of them wants to write and the other wants to read. T1 wants to delete key 4, and T2 wants to find all keys greater than 1. They both start at the same time, and assume we're doing the optimistic latch coupling I just talked about, where they both start at the root in read mode.
Thread 2 goes down and takes B in read mode. Thread 1 goes down and takes node C in write mode — that's where the key it wants to delete lives. But now thread 2 is scanning across the leaf nodes, and it wants to acquire the latch on C in read mode. It can't, because T1 holds it in write mode. So we have to decide what we want to do here. And T2 doesn't know anything about T1 — remember, there's no centralized data structure that says here are the threads that are running and here's what they're doing. All T2 sees — all it knows — is that there's a latch on this other node it wants to go to, it's currently held in write mode, and that's not compatible with the mode it wants. So it has to do something.

So what can T2 do here? It can wait — that's one option. What else? (Are you reading the slides? Okay.) What's the third option? It can go straight at the other thread and try to kill it, right? Take the latch from it. So what do you think is a good idea here? Wait — for how long? I heard "forever." But how long, really? How do you know? That's just waiting — spinning until the latch is available. But do you know what T1 is doing? No — you don't know anything. Well, you know the latch is in write mode, but how long is it going to be held? Which is it: do you kill yourself and get rescheduled, or do you kill the other guy? Kill the other guy — she says kill the other one. Fantastic. How do you do that? What — keep a log of everything it's done in the past? Yeah, that sounds expensive. What's a "normal" amount of time? He says wait for the average time a write takes — but how do you know that? In my simple example there are two leaf nodes, because it has to fit in PowerPoint. What if there were a bunch of leaf nodes all over here, and I've got to hold all of them in write mode, because I want my changes to happen atomically? You don't know whether this other thread keeps going in the other direction. What is a normal time? You don't know. He says: give up your write latch and let the other guy read. But how do you know they're waiting for you? If you're T1, how do you know somebody else is trying to get your latch? You don't. Yes — bingo, there you go. Excellent. So you should kill yourself, right?

Because killing another thread is hard — how would you implement that? Can you send it an interrupt? Getting into an interrupt handler is expensive — that's a syscall. Is there a flag that says "should I kill myself?" that you have to check every so often? How would that work — you're going to check some other memory location constantly? And what does it get you? You don't know how much work the other thread has done, and therefore you don't know whether it aborting and rolling back is way more expensive than aborting yourself. You know nothing at this point. So the best thing to do is just kill yourself — and maybe you also wait a little bit in the beginning before giving up, depending on what you know you need to do. This is the simplest thing to do, and it turns out to be the best thing to do in most scenarios — basically all scenarios — because you don't know anything about the other thread, you can't communicate with it (that's expensive), and you're just better off aborting and starting over again.
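A minimal sketch of that no-wait policy for the leaf scan, with a made-up Leaf type — not BusTub's actual code: if a try-lock on the sibling fails, release what you hold and let the caller restart from the root.

```cpp
#include <shared_mutex>

// Minimal leaf node with a sibling pointer, just for this sketch.
struct Leaf {
  std::shared_mutex latch;
  Leaf *next = nullptr;  // sibling pointer
};

// Caller holds leaf's latch in read mode and wants to move right.
// Returns false if the scan must be aborted and restarted ("kill yourself").
bool TryMoveRight(Leaf *&leaf) {
  Leaf *sibling = leaf->next;
  if (sibling != nullptr && !sibling->latch.try_lock_shared()) {
    leaf->latch.unlock_shared();  // don't wait: release and give up
    return false;                 // caller restarts the whole operation
  }
  leaf->latch.unlock_shared();    // crab over to the sibling
  leaf = sibling;                 // nullptr means we hit the end of the scan
  return true;
}
```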
Yes — the statement is: you would also have to wait or kill yourself when you're doing the normal traversal of the tree going top-down. If I try to acquire a latch on the next node and it's already held, what do I do? It's the same scenario — the exact same scenario. But in that case you can't deadlock — right, the problem is this: if T1 wants to get the latch on B and T2 wants to get the latch on C, that's a deadlock, but you don't know whether it's a deadlock or just contention on acquiring some latch. So the best thing to do is just immediately give up. That means you could have a scenario where you hold a latch and I hold a latch, and only one of us actually needs to give up, but we end up both giving up and killing ourselves. But again, the cost of maintaining metadata about who's waiting for what, and in which direction, is more expensive in the regular case, where you assume there isn't contention.

Yeah, so it was this example here. They're both doing reads — but assume they're both doing writes. I need to update all keys greater than 1, he needs to update all keys less than 4. I'm going this way, he's going that way: that's a deadlock. Thinking in terms of latches: I'm trying to acquire latches in this direction, he's acquiring them in that direction, and we're deadlocked. The question is: how do you prevent both of them from killing themselves? You can't — because I don't know you exist, and I don't know what you're doing. If you just think of computers in general, it's very unlikely that your thread and mine are in exact lockstep, down to the same number of cycles, so that we both try to acquire the latches together, deadlock, and both kill ourselves. It's a race condition, but it's rare — and you can't prevent it, because the cost of preventing it is so expensive.

The statement is: from a philosophical standpoint, wouldn't it be more efficient to kill the other one? It's not really a philosophical question — it's just, straight up, is it better? Take their wallet or whatever. This is a toy example where it's only one node. In general, I don't know what work you've done, and you don't know what work I've done. If we could keep track of that, then we might be able to decide: okay, you hold five latches and I hold one, so it's better to kill me, because you had to wait a bunch of time to get those five latches. That's, at a high level, what we'll do when we do transactions: figure out who gets priority based on how much work they've done so far. For something as fine-grained and short as a latch, it's better to just kill yourself.

Okay. Then T1 kills itself, tries again, still can't get through, kills itself again — does this a million times, right? Yes — who's to say that isn't possible? It is possible. So the question is: could you starve a thread? That's the term you want to use.
Could you starve a thread, because every single time it tries to get something, it can't, because somebody else is in there? Yes. I think I have a slide on this one. Yeah, so, I'll answer your question in a second. The latches themselves aren't going to have anything that handles deadlocks for us, and they aren't going to have anything that prevents starvation. With reader-writer latches you can set priorities, like favoring writers over readers, or FIFO or round-robin ordering, but real starvation handling would take a higher-level construct, a scheduler inside the system that notices: this worker keeps trying to run this query and touch this data structure, and it keeps getting aborted, and I know it's getting aborted because it keeps coming back with a retry, so maybe I should schedule the other workers around it to make sure it eventually gets through. That would be a way to handle it. Most systems basically just let Jesus take the wheel, or whatever phrase you want to use, and let everything go at it, because eventually you should get through. Now, if I have a thousand queries all trying to update the same key at the same time, there's no magic scheduler that can handle that. Everything gets contended and you end up with what is effectively a single-threaded system. So we optimize for the case where we assume contention is going to be low, and we use a fail-fast, no-wait policy: just check, can I do it? No? Okay, let me retry, because by the time I come back and retry, I'll probably be able to do what I need to do.

His question is, when you kill yourself, do you have to roll back any changes you made? Yes, in the code, yes. Going back to the writer example here: if the thread that kills itself had already updated a bunch of things, that's exactly why you keep holding the write latches on the things you've updated, so you can go back and reverse those changes. Is it possible to get deadlocked doing that? No, why would you? You already hold the latches on those pages.

So his question is, in what scenario do you actually need the backward sibling pointer, since it seems to be causing problems for us? Your query is, say, find keys less than four, and mine is find keys greater than one. You could argue you don't need to scan backwards: you could start at the smallest key and scan forward until you hit four. But what if I have a billion keys? That's way more expensive. Nobody does that. You'd essentially be forcing every scan to go in one direction. Yeah, so we didn't talk about skip lists, but SingleStore, back when SingleStore was MemSQL, used skip lists, and their skip list could only go in one direction, because it's a lock-free data structure, which is a bad idea, but that's another topic. So they had to do a bunch of tricks, like having ways to jump into the data structure to simulate a reverse scan, and then sort the results in reverse after you get them out. It just makes life harder. You can do it, and it avoids this deadlock issue. But the point stands: if you're traversing top-down and you can't acquire a latch on the way down, it's not a deadlock, and I still want to kill myself, potentially. Usually.
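Here is a sketch of what that fail-fast retry loop could look like wrapped around a tree operation, reusing the hypothetical RestartException from the earlier sketch. The BPlusTree interface is an assumption for illustration, and the small randomized pause is my own addition: a common way to make two workers that aborted at the same instant unlikely to collide again in lockstep.

```cpp
#include <chrono>
#include <cstdint>
#include <random>
#include <thread>

// Assumed interface: Delete() takes latches with the no-wait helpers
// above. On a latch conflict it reverses any page changes it already made
// (it still holds write latches on those pages), releases everything, and
// throws RestartException.
struct BPlusTree {
    bool Delete(int64_t key);
};

bool DeleteWithRetry(BPlusTree &tree, int64_t key) {
    thread_local std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<int> jitter_us(0, 100);

    // Most systems just retry until they get through; a fancier system
    // could count aborts here and report them to a scheduler that fights
    // starvation by holding back other workers.
    while (true) {
        try {
            return tree.Delete(key);
        } catch (const RestartException &) {
            // Brief randomized pause before restarting from the root.
            std::this_thread::sleep_for(
                std::chrono::microseconds(jitter_us(gen)));
        }
    }
}
```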
So yes, you won't have deadlocks if you do what you're proposing, but you could still have latch contention, where I can't get the latch because somebody else holds it. And in that case, again, usually you want to spin for a little bit and then kill yourself. Yeah, you still want to kill yourself. That sounds weird, but you know what I mean.

Yes? Could there be a heuristic where you wait for a certain amount of time, and if you still can't get the latch after that amount of time, then you kill yourself? Yeah, he's right. Let me restate it. He's saying that as a thread, we know how much work we've done, and say we did a lot of updates, we know they were expensive to do. So could we have a heuristic that decides how long to spin before giving up, based on how much work we've done? Yes, you could do that. I don't think Postgres or MySQL actually do that, but I might be wrong. Would it be better? You can imagine a really simple heuristic: I keep a counter in my worker's local memory of how many pages I've updated, and for each page I've updated, I wait maybe an extra 100 microseconds. Simple heuristics like that. I don't know whether it actually makes sense to do. And here's the cop-out answer for everything in databases: it depends on the workload. If everybody's updating a bunch of stuff, then maybe it's a bad idea. If you have one worker that only updates a few things, then maybe it makes sense. But once again, if everybody tries to update the same key, everything bogs down to a single-threaded system. That's the extreme case, though.

Yes, how do you handle the threads that die? I mean, they restart. And to be very clear, I don't think I have a slide on this: the restart mechanism is transparent to the user. So if I run a query, and I have to traverse a B+ tree to look up a primary key, and I can't get a latch on the way down, I don't want to abort the query, go back to the user, and say, hey, I couldn't get a latch, please restart. They don't know what a latch is. We do this transparently. You submit one query, and internally it may restart the traversal of the B+ tree multiple times, but you never see that as the end user of the application. The query just gets a little bit slower.

A lot of questions, sorry. Yes? Yeah, absolutely. So the question is, is there a scenario where someone holds a write latch on the root, so when you restart you come back and immediately abort again? Absolutely, yes. It's unavoidable. Although the more keys you insert, the taller the tree gets, and the likelihood that someone is holding a write latch on the root goes down.

Going back to the stall stuff, too: it's not just about how much work the other thread has to do while you wait. Remember, these data structures are backed by pages in the buffer pool that live on disk. So even though I'm updating one key, the page I need might not be in memory, and I've got to go out to disk and get it.
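For what it's worth, here is a minimal sketch of the spin-then-die heuristic from a moment ago: spin on the latch for a budget proportional to the work you've already done, on the theory that restarting throws that work away. The 100-microsecond figure comes from the discussion above, but the function and the counter are hypothetical, and, as the next point makes clear, disk stalls mean any fixed constant is a guess.

```cpp
#include <chrono>
#include <shared_mutex>
#include <thread>

// Spin on the latch for a budget of roughly 100 microseconds per page
// this worker has already dirtied, then give up so the caller can kill
// itself and restart. A restart discards all of that work, so more work
// done buys a longer wait.
bool AcquireWriteOrGiveUp(std::shared_mutex &latch, int pages_updated) {
    auto deadline = std::chrono::steady_clock::now() +
                    std::chrono::microseconds(100) * pages_updated;
    while (!latch.try_lock()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // budget exhausted: caller aborts and restarts
        }
        std::this_thread::yield();  // be polite while spinning
    }
    return true;
}
```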
So that's why you don't want to spin forever, or for a long time: you don't know. Maybe the other thread has to go to disk, and it's a really slow disk, and it's going to take a long time. You could be waiting 100 milliseconds, 500 milliseconds. You said, use the average time, but it depends on so many factors that it would be impossible to track. Again, this is why this is different from a regular data structures and algorithms class: these things are backed by disk, we have multiple threads running at the same time, and there's a bunch of things we have to do to hide those disk stalls.

SQL Server is a whole other beast. SQL Server actually has its own user-space coroutines. So if you're traversing a data structure and you can't get the latch you need, instead of just spinning, control goes back to their user-space scheduler, and it says, I can't run because I'm waiting for this latch, and they take the thread away and have it do some other work, and they know which latch you're waiting for. They actually do track who's waiting for which latch inside the system. They can do that because everything is coroutines in user space. Very few systems, basically nobody else, do that. SQL Server does some really cool things.

All right, cool. Any other questions? Yes? Let me go back here. The question is, if I'm traversing along sibling nodes, like this example here, and T2 is at B while T1 is at C, why does T2 need to keep holding the latch on B in order to get to C? Because you need to know that the sibling pointer is still valid, that C really is the node you should be going to next. And you know that because you hold B in read mode, so nobody can update it. Nobody can replace B with some new version that points somewhere else while you're about to follow the old pointer. So you have to hold the latch until you know you're safe on the other side, and then you can release it. Same thing going top-down: you need to know that the thing you're jumping to next is what you should be jumping to.

Yes? So how do you distinguish the case where two threads are going in opposite directions from plain contention? You can't; you handle both the same way. And this relates to a question from last time, about how systems like Postgres have sibling pointers on inner nodes too, even though I'm only showing leaf nodes here. If I use those inner-node sibling pointers to jump horizontally, how do I take latches and make sure things are still correct? The protocol I'm describing here still works. For reads it's simple: you take the read latch across, and anybody coming from above who wants to write will see your read latch and stop, and anything below on that side of the tree doing an update gets blocked. You take the read latch across and then take read latches down. For updates, I think it works the same way: as you come across, if what you're about to do below is not safe, then you keep holding the latches on those nodes.
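Here is a minimal sketch of that horizontal crabbing for a read scan, reusing the hypothetical Node and helpers from earlier. The key invariant: never release the current leaf's read latch until the sibling's latch is in hand, so the sibling pointer you follow cannot be invalidated underneath you.

```cpp
// Scan leaf nodes left to right, crabbing read latches across siblings.
// ProcessLeaf stands in for whatever per-leaf work the query does. A real
// implementation would hold latches in RAII guards so that a
// RestartException thrown mid-scan still releases everything.
void ScanAcrossSiblings(Node *leaf, void (*ProcessLeaf)(Node &)) {
    AcquireReadOrDie(*leaf);               // latch the starting leaf first
    while (leaf != nullptr) {
        ProcessLeaf(*leaf);
        Node *next = leaf->right_sibling;  // stable: we hold the read latch
        if (next != nullptr) {
            AcquireReadOrDie(*next);       // crab onto the sibling, or die
        }
        leaf->latch.unlock_shared();       // only now release the current leaf
        leaf = next;
    }
}
```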
So the protocol still works even if you have to go across horizontally and then down. You can still deadlock if somebody is going vertically while you're coming across horizontally, and then you do the same thing I've been describing here: no-wait, kill yourself. Okay, cool.

Just to finish up: this is hard. I'm showing you the simplest version of latch crabbing and deadlock avoidance. We're not going to cover it in this class, but there are way more complicated schemes. You can have versioned latches. You can have delayed updates, where you buffer changes and apply them later. The Bw-Tree is a lock-free B+ tree from Microsoft, and that's a whole other nightmare. But again, this is hard, and that's good, because you take this class and this is why you don't want some random JavaScript programmer building the B+ trees or other data structures in your database system. You want students like you guys, who know what the hell they're doing, making sure you don't cause problems. We talked about hash tables, and we talked about B+ trees today, but these techniques, the idea that everything goes in the same direction, or that I kill myself as soon as I can't get something and restart, are relevant to a bunch of other data structures and systems as well. I feel like we should just call this course "Kill Yourself," which is... that's asking for CMU to get involved, and I don't need that trouble. One year somebody did complain that I said "kill yourself" a lot. Sorry.

Next class, we're talking about sorting and aggregations. At this point, we're moving up the stack. Now we can actually start executing queries. Fantastic, right? I won't be here on Monday. I'm not teaching; Jignesh Patel, the other professor, is going to start teaching on Monday. Then Wednesday next week, he and I are both going to be gone. I'm going to the Postgres conference in New York, giving a keynote there about databases. I don't know where Jignesh is going; he might have to go talk to his parole officer. My number-one PhD student, Matt Butrovich, will be teaching on Wednesday next week, about joins. Jignesh is awesome. Ask him about growing up in India; before he joined CMU, he was telling me crazy stories. He used to get in fights every morning on the bus going to school. I think he carried a knife. Ask him about that. We'll talk about the midterm next week as well. All right, hit it.