All right, so today's class, now we're talking about actual implementations of concurrency control protocols. So real quickly, the one announcement I have is that we have another guest speaker coming tomorrow for our seminar series on hardware-accelerated databases. That'll be Brytlyt, out of London, and it'll be about GPU-accelerated databases. So this will be the second and last one we'll have on GPUs, and then the very last one will be in late November, and that will be all about doing FPGAs and custom ASICs.

So last class, we spent most of our time talking about the isolation guarantees of transactions as described by the ACID moniker, and there are two key things I want to reiterate. We had this discussion about what it means for a schedule to be serializable, and I said that there was this distinction between what was conflict serializable and view serializable. That discussion last class was not about how to actually generate schedules that are either conflict serializable or view serializable. It was really about: if I give you a schedule, and it's static, and the operations are fixed for the transactions, can you tell me whether it's serializable or not? So again, the distinction was that, in the case of conflict serializability, we can verify that a schedule is conflict serializable by swapping the order of non-conflicting operations until we convert the schedule into a serial schedule, or by generating the dependency graph and checking whether there are any cycles. I said again, the textbook definition distinguishes these two, but in the real world, when databases say they're serializable, they really mean conflict serializable. View serializability had to do with whether the schedule would still generate an outcome that's considered correct by the application. So I showed the example of adding up the balances of accounts that are greater than or equal to zero: I didn't care whether I got the exact amount for each account, I just knew each one was greater than zero, and that's all I cared about. Or the blind write example, where the writes from previous transactions got lost: I didn't care that it wasn't conflict serializable, because in the end I still saw the write from that last transaction. Again, no database actually supports view serializability, because it requires you to understand the semantics of the application. So the thing to carry forward here is: we had these fixed schedules, we saw the conflicting operations, we knew we couldn't swap their order, and therefore the schedule was not conflict serializable. That was the gist of what we were talking about last class.

So where we're at today is we need a way to guarantee, on the fly, that the schedules we're going to have for our transactions are serializable, without knowing what the entire schedule is ahead of time. Again, in all my examples I gave you the schedules, I gave you the read and write operations of the transactions, and I asked, is this conflict serializable or not? But the real world doesn't work that way. Your application connects to the database and starts issuing whatever queries it wants; we don't know what you're going to be doing in the future. So we can't use the same techniques we talked about last class. Last class was all about verifying whether an existing schedule was serializable.
So now we need a way to figure out how to do this on the fly, and this is what real systems are going to do. The one solution we're going to talk about today is to use locks to protect database objects. Remember that there was a distinction in concurrency control protocols between pessimistic and optimistic. Using locks is a pessimistic protocol, because we're going to assume there's going to be a conflict, and therefore we're going to require you to acquire a lock before you're allowed to do anything, because we think you're going to have problems later on.

So going back to the example we just showed, we can now introduce locks into our schedules. Here at the very beginning, transaction T1 is going to get a lock on A, and this lock request has to occur before we're able to do any operation we want on the object. So we say we want a lock on A before we can read A or write A. And the way we're going to keep track of who holds what locks is through a lock manager. The lock manager is the centralized coordinator, essentially the traffic cop for the entire system. Any time I want to get a lock on an object, I've got to go to the lock manager and ask, hey, give me that lock. In this case, at the very beginning, assume no other transactions are running at the same time. We ask for the lock on A, we go to the lock manager, it sees that nobody has that lock, so the lock is granted and T1 is allowed to run. Now, say again that we have a single thread, so we can only execute one operation at a time. Transaction T2 starts, and it wants to get the lock on A, but when it goes to the lock manager, the lock manager knows that it already gave out the lock on A to T1, so it denies the request, and transaction T2 essentially has to stall and wait for that lock to become available. Now, whether it waits inside the lock manager, stalling the thread, or whether the lock manager sends back a rejection and T2 goes to some kind of scheduler and says, hey, I'm waiting for this lock, I can't do anything now, schedule my thread for something else, that's left up to the implementation. We don't care about that at this point; we just want to understand the protocol. Then T1 goes along and finally unlocks A, and the lock is released. The lock manager has been keeping track of the fact that T2 needed the lock on A and couldn't get it, so now that the lock on A is available, it gives it to T2, and then T2 can run and commit, and when it's done, it releases the lock.

So this is, at a high level, what we're going to do today. We're going to use locks to protect objects, to prevent transactions from reading and writing them before they've acquired the lock. So today we're going to start off talking about the different lock types; we only need two in the beginning, and we'll add three more at the end. Then we'll talk about the two-phase locking protocol, and then we'll get into the details of how we can extend the protocol to deal with deadlocks, and how to deal with hierarchical locking, or multi-granularity locking: can you take locks on tuples, or attributes, or columns, or whatever? And then we'll finish off, if we have time, talking about locking in practice and how people actually use these concurrency control protocols and the different isolation levels.
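To make that flow concrete, here's a minimal sketch of a lock manager in the spirit of project three: one lock mode for now, one holder per object, and denied requests waiting until the holder releases. The names and structure are illustrative assumptions, not BusTub's actual API.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <unordered_map>

// One lock mode only (shared vs. exclusive comes next); each object has a
// holder, and blocked requests queue up on a condition variable.
class LockManager {
 public:
  // Block until the lock on `object` is free, then take it for `txn_id`.
  void Lock(int txn_id, const std::string &object) {
    std::unique_lock<std::mutex> latch(latch_);
    LockEntry &entry = table_[object];
    entry.cv.wait(latch, [&entry] { return entry.holder == kNoTxn; });
    entry.holder = txn_id;
  }

  // Give the lock back and wake up a waiter, if there is one.
  void Unlock(int txn_id, const std::string &object) {
    std::unique_lock<std::mutex> latch(latch_);
    LockEntry &entry = table_[object];
    if (entry.holder == txn_id) {
      entry.holder = kNoTxn;
      entry.cv.notify_one();
    }
  }

 private:
  static constexpr int kNoTxn = -1;
  struct LockEntry {
    int holder = kNoTxn;         // which transaction holds the lock, if any
    std::condition_variable cv;  // where denied requests wait
  };
  std::mutex latch_;  // a latch protecting the lock table itself
  std::unordered_map<std::string, LockEntry> table_;
};
```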
So back when we talked about index concurrency control, I showed this table from the Gray and Reuter book, which makes this distinction between locks and latches. We focused on latches in the concurrency control discussion for indexes, because we were trying to protect the physical soundness of the internal data structure itself. What we're dealing with in this lecture is all about locks. These protect the logical database contents: tables, rows, single attributes, single values. And we'll go through the different parts of this table as we go along through the lecture. But the key thing that distinguishes locks from latches is the deadlock piece. In the latching protocol, we said we need to be very careful about how we acquire latches, because there isn't going to be somebody in the background who can come along and save us if we ever have a deadlock, and the way we avoided deadlocks was by making sure we always acquired latches in one direction. With database locks, we can't do that, because we don't know what the queries or transactions are going to be doing ahead of time, so we can't enforce any kind of order in how they acquire locks. So we can end up with deadlocks, and we need a way to deal with that. And that's essentially what you guys are implementing for project three.

So the very basic locking protocol is that there are going to be two types of locks: shared locks and exclusive locks. You can sort of think of these as like the read latch and the write latch that we talked about before. A shared lock is just as it sounds: it's a lock that can be shared for reads among multiple transactions. So if I have a shared lock on an object and another transaction wants to read that object, it can also get a shared lock; those are compatible. An exclusive lock is what you use for writes, and it basically says that whoever owns the lock has exclusive access to the object: nobody else can read it or write it at the same time. And again, we can have this compatibility matrix that says if one transaction holds a shared lock, another transaction can get a shared lock, but no combination is compatible if at least one of the locks is exclusive.

So what's going to happen is, before transactions do any read or write operation on an object, they're going to go to the lock manager and say, hey, give me the lock for this object. Or, if they already hold, say, a shared lock and they want to write to the object, they can do an upgrade and say, I hold the shared lock, now give me the exclusive lock. The lock manager's job is to figure out whether you're allowed to get that lock. If you are, you get it and keep running; if you're not, because somebody else holds an incompatible lock, then the request is denied and you get blocked. And then it's the transaction's job to release its locks when it's done with them, giving them back to the lock manager so the lock manager can say, all right, here are the transactions that are waiting for this, they can now run too. So the way this is actually going to be implemented, and this is what you'll be doing for project three, is that the lock manager is essentially going to maintain an internal lock table that keeps track of which locks are held by which transactions and which transactions are waiting for other locks.
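At the heart of that lock table is the compatibility check. As a sketch, with just the two basic modes (the intention modes come later in the lecture):

```cpp
// Shared/exclusive compatibility from the slide, as a tiny predicate.
enum class LockMode { kShared, kExclusive };

// True if a transaction may acquire `requested` while another holds `held`.
bool IsCompatible(LockMode held, LockMode requested) {
  // S is only compatible with S; X is compatible with nothing.
  return held == LockMode::kShared && requested == LockMode::kShared;
}
```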
And then you can have a queue for the waiting locks, so that when a lock is released, you just pick whatever's at the beginning of the queue and say, all right, you're now the transaction that holds the lock.

Does the lock manager need to be persistent or durable after a crash? Anybody want to take a guess? You're shaking your head yes, why? I saw you just now, yes. She says maybe not. The answer is no, right? Because think about it: if I crash, it's just like the buffer pool manager you guys built for project one. If I crash, memory gets wiped away, I come back and have to figure out what was going on, and you're not going to have transactions still running. Because, again, we said we don't want partial transactions: if a transaction did a bunch of writes and was going to do a bunch more writes, but then we crashed, when we come back, that transaction needs to get rolled back. So you don't need to maintain the lock table on disk, because you're never going to come back and have any transaction resurrected. Think about it in a practical sense, from an implementation standpoint: the client that was holding the connection to the process, that connection is now severed, and it doesn't know which queries actually got persisted, so everything just has to get rolled back. So the lock table does not need to be persistent, and that's why you don't have to put it into the buffer pool.

All right, so let's go back to our simple schedule from before, and now, instead of just doing lock and unlock, we're going to acquire different lock types. T1 wants to do a read on A and a write on A, so the very first thing it's going to do is get an exclusive lock on A; nobody else is running, so it's allowed to do this. Then it goes ahead and unlocks it, and then transaction T2 can get the exclusive lock on A, does its write, and releases it. Now transaction T1 wants a shared lock on A, because it just wants to read it; it's allowed to get that, does the read, and completes. What's the problem with this? It's correct, right? What did I say before? I said that with these locks, you had to acquire the correct lock mode on whatever object you want to access before you're allowed to access it. We did that here, right? But what happened? Yes? He said the dependency graph now has a cycle. Even more straightforward: we have an unrepeatable read here, right? Because this transaction did a write on A, and this transaction did a read on A, but now the value it gets back is going to be this one here, and not the one it wrote here or even the one it read here. This is the unrepeatable read anomaly we talked about last class. So this just goes to show you that having locks doesn't mean you're magically going to get correct schedules. We need to be a little bit smarter, and this is what two-phase locking does for us. Two-phase locking is going to require us to use locks in a certain way, and it's going to guarantee that we end up with schedules that are conflict serializable, or just serializable. And it's essentially going to be used by the lock manager to determine whether you're allowed to acquire a lock or not. This is the oldest concurrency control protocol; it goes back to 1975, 1976.
It was invented by the IBM guys, by Jim Gray, who later won the Turing Award in 1998. They built this concurrency control protocol for System R, because back then this was all brand new: there was no textbook saying here's how to do transactions and concurrency control. So this is what they came up with, and it's the first provably correct concurrency control protocol, correct meaning that it generates schedules that are serializable, for a shared database environment. Again, the key thing about this protocol versus the dependency graph or the swapping method is that we don't need to know all the queries you're going to execute ahead of time. We can do this on the fly.

So two-phase locking is as it sounds: there are two phases. The first phase is the growing phase, and this is where transactions request the locks they need, as they go along, from the lock manager, and the lock manager decides whether they're allowed to have them based on whether somebody else already holds the lock. But as soon as you release one of those locks, as soon as you unlock it and give it back to the lock manager, you're automatically put into the shrinking phase. And in the shrinking phase, you're only allowed to release locks; you're never allowed to acquire new ones. If you try to acquire a new one, which you technically can't really do explicitly in SQL, but if you did, then that's a violation of the two-phase locking protocol, and the database system will abort you.

Another way to think about it is visually, like this. Say this is the lifetime of a transaction, and the y-axis is the number of locks the transaction is holding. While I'm in the growing phase, I can keep acquiring more and more locks. As soon as I release one, I'm now in the shrinking phase: I can hold the remaining locks as long as I want, and I can release any lock that I want, but I'm not allowed to acquire new ones. So if you did something like this, released a bunch of locks and then tried to acquire more, again, that would be a violation of the two-phase locking protocol, and the database system would abort you. As long as you have these two phases and you follow this protocol, you're guaranteed to generate schedules that are serializable.

So let's go back to our example, now using two-phase locking. We get the exclusive lock on A in T1; that's fine, we get that. Then we do a read on A, then a write on A, and now T2 tries to get its exclusive lock on A, but this is denied because T1 holds it. Then T1 does its read on A here and unlocks it, at which point T2 gets woken up and told, you hold the lock now, and then it can run. And this is a serializable schedule, thanks to two-phase locking. Pretty straightforward. Yes? So his question is, does the shrinking phase start when it unlocks A? Yes, right here. But this is an illustration of what's actually going on underneath the covers. In your application, you usually don't explicitly say lock something, unlock something; you can't write in SQL, unlock this, unlock that. This is just showing you how the protocol works. The database system could decide, and no system actually does this, but it could speculate: all right, I don't think you need to do anything after this, so I'll just go ahead and unlock. So basically, wherever the first unlock happens, that's where the shrinking phase starts.
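As a rough sketch of how a system could enforce that rule at runtime, with hypothetical names: track a phase per transaction, and flip it on the first unlock.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical per-transaction bookkeeping: after the first unlock, any new
// lock request is a protocol violation and the DBMS aborts the transaction.
class TwoPhaseTxn {
 public:
  void OnLock(const std::string &object) {
    if (phase_ == Phase::kShrinking) {
      throw std::runtime_error("2PL violation: abort this transaction");
    }
    held_.insert(object);  // still in the growing phase
  }

  void OnUnlock(const std::string &object) {
    held_.erase(object);
    phase_ = Phase::kShrinking;  // the first release starts the shrinking phase
  }

 private:
  enum class Phase { kGrowing, kShrinking };
  Phase phase_ = Phase::kGrowing;
  std::set<std::string> held_;  // locks currently held
};
```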
All right, so again, two-phase locking by itself will guarantee that you generate schedules that are serializable. Based on what we talked about last class, you'll have a precedence graph that is acyclic, with no cycles between nodes. But it is susceptible to another problem, called cascading aborts. So if I go back here: T1 is going to do a read on A, a write on A, then a read on B and a write on B. Transaction T2 is going to do a read on A and a write on A. And also, sorry, at the bottom you see that T1 is going to do an abort. So what happens if I run this? This is a correct schedule under two-phase locking, because T1 acquired the exclusive locks on A and B, then it read A, wrote A, unlocked A, and then it read B and wrote B, but then it aborted. But in between, at this point here, T2 was running, and it read A and wrote A, so it read the version of A that was written by this transaction here. We can't have this, right? This produces a dirty read anomaly, which would be bad, because we're essentially leaking information about this write before the transaction that made it committed. So at this point here, if transaction T2 tells the database system it wants to commit, we can't allow it, because it read something from a transaction that never actually committed. So under two-phase locking, this schedule is permissible, it's still a correct schedule, but we have this cascading abort problem: because this guy aborted, whatever this guy did has to be aborted too. The downside is that this is all wasted work. Say this transaction, instead of just reading and writing A, updated a billion tuples. We'd have to sit on all of those billion tuple updates until we find out what happens to the first transaction, and if it ends up aborting, then all billion tuple updates get thrown away. So again, two-phase locking will guarantee serializable schedules, but it can produce schedules that have dirty reads and require cascading rollbacks. And the other problem, which we haven't gotten to yet, is that you can also have deadlocks. So we're going to talk about how to solve these two problems: we'll focus first on how to handle dirty reads, and then on how to handle deadlocks. Okay?

The other thing I'll say is that there are going to be some cases where you could have a schedule that would actually be serializable, but because two-phase locking is a bit more, I won't use the word strict, a bit more cautious or pessimistic than necessary, there are schedules that are conflict serializable that are not permissible under two-phase locking. But those are corner cases, and there's no easy or efficient protocol that would capture them.

All right, so the way to handle the cascading abort problem is to use a variant of two-phase locking called strict two-phase locking. And it's sort of a misnomer, because under strict two-phase locking there's really only one phase: the growing phase, where you just keep acquiring locks, and then you don't unlock anything until the very end. So I can keep acquiring locks all I want, and I never release them along the way.
It's only when you go to commit, at the very end, that at that commit moment you release all your locks. That's why I say it's not really a shrinking phase: you don't gradually shrink, you release everything at the very end. So this is only going to generate schedules that are conflict serializable, but in a lot of cases it's going to be more cautious and more pessimistic than you actually need. Whether you want to use strict two-phase locking or regular two-phase locking really depends on how susceptible your application, your transactions, are to cascading aborts.

The word strict actually means something more formal in the context of concurrency control. It means that any value written by a transaction is not read or overwritten by any other transaction until the transaction that modified it completes. That's why we can't have a cascading abort: the first transaction modified the value and never released the lock, so nobody else can read or write that change. So the benefit of this is that we're not going to have any cascading aborts, so we don't have to worry about wasted work, and it also makes it really easy to roll back transactions, which I haven't really talked about yet. Under strict two-phase locking, I modify the object, nobody else can read it, nobody else can write it, and I know how to roll back to the old version in case I need to abort the transaction that modified it. It's not like I need to maintain a chain of versions because multiple uncommitted transactions modified the object and, when one aborts, I have to figure out how to roll back to the very first one.

So let's look at an example. This is my favorite example: moving $100 from my account to Lin's account. We'll do that in transaction T1, and then in transaction T2 we're just going to compute the total amount across all the accounts in the bank. And this echo thing is not really a command that we have in our database system. I just want to show that we're emitting this summation of A and B out to the outside world as the final result. I could have made it a write back into the database, but I didn't want to complicate things.

All right, so here's how to execute these transactions with locking, with exclusive locks and shared locks, but without two-phase locking. Say we start with an initial value of $1,000 per bank account. Under this approach, T1 gets the exclusive lock on A; T2 tries to get the shared lock on A and has to pause and wait; then T1 releases the lock on A, T2 is granted it, does its read, and unlocks it; then T2 gets the shared lock on B, while T1 tries to get the exclusive lock on B and now has to wait; and then T2 computes its summation. And the problem in this particular example is what we saw before: this transaction is reading the intermediate state of the database while the other transaction is running, at the point where we've taken $100 out of my account but before we've put the $100 into the other one. So when we compute the summation here, we get $1,900 when it should be $2,000.
We're missing $100 because we're reading this state here before the money gets put back in. So this is what you get without two-phase locking.

With two-phase locking, the non-strict version, T1 starts and gets the exclusive lock on A; T2 tries to get the shared lock and can't, so it has to wait; then we do our modification here, but before we unlock A, we get the exclusive lock on B, because we know we're going to modify it later on. Then we can unlock A, and now we're in the shrinking phase, so T2 can do its read, but it can't get the shared lock on B, because that's being held, so it ends up stalling again. But when we get to the bottom, that ensures we're reading the state of A and B together, so we get the correct value. And then with strict two-phase locking: I get the exclusive lock on A, T2 tries to get the shared lock and has to wait, then I get the exclusive lock on B, and I never give anything back until the very end. When I go to commit, I unlock everything, and then T2 can run, and it gets the right value. So in this case you can see that strict two-phase locking is essentially creating a serial ordering of the transactions, because it makes sure that nobody can read anything modified by a transaction until that transaction completes, since I'm holding the locks to the very end.

Yes? So your question is: in this example I showed the locks being acquired in the order required by two-phase locking; what happens if a transaction doesn't request them in that order? Again, two-phase locking is a protocol you're going to run at runtime. I'm showing you static examples here because it's on PowerPoint, but in the real world you don't know everything the transaction is going to do ahead of time; you see these requests one at a time. And if you see the requests one at a time, then a database system using two-phase locking will enforce the ordering required by two-phase locking. Does that make sense? Again, last class was all about static schedules and how to verify whether they're serializable; two-phase locking is a protocol you can use to generate serializable schedules.

Question, yes? The question is, under strict two-phase locking, am I releasing the locks before or after I commit? It's sort of all at once. Again, I'm on PowerPoint, so I can't really show how that works, but this commit includes all of these: I'm done, I'm not going to execute any more queries, I'm not going to acquire any more locks; here's my commit, commit my transaction and release my locks. That's considered atomic internally, and we use latches to protect the data structures to make this happen. You'll see this when you implement it in project three. So the next question is, are there any mechanisms to ensure that the unlock and the commit are atomic? Yes: latching. Again, how is this actually implemented?
You're going to implement this with internal data structures; you can have a state table for which transactions are active. And it actually doesn't need to be entirely atomic. Think about it: say I maintain an internal table that records whether my transaction has committed or not. If I flip that flag to say I've committed, but I haven't released my locks yet, it doesn't matter, because the lock manager prevents anybody else from acquiring those locks. So it doesn't matter that I've committed but haven't released the locks yet. Would it matter, though, if you released the locks first and then committed? You could do that, but you would still potentially need, and it depends on the isolation level, some global state for transactions that says this tuple was read or this object was modified. If I got the lock for an object, and we're running strict two-phase locking, then the previous holder couldn't have released it unless it finished, so I can assume it committed and that it's okay for me to proceed. This is a low-level implementation detail: if you release the locks and then commit, I think you can still get by, depending on how it's implemented and whether there's global state saying which transactions are active. Postgres has that, MySQL has that, some in-memory databases don't; it depends on the implementation. For our purposes on PowerPoint, we're fine.

Yes? The question is, under non-strict two-phase locking, how does the database system decide when to release the lock on a tuple? So this is what I was saying: you don't really write transactional programs in SQL with explicit lock and unlock. There's no unlock command in SQL to tell the database system, go ahead and unlock. We're dealing with this at a high level, a sort of abstract way of describing the protocol. Most systems probably don't let you unlock things yourself. So your question is really asking: if I can't call unlock explicitly, how does the database system know when to unlock? It doesn't; most systems just run strict two-phase locking. I don't think you can lock and unlock explicit single tuples, but I know you can lock and unlock tables. We can try that in Postgres in a second to see how they implement that.

Yes? So for two-phase locking, you mentioned the growing and the shrinking phase; is that across all transactions, or per transaction? It's one transaction. My example here of the lifetime of a transaction, this was for a single transaction. While it's in its growing phase, other transactions can be in the shrinking phase or whatever; we don't care. It's just, for our transaction, how are we acquiring locks? Okay.
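To tie those questions together, here's a sketch of what a strict two-phase locking commit path could look like, reusing the LockManager sketch from earlier. The Txn record is a toy for illustration, and the whole function is assumed to run as one latched step, per the discussion above.

```cpp
#include <string>
#include <vector>

enum class TxnState { kRunning, kCommitted, kAborted };

// A toy transaction record, just for this sketch.
struct Txn {
  int id;
  TxnState state = TxnState::kRunning;
  std::vector<std::string> held_locks;  // everything acquired while growing
};

// Under strict 2PL there is no gradual shrinking: commit flips the state
// and hands back every lock in one go. Nobody could have read this
// transaction's writes earlier, because it still held the exclusive locks.
void CommitStrict2PL(Txn &txn, LockManager &lock_mgr) {
  txn.state = TxnState::kCommitted;  // after log records are flushed, etc.
  for (const std::string &object : txn.held_locks) {
    lock_mgr.Unlock(txn.id, object);  // waiters get woken up here
  }
  txn.held_locks.clear();
}
```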
So, picking up where we left off: if you think about the universe of schedules we talked about last time, all schedules would be this giant blob; the serial schedules are in the middle; conflict serializable would be here; and then a subset inside of that would be two-phase locking, and a subset inside of that would be strict two-phase locking; and then there'd be a larger blob of schedules that are non-cascading. I should have drawn this out; I had it in the old slide and took it off. So inside of conflict serializable there'd be a box that says 2PL, inside of that a box that says strict 2PL, and inside of that you'd have serial. So again, it's more restrictive, the scope is narrower than all possible conflict serializable schedules, but there's no easy, efficient protocol you could implement that would capture all possible conflict serializable schedules.

All right, so the other problem we've got to deal with now is deadlocks. Strict two-phase locking handles the cascading aborts, but we still have to handle deadlocks, in both strict and non-strict two-phase locking. And the deadlocks here are sort of obvious: transaction T1 gets a lock on A, T2 gets a shared lock on B, then T2 tries to get the shared lock on A, which is being held by T1, while T1 tries to get the exclusive lock on B, which is being held by T2. We have a deadlock, right? This is a standard CS problem, and we have to deal with it. There are two ways to do this.

Abstractly, the way to think about it, like we saw with the dependency graph or the precedence graph, is that a deadlock is a cycle in a graph of transactions waiting for other transactions. And essentially we need a protocol in place that either lets us detect when we have a cycle and break that cycle by killing a transaction, or prevents the deadlock from ever occurring. The prevention approach is sort of what we did with latching: we made sure we always acquired things in the same order.

So we'll cover the first one first: detection. What's going to happen is that we're going to maintain, inside the database system, something called a waits-for graph. It looks a lot like the dependency graph, but in the dependency graph we had dependencies between transactions operating on objects; this one has dependencies between transactions waiting to acquire locks held by other transactions. So in our waits-for graph, we have a node for every transaction, and there's an edge between two nodes if one transaction is waiting for a lock held by the other transaction. When you actually generate this waits-for graph can vary. The most common way, and what you guys will do in project three, is to build it on the fly every time your deadlock detection thread wakes up. The idea is that you have this thread in the background that gets kicked off every so often. I think Postgres might be one second, MySQL might be 10 seconds; it's something you can configure. The thread wakes up, looks at what transactions there are, builds this waits-for graph, identifies the cycles, and then it has to make a decision about how to break them.
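A sketch of what that detection pass might look like: collect the waits-for edges, then depth-first search for a back edge. The names are illustrative, not the project three API.

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// graph[t] lists the transactions that t is waiting on. A deadlock exists
// if and only if the DFS finds a back edge, i.e. a cycle.
using WaitsFor = std::unordered_map<int, std::vector<int>>;

bool HasCycleFrom(int txn, const WaitsFor &graph,
                  std::unordered_set<int> &on_path,
                  std::unordered_set<int> &done) {
  if (on_path.count(txn) > 0) return true;  // back edge: cycle, so deadlock
  if (done.count(txn) > 0) return false;    // already fully explored
  on_path.insert(txn);
  auto it = graph.find(txn);
  if (it != graph.end()) {
    for (int next : it->second) {
      if (HasCycleFrom(next, graph, on_path, done)) return true;
    }
  }
  on_path.erase(txn);
  done.insert(txn);
  return false;
}

// What the background thread would call each time it wakes up.
bool HasDeadlock(const WaitsFor &graph) {
  std::unordered_set<int> on_path;
  std::unordered_set<int> done;
  for (const auto &node : graph) {
    if (HasCycleFrom(node.first, graph, on_path, done)) return true;
  }
  return false;
}
```

On the example coming up next, with edges T1 to T2, T2 to T3, and T3 to T1, this returns true.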
So if you go back to this really simple example, here we don't even have any reads or writes, we're just trying to acquire locks. Our waits-for graph would look like this. T1 wants a shared lock on B, but that's being held by T2, so we have an edge from T1 to T2. There's an exclusive lock request on C, but that's being held by T3. It doesn't matter that one's an exclusive lock and one's a shared lock; as long as the locks are incompatible, you have an edge. So we have an edge from T2 to T3, and then there's an exclusive lock request on A, but A already has a shared lock held by T1, so we have an edge from T3 to T1. And now, of course, we have a cycle.

The other interesting thing, when you guys build this in project three, is that this thing doesn't need to be super accurate. Meaning, we don't need to latch the entire lock table to figure out exactly who's holding what locks, because who cares if we miss an edge on the first pass; we'll come back around and see it again. And this is another reason why we don't want you to build the waits-for graph on the fly as transactions acquire and release locks. You have the background thread do it, because then it doesn't interfere with the critical path of executing regular transactions.

All right, so we have this thread that wakes up every so often, builds this waits-for graph, and checks for cycles, which indicate a deadlock. Then we need to make a decision about how to handle it. We're going to choose a victim in our graph and do something to it: we're going to abort it, and then either restart it or just kill it outright. When we kill that transaction, we release all the locks it was holding, or at least, depending on how far we roll it back, some of its locks. That breaks the cycle in our waits-for graph, and now whatever transaction was waiting for that lock can start running.

There's a trade-off in your implementation about how aggressively you want to check for deadlocks versus the overhead of checking. The overhead of checking has to be traded off against how long transactions have to wait to be released from a deadlock. I could check for deadlocks every microsecond, but then I'm just burning cycles building the waits-for graph and looking for cycles when maybe there isn't going to be one very often. Or I could check every 10 seconds or every 10 minutes, but then if there's a deadlock, transactions are sitting there forever waiting to be released by the deadlock detection thread. This is why this is a tunable parameter in every database system that does deadlock detection with two-phase locking: you don't know whether your applications are going to have a lot of deadlocks, in which case maybe you want to be very aggressive so transactions aren't stuck waiting for locks that are never going to come, versus paying the overhead of checking all the time. Different systems do different things. So the first issue we've got to deal with is how to decide which transaction to kill. What do we want our victim to be?
Choosing a victim is going to depend on a lot of different properties of our transactions, and no one way is better than another. We can choose things like which transaction is the youngest: if we assign them timestamps when they arrive, we can look at the timestamps, figure out when they showed up, and kill the youngest. We can look at which ones have executed the most queries, and therefore would require rolling back the most changes if we abort them. We can look at the number of locks they've already acquired, because it's expensive to acquire locks: if one transaction has acquired a hundred locks and another has acquired 10, aborting the one with a hundred wastes more work, so maybe I kill the one that only acquired 10. And we can look at how many times a transaction has already been rolled back, because we don't want to starve it out. So there's a bunch of different things you have to consider when you decide which transaction to kill. Usually systems keep track of how many times a transaction has been restarted, so it doesn't get aborted over and over again, and sometimes they'll say, you've restarted five times, I can't keep restarting you, and just kill it outright. Again, different database systems do different things. The commercial systems that do two-phase locking with deadlock detection have all sorts of parameters that let you tweak these things, and some of them get kind of crazy and build predictive models: I think your transaction is going to execute these queries and is therefore likely to abort or not abort, so maybe I'll kill you and not the other one. You can get very sophisticated in deciding how to break these deadlocks, and this is part of what separates the commercial systems from the open source ones.

All right, so the next question is how far we actually want to roll back. The most obvious thing is to say: you executed five queries, there's a deadlock, I can't let you proceed, so I'll kill you and roll back all five queries. But maybe you don't need to do that. Maybe you only need to roll back one query; that releases the lock something else was waiting for, and that breaks the deadlock. You pause the transaction you partially rolled back, maybe just for a couple of milliseconds, and then you let it rerun the query that you rolled back, without restarting the entire thing. And that might be enough to break the deadlock without a large rollback. Okay?

All right, let's do a demo. We're going to do Postgres and MySQL; both implement two-phase locking with deadlock detection. The Wi-Fi works. All right, what do we want to do first? Raise your hand for MySQL; raise your hand for Postgres, even fewer. All right, MySQL it is. So we have a simple table called TransactionDemo that has two tuples: ID one with value 100, ID two with value 200. The first thing I'm going to do is set a flag for the lock timeout. InnoDB is the internal execution engine, the storage engine, of MySQL; that's why the setting has InnoDB in front of it, and then lock wait timeout. This says, if I try to acquire a lock and I can't get it after 10 seconds, do something about it. I have two terminals, so I'll do the same thing in the bottom one.
This one seems dead, actually. Sorry. All right, there it goes. So I'll set my timeout like that. Then I'm going to set both terminals to run at the serializable isolation level. I haven't described what isolation levels are yet, but just assume that serializable is exactly what we talked about before: we want conflict serializable schedules. We'll begin the first transaction, and we'll start a transaction down here. The first transaction is going to update the first tuple and just add one to its value; it's allowed to do that, that's fine. In the second terminal, the second transaction updates the second tuple and adds one to its value; that's fine too. Now we go back to the first terminal and try to execute an update on the second tuple, the one already modified by the transaction at the bottom. That stalls, because the bottom transaction holds the exclusive lock for it. Then we go down here and try to update the first tuple; same thing, I can't do it because the transaction above holds the exclusive lock. But now you see what happened: MySQL recognized that there was a deadlock the moment we tried to acquire the lock in the bottom terminal, and killed that transaction immediately. And you might have seen in the top terminal that as soon as the bottom one got aborted, the top one got unstalled, because it acquired the exclusive lock from the lock manager and immediately did its update. Did everyone see that? So now the bottom one is aborted, so I can't do anything with it. If I try to read from the table, it'll tell me, actually, no, it'll let me do that; Postgres is the one that will refuse. My change got aborted, but also, you can see down here, I actually don't see any updates from the transaction above, because I'm serializable: I can't read anything from an uncommitted transaction. That's why it still says 100, 200. Now if I go back up here and commit, my changes are in, and if I go to the bottom and do a select, I see the changes, because one was added to both values. So MySQL is actually doing something smart here: instead of waiting around for the deadlock detection thread to figure out there was a deadlock, it saw the deadlock immediately and killed the transaction that tried to acquire the lock being held by somebody else.

All right, so let's go to Postgres. Same table, two tuples: one with 100, two with 200. Here what I'm going to do is set the deadlock timeout to 10 seconds in both terminals. Again, this is how long you should wait before deciding the deadlock isn't going to resolve itself and you've got to kill somebody. All right, so we'll start the first transaction at the serializable isolation level, and start a transaction at the bottom. Same thing: the one at the top updates tuple one, the one at the bottom updates tuple two. That's fine, that's allowed to happen. Now the top one tries to update tuple two; it pauses, because the bottom one holds the exclusive lock. At the bottom, I try to update tuple one; it pauses, because the top one holds the exclusive lock. And then, that happened really fast, it actually gives you great information: deadlock detected, and it tells you that this process waits for a ShareLock on a transaction, and is blocked by the process at the top.
And then it recognizes that you couldn't do what you wanted to do, and it goes ahead and kills you. And then, same thing: when this one got aborted, the one up here got released. Actually, did that happen right away, or did it pause for a bit? It paused? Okay, let me try that again. So let's roll back this one, roll back this one. Same thing: if I do select star from TransactionDemo, I see the values that I would expect. Okay, so let's try this: let's set the timeout to something larger, 20 seconds. All right, start a transaction, start a transaction, update tuple one, update tuple two. Try to update tuple two: pause, because the bottom one holds the exclusive lock. Try to update tuple one: pause. And now, within 20 seconds, the deadlock detector will wake up, recognize that there's a deadlock, and kill one of them. And it should kill, actually, maybe the top one? Yeah, nailed it. So what happened there? I started the bottom one first, so it had an older timestamp than the top one. The top one was the one that got killed, because Postgres chose to kill the younger one and kept the bottom one, which is older, alive. This is why I love Postgres: it's almost an exact implementation of how a textbook describes these things, okay?

Now a slightly more complicated example. Let's do the same thing, except now you can actually see, internally in Postgres, that the transactions are waiting for these different locks. So let me roll back my transactions. Actually, let me make a new terminal up here, and set the deadlock timeout to 20 seconds up here. Same thing: we'll begin a transaction in serializable mode here, and in serializable mode there. The first one updates tuple one. The second one updates tuple two. This one tries to update tuple two. This one tries to update tuple one. Now, in this bottom terminal, before it craps out, I can run a query against the catalog. It's a little hard to see, but in this bottom query here, I asked the catalog, what's the current state of the locks? And you can see here that Postgres is telling you that the process ID of one of these terminals is being blocked by another terminal, and here's the query that it's blocked on: set value where ID equals one, set value where ID equals two. So Postgres is maintaining this internal information in its lock table: it knows this query is blocked by this other transaction in this other process, and it knows what it's blocked on. And again, what did we see? I actually started this one first, but this one got killed, not that one. But that's fine.

All right, any questions about this? Yes. So your question is, in my example, it only detected the deadlock when the second transaction tried to acquire the lock? All right, so I think he's asking this: if I go back again, start a transaction here, start a transaction here, I update tuple one, I update tuple two, now I try to update tuple two, and it pauses, but I don't do anything down here. And your question is, why doesn't it detect a deadlock? Because at this point here, there technically isn't a deadlock.
Yeah, but the other side is stalled and never going to make progress; isn't that effectively the same thing? So his point is that the terminal at the bottom might stall forever and never release the lock. Technically, that's not a deadlock. A deadlock is when I hold a lock and want your lock, while you hold that lock and want to acquire the lock that I hold. Here, the dependency graph is: the bottom one holds a lock, but it doesn't depend on anybody else, because it already got everything it asked for. The one above it acquired one lock, which nobody else wants, and it wants to acquire the lock that the bottom one holds. That's a chain, not a cycle; it's not a deadlock. So this, I think, will stall forever in Postgres. You can see the cursor flashing at the bottom; I submitted this and it's just sitting there. There might be a lock timeout command or setting you can set, I don't know what it is, that will eventually say, this thing's not going anywhere, you can't proceed. So watch this: now I go down here and call a rollback on this one. Again, the top thread is waiting to acquire an exclusive lock held by the bottom thread. As soon as I roll back, I release all my locks, and the top one completes, because it's now able to acquire the lock. Okay? Yes?

Sorry? So your question is, if you never call commit or rollback, the lock will still be held by the second transaction? Right. But that's not a deadlock; that's just a stall. What I'm saying is, we showed the example where there was a real deadlock, and the Postgres deadlock detector woke up, killed a transaction, and broke the deadlock. Here it just stalls. Now, there might be another parameter, I'd have to look at the Postgres manual, that says, all right, you've waited a minute for this lock, you're not getting it, let me just kill you. MySQL actually might do that.

Let me real quick roll this back, because I want to talk about other things. So, same thing: I update tuple one; I start a transaction, update tuple two; then I try to update tuple two again, and I can't do that. When the first one commits, it then tells me: I can't let you do that, because this transaction modified the tuple and committed, and you would be updating something based on what you read in the past, and that's not kosher, that's not serializable, so it killed you. So again, I encourage you to just hop into Postgres and try things out, because it almost follows the textbook exactly.

Yes? Sorry, something that's what? Killed? Yes. When I say killed, I mean aborted. All your modifications in that transaction are rolled back, and it's as if the transaction never ran at all. Right? Yes? So the question is: since there's no explicit way in SQL to release individual locks, and the database system only knows it's safe to release all the locks when the transaction completes, can we say that any system doing two-phase locking is really doing strict two-phase locking? As far as I know, yes. I might be wrong about this, but I think yes.
I know SQL Server does strict two-phase locking. I think yes. But the algorithm doesn't have to be that way. Okay.

The other way to deal with deadlocks is deadlock prevention. The idea here is that now we're not going to have a background thread to protect us, and we're not going to have a waits-for graph. Instead, at the moment you try to acquire a lock, if it's being held by somebody else, we decide right then what to do. And the way we're going to do this is that transactions are assigned a priority based on when they arrived in the system: they're given a timestamp. What we're going to say is that the older your timestamp, the farther it is in the past from the current time, the higher your priority over other transactions. Say transaction T1 has timestamp one and transaction T2 has timestamp two; timestamp one has higher priority than timestamp two.

So the two ways to do deadlock prevention are wait-die and wound-wait. Wait-die says that old transactions are allowed to wait for young transactions, but young transactions are not allowed to wait for old transactions. So if my requesting transaction has a higher priority, meaning it's older, than the transaction holding the lock it's trying to acquire, then the requesting transaction is allowed to wait for the holding transaction to release it. Otherwise, if the requesting transaction has a lower priority, it has to abort. The old are allowed to wait for the young. Wound-wait says that if the requesting transaction has a higher priority than the holding transaction, then you kill the holding transaction, steal its lock, and keep running. Otherwise, the requesting transaction has a lower priority, meaning it's newer, and it has to wait.

I think a visualization will make this more clear. In the first example we have T1 and T2. T1 starts first, so T1 has the higher priority, but T2 gets the exclusive lock on A first, and then T1 tries to get that same exclusive lock. Under wait-die, T1 waits, because we allow the old to wait for the young. But under wound-wait, T1 would be like an old man with a gun: it comes in, kills the young one, steals the lock, and starts running. In the other example, T1 starts first and gets the exclusive lock on A first, and then T2 tries to get the exclusive lock and can't. Under wait-die, T2 has to abort, because T1 has higher priority, and the younger transaction, T2, is not allowed to wait for T1, so it aborts right away. Under wound-wait, though, T2 is allowed to wait, because the young are allowed to wait for the old.

What this is essentially doing is the same thing we did with latch crabbing, where the threads always acquire latches in the same direction, from top to bottom. We're doing the same thing with these two protocols. And you don't mix these: you either do wait-die, or you do wound-wait, not both.
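Condensed into code, the two policies differ only in which side gets aborted. A sketch, assuming a smaller timestamp means older and therefore higher priority; the names are illustrative.

```cpp
// `requester` wants a lock that `holder` currently has.
enum class Action { kWait, kAbortRequester, kAbortHolder };

// Wait-die: the old may wait for the young; the young dies.
Action WaitDie(int requester_ts, int holder_ts) {
  return requester_ts < holder_ts ? Action::kWait : Action::kAbortRequester;
}

// Wound-wait: the old wounds (aborts) the young; the young waits.
Action WoundWait(int requester_ts, int holder_ts) {
  return requester_ts < holder_ts ? Action::kAbortHolder : Action::kWait;
}
```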
And this again ensures that transactions are always trying to acquire locks in one direction, based on these timestamps, so you never have an old transaction holding one lock and a young transaction holding another lock, each waiting for the other's lock: a deadlock. They're always going in one direction. So this is pretty simple, and this is what it solves. You don't need a lock table of waiters, you don't need a waits-for graph, you don't need a background thread; you just do this. Which, actually, I think MySQL might be doing; I'd have to double-check.

So we already talked about these issues. Why are there no deadlocks? Well, there's only one direction of acquisition: only one direction in which a transaction trying to acquire a lock is allowed to wait; otherwise you kill yourself or kill the other one. And then, when we restart a transaction, we always want to make sure we give it the same timestamp as before. What's the obvious reason for reusing the same timestamp when you restart? Because otherwise, if you just keep getting restarted and keep getting a new timestamp, you can get starved out. By reusing the timestamp, at some point you're going to be the oldest transaction around, and then you'll be killing things or waiting for things; you get to do whatever you want. So that's why you always reuse the same timestamp every time you restart a transaction. So, is this clear?

So far, we've been assuming that there's a one-to-one mapping between any lock we acquire in our database and the object we want to lock: one tuple, one lock; one table, one lock. But going to the lock manager is not cheap, because we have to protect it with latches, since we want multi-threaded concurrent access to its internal data structures. So if I want to update a billion tuples, it's going to suck if I have to acquire a billion locks on all those tuples. A way we can solve this is through lock granularities, or hierarchical locking. Now, when a transaction wants to acquire a lock, we can make a decision about what that lock should be on: should it be on the database, should it be on the table, should it be on the tuple? And the goal is to acquire the fewest locks needed to actually do what we want to do. Again, there's a trade-off here: you can acquire an exclusive lock on the entire database and do whatever you want. That's the easiest thing to do, and you only have to acquire one lock, but it minimizes the amount of parallelism you have, because now nobody else can do anything at the same time. It's actually what a lot of systems used to do in, I'm going to say the old days, but it's the simplest thing to do: acquire a single lock on the entire database. SQLite does this, because it runs in embedded environments. Well, that's not entirely accurate, because they have one writer thread but multiple reader threads. And MongoDB essentially used to do this: they had an exclusive lock on, I think, the entire database, and then maybe got it down to a single table, single collections, and now they actually do the smart thing, because they bought this company called WiredTiger. So the way to think about it is that we have this hierarchy, and our transaction comes along.
And so if it acquires a lock on the table, and a table has tuples, and the tuples have attributes, it implicitly acquires locks on everything below it. So now we only need to go to our lock manager once and say, hey, give me a shared lock on the table, and everything below it is effectively in shared mode. We still have to do two-phase locking here, and since we'll have threads trying to acquire locks on these different objects at the same time, we make sure we always traverse in the same direction: we acquire locks as we go down the hierarchy, never back up. This is another dimension we can use to improve the amount of parallelism in our system. So let's look at an example. Again, I have an offshore shady bank account and I want to see how much money is in there, because I'm trying to dodge taxes or whatever. And then we have the TA, Lin, who's going to increase the balance of his account by 1%. So the question is, what kinds of locks do we want to acquire? The answer is: multiple kinds. We have exclusive and shared locks for the leaf nodes in our lock tree; say we just do tuples at the very bottom. But then we have what are called intention locks, which are like hints attached to the upper parts of the tree to tell other transactions what's going on below, so they can make scheduling decisions about whether they'll be able to acquire the locks they want further down before they get there. So an intention lock is taken on a higher-level node in the tree; you don't take intention locks on the leaf nodes. It's a hint to other transactions that says, hey, I have a node down here in exclusive or shared mode. You may not know exactly which node, but you know that if you want to, say, lock the entire thing, you can find that out without having to check every single tuple or node below. If a node in the upper part of the tree is in intention mode, then somewhere below it there's explicit locking going on, either a shared lock or an exclusive lock. There are three types of intention locks. The first is intention shared (IS): it just says that below me in the tree there's a node being locked in shared mode. Intention exclusive (IX) says that some lower node is locked in exclusive mode. The one that always trips up students is shared intention exclusive (SIX). It says: I'm taking this upper node in shared mode, which implicitly puts everything below it in shared mode, but I also hold an intention exclusive at this node to say that somewhere below I have an exclusive lock on one node. So you get a shared lock on everything, plus a hint that says, oh, by the way, I'm also taking an exclusive lock somewhere down there. You can put all of these in a giant compatibility matrix, expanding what we had before. The way to read it is: say T1 holds a node in one of these five modes, and T2 wants to acquire the same node in one of the five modes. Intention shared is basically compatible with everything except exclusive. Exclusive is never compatible with anything. And you can see that as you go up in strength, modes become less compatible with one another.
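Here is that matrix written out as a small Python sketch so you can check entries yourself. This is the standard textbook compatibility matrix; the variable names are just illustrative.

    # COMPAT[held][requested] is True when `requested` can be granted
    # on a node while another transaction holds `held` on the same node.
    COMPAT = {
        "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
        "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
        "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
        "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
        "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
    }

    # IS is compatible with everything except X; X with nothing at all.
    assert COMPAT["IS"]["X"] is False
    assert all(not COMPAT["X"][mode] for mode in COMPAT)
    # Holding SIX on a table blocks another transaction's S on that table.
    assert COMPAT["SIX"]["S"] is False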
So if T1 holds a shared lock and T2 wants intention exclusive, that's not allowed; same with shared intention exclusive; but another shared lock is okay. This probably doesn't make sense yet, so let's go through an example. What's going to happen is we start at the top of the tree, and at each level as we go down (for our example we'll just have two levels), we make a decision, based on what operations we want to do on the database, about which lock modes we need to acquire. To acquire a shared or intention shared lock on a node, I have to hold at least an intention shared lock, or something stronger, on the node above me. If I want an exclusive, intention exclusive, or shared intention exclusive lock, then I must hold at least an intention exclusive lock above me. (There's a sketch of these parent rules after this example.) So let's do a really simple example. We want to get all the money in my bank account, so I want to read a single record in the table. We only have two levels: table locks and tuple locks. My transaction shows up wanting to read a record in table R, and this is my record here. In order to read it, I have to have a shared lock on the tuple itself, but above that I can take a higher-level intention lock so that I'm not locking all the other tuples in shared mode. So I take intention shared on table R and then just my shared lock on my particular tuple. Now if anybody comes along and wants an exclusive lock on the entire table, that's going to be incompatible with my intention shared. Then Lin wants to update his record, his tuple over here, so he gets an intention exclusive lock on the table. That's compatible with my intention shared; these are just hints for other people. Then he goes and gets the exclusive lock on the one tuple he wants to modify, and we're done. Again, these are just hints telling other transactions that if they proceed down the tree, they may or may not be able to get the lock they want. What we don't want to happen is this: we want to acquire locks for a billion tuples, we acquire 999,999,999 of them, and the last one we can't get because somebody else holds an exclusive lock, and we just wasted all that work going into the lock manager. We push as much information up the tree as possible about what's going on below it, so that we don't end up wasting work.
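Here is the sketch of those parent rules I mentioned. This is the standard textbook formulation: to lock a node in S or IS you must hold IS or IX on its parent, and to lock a node in X, IX, or SIX you must hold IX or SIX on its parent. The function and table names are mine.

    # For each mode you want on a child node, the modes that are
    # acceptable on its parent. (The root has no parent, so the check
    # starts one level down.)
    PARENT_MODES_ALLOWED = {
        "S":   {"IS", "IX"},
        "IS":  {"IS", "IX"},
        "X":   {"IX", "SIX"},
        "IX":  {"IX", "SIX"},
        "SIX": {"IX", "SIX"},
    }

    def may_lock_child(parent_mode_held: str, child_mode_wanted: str) -> bool:
        return parent_mode_held in PARENT_MODES_ALLOWED[child_mode_wanted]

    # Reading my one tuple: IS on table R, then S on the tuple.
    assert may_lock_child("IS", "S")
    # Lin updating his tuple: IX on table R, then X on the tuple.
    assert may_lock_child("IX", "X")
    # Holding only IS on the table does not permit an X on a tuple.
    assert not may_lock_child("IS", "X")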
So this is a more complicated example. We have three transactions: T1 wants to scan all of R and update one tuple, T2 is going to read a single tuple in R, and T3 is going to scan all of R. We start with T1: it wants to read these two tuples here and then update the last one. So at the table node it can get a shared intention exclusive lock, which implicitly locks all the tuples in shared mode, and then it only needs an explicit exclusive lock on the one tuple it's updating. Tuple one and tuple two are implicitly in shared mode, because it wants to read them, and the last one is in exclusive mode. So for this transaction, say there are a billion tuples along the bottom: I only had to acquire two locks. That's a pretty good trade-off. T2 wants to read a single tuple in R, one that's not being modified. It can get intention shared on table R, which is compatible with shared intention exclusive, then the explicit shared lock on its tuple, and we're good to go. Transaction T3 wants to scan all the tuples in R. We could go down and get a shared lock on every single tuple, but instead we try to get a shared lock on the table itself. That's not compatible with shared intention exclusive, because one guy is modifying a tuple with an exclusive lock, so this transaction has to stall until that lock is released. And it found that out the moment it tried to lock the table in shared mode. Had it gone down to the leaves and acquired explicit shared locks on every single tuple, it would only have found out it couldn't run its query when it got to the last one, because you have to acquire all the locks on all the objects you want to access before you're allowed to do whatever it is you want to do. It's not like an iterator where I read a tuple, release that lock, grab the next lock, read the next tuple; it has to happen all at once. Yes? Let me go back to an earlier comment first. Yes? So the question, say it again, sorry, is whether intention shared has the same behavior as intention exclusive? No, right? If this guy holds intention exclusive and you want to get a shared lock on the same node, that's not compatible. If he holds intention shared and you want to get a shared lock, that is compatible. So they're not exactly the same. Okay, so again, the main idea with these hierarchical locks is to minimize the number of locks a transaction has to acquire in order to do the work it wants to do, which minimizes the traffic and contention of going into the lock manager. And these are just the rules you apply about what kinds of locks you need to hold above a node if you want to get a shared or exclusive lock below it. We also talked a little bit before about lock escalation: you can upgrade your locks, say from a shared lock to an exclusive lock, but you can also notice that if you're having to acquire a whole bunch of shared locks down below, maybe you should restart your traversal of the tree and just acquire the shared lock you should have taken higher up. Yes? Your observation is that the general idea is communicating to other transactions what kind of lock is going to be held at a lower level? Correct, that's what I was trying to say: it's a hint to other transactions about what's going on. So the next question is: why don't we just use the same approach as in the B+ tree, where you lock from the root, and as you go down you release the ancestors you no longer need?
Why don't we just do lock coupling, sorry, latch coupling or latch crabbing, like in the B+ tree, where as you traverse down the tree you release the locks you don't think you need anymore? Why is that a bad idea? Yes? It violates two-phase locking. Exactly. Under two-phase locking, as soon as I release a lock, I can't go acquire more, and two-phase locking applies over this lock tree as well. So what about just descending down to the leaf and then releasing all your ancestors? Okay, what we're talking about here is really just one query; you have to do the same thing for the next query, and you can't acquire locks once you've already released some. So the next statement is: if I'm accessing another table, and I've released locks on the first table, does acquiring new locks on the second table violate two-phase locking? Yes. It's for the entire database. It's a bit confusing because I'm showing you a tree, and it looks like a B+ tree, but it's not the same thing; this is an internal lock hierarchy. All right, I have about two minutes, so let me go through locking in practice real quick, and then we'll cover isolation levels next class. As I said before, you don't actually acquire locks manually in your queries. You can provide hints, though, to the database system about what your transaction is actually going to do, and there are ways to lock an entire table, which is the next slide, but you really don't want to write programs this way. This is not part of the SQL standard, but most database systems allow you to do something like LOCK TABLE plus the table name and whatever mode you want. The great thing about Postgres and the other major commercial database systems is that they follow the textbook: they have shared locks and exclusive locks. MySQL always wants to be different, so instead of a shared lock they call it a read lock, and instead of an exclusive lock they call it a write lock, but the idea is basically the same. So here's how to lock an entire table in the different modes; Postgres is more terse, and SQL Server has this SELECT 1 thing, which I don't fully understand. You can lock the entire table, you usually don't want to do this, and there's an unlock command to release it later on. There's also a way to provide hints to the database system about what your transaction's queries are going to do. As I said, any time you run a SELECT query, the system acquires shared locks on the objects you're accessing. But maybe you're going to read something first and then modify it later in the same transaction. Instead of acquiring the shared lock the first time and then upgrading to the exclusive lock later, since we can upgrade our locks, you can put a hint in your SELECT statement that says: I'm going to read this now, but I'm going to update it later, so don't acquire the shared lock, acquire the exclusive lock right away. This is called SELECT FOR UPDATE: you write your SELECT statement and just tack FOR UPDATE on the end, as in the sketch below.
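Here is roughly what these statements look like. This is a sketch with a made-up table name, but the per-system syntax is the real syntax just described.

    -- Postgres: lock a whole table explicitly. Must be inside a
    -- transaction; there is no explicit unlock, the lock is released
    -- at COMMIT or ABORT.
    BEGIN;
    LOCK TABLE accounts IN SHARE MODE;
    LOCK TABLE accounts IN EXCLUSIVE MODE;
    COMMIT;

    -- MySQL: READ/WRITE instead of shared/exclusive, and it has the
    -- explicit unlock command mentioned above.
    LOCK TABLES accounts READ;
    UNLOCK TABLES;
    LOCK TABLES accounts WRITE;
    UNLOCK TABLES;

    -- The hint: read the row now, but take the exclusive lock up
    -- front, because we know we will update it later in this
    -- transaction, so there is no shared-to-exclusive upgrade.
    SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;

    -- The shared-mode variant (FOR SHARE in Postgres and MySQL 8;
    -- older MySQL spells it LOCK IN SHARE MODE).
    SELECT balance FROM accounts WHERE id = 1 FOR SHARE;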
You can also do this in shared mode, the FOR SHARE variant in the last line above, but it doesn't actually buy you anything; it's just there, because a plain read takes a shared lock anyway. All right, so any questions about this? I want to stop here, and then we'll pick up with isolation levels and phantoms next class. Any questions? All right, guys, have a good weekend, see you on Monday. The Brytlyt talk will be tomorrow, project three is posted online, and homework four is also available online. I'll send out more information on how to get started on the extra credit this weekend. Okay? All right, guys, take care.