Okay, I think we can start now. We actually have a lot to go through today, and we especially want to show a demo today as well, so we're going to start a little bit early. Let me quickly go through the administrative stuff. Project 3 is due on Sunday, November 14th, and very importantly, we have a recitation that is going to be held this Thursday at 5 p.m. over Zoom. I highly encourage everyone to attend the recitation, because Andrew and I observed that many of you have already used all your late days on Project 2. On Project 2, unlike Project 3, we had some special policy changes that extended the deadline, etc. On Project 3, we have fixed all the grading policies, the scripts, etc., so we will not extend — you should definitely start early, check out this recitation, and seek help early as well. We will not extend, and if you go over, there will be a penalty, obviously. Project 4 — sorry, Homework 4 — will be released today, and it's due a little bit earlier, on November 7th, alright. And a little bit of advertising for the upcoming database talk that we are organizing: next Monday, we'll have the CEO, I believe, and co-founder of Pinecone talk about their vector database. Essentially it's a database system — actually, there's some noise, yeah — it's a database system designed for machine learning applications, handling vectors, features, etc. And what's interesting is that the presenter, the co-founder of this company, is actually the previous head of the Amazon AI Lab. So it will be interesting to see what he has to say next Monday, alright. So, getting to the content today. In the last class, we talked about the standards for how we define transactions that are going to be executed correctly, right? We talked about serializable transactions, and especially we talked about serial transactions, which means transactions execute one after another, right? That's definitely going to be correct.
And then we said that we don't really want that to happen always, right? Because in many cases, we want transactions to be executed concurrently, right? They are more flexible, we get higher hardware utilization, as well as potentially lower latency, if we can execute them in parallel. So we talked about two different concepts that allow operations in transactions to interleave with each other while making sure they are still correct. Essentially, we talked about two analyses we can use to decide whether a schedule of a set of transactions would be correct. The first is conflict serializability, right? Essentially, we can look at the non-conflicting operations in a transaction schedule and flip them around, and see whether we can eventually put a schedule of a set of interleaved transactions back into a serial order, right? If we can succeed at that, the schedule is called conflict serializable, and that's what we consider a correct transaction schedule. And we talked about a little more flexible standard, called view serializability, right? It allows more execution schedules of transactions, but in practice it's difficult to achieve, because essentially we need to understand more of the semantics of the transactions, alright. And then, just to quickly go over this example here: let's say we have a transaction T1 that has reads and writes, and a transaction T2 with reads and writes as well; a conflict here, for example, would be a write-read conflict, right? We define a conflict as two operations happening in two different transactions, but on the same object, where at least one of them is a write operation, right — then it's called a conflict.
And those are the things we are trying to avoid to make the schedule of a set of transactions quote-unquote correct, or in other words serializable, alright. So that's what we're going to talk about today. Just to emphasize: last class we were doing a kind of theoretical analysis, right? We just assumed that we know the complete read-write set of the different transactions, we know how things interleave, and we know which transactions will abort or commit, right? Assuming we know all those things, we analyzed whether a schedule of a set of transactions would be valid or not — in other words, serializable. But in practice, you don't really know the reads or writes of a transaction in advance, right? Typically, users just issue queries on demand, and you don't really know whether a transaction will abort or commit either, right? So in practice, we actually need different algorithms, different implementations, to ensure that the transactions executed in a system don't have invalid conflicts — and especially, when there is an abort, that the effects of the aborted transaction will not be seen by other transactions, right? Again, under the circumstance that we don't know the read or write sets of the different transactions in advance, and we don't know which ones abort or commit either, right? So the solution we are talking about today, as many of you may guess, is to use locks to make sure that the records accessed by transactions are protected, right? So that there are no invalid conflicts, alright? So when we say we will lock to protect records, what would be the simplest thing we could do, right? For example, here I'm showing you one example: we just lock and unlock things while we are accessing these records, right?
For example, here we can say, hey, there are transactions T1 and T2, and when T1 needs to access record A, it can take a lock on A, right? Oh, actually, we are assuming that there is also a centralized component in the database system — not shown here — that manages all those locks. It keeps track of: what locks are currently out there? Which transaction owns which lock, and which transactions are waiting for which lock, et cetera? We'll go into details later, but conceptually there is a component in the database system that's in charge of all this locking. And actually, that's something that you are going to implement, or at least complete, in the last project as well, right? So it's very important. Okay, so back to this example: transaction T1 needs to read and write on A, so it can just request a lock on record A from the lock manager, and the lock manager can grant the lock on A, right? Because nobody held this lock on A before. And then transaction T2 comes along and wants to acquire this lock on the record, but it gets denied, right? It is already held by transaction T1. So it has to wait, and wait, and wait. Until later on, when transaction T1 finishes accessing record A, it can just release the lock, right? It notifies the lock manager, and the lock manager can now grant this lock to transaction T2; transaction T2 finishes, and then it can release the lock, right? So this is the high-level concept that we are going to talk about today. But this is a very simple example, and we'll talk about much more specific algorithms in the lecture, right?
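The grant/deny/release protocol described above can be sketched in a few lines. This is a minimal illustration, assuming one exclusive lock per record and FIFO waiting; the names (`LockManager`, `acquire`, `release`) are hypothetical, not from any real system or from the course project.

```python
from collections import defaultdict, deque

class LockManager:
    """Toy centralized lock manager: one exclusive lock per record."""
    def __init__(self):
        self.owner = {}                    # record -> txn currently holding the lock
        self.waiters = defaultdict(deque)  # record -> FIFO queue of waiting txns

    def acquire(self, txn, record):
        """Grant the lock if free; otherwise the requester must wait."""
        if record not in self.owner:
            self.owner[record] = txn
            return True                    # granted
        self.waiters[record].append(txn)
        return False                       # denied: caller blocks

    def release(self, txn, record):
        """Release the lock and hand it to the next waiter, if any."""
        assert self.owner.get(record) == txn
        if self.waiters[record]:
            self.owner[record] = self.waiters[record].popleft()
        else:
            del self.owner[record]

lm = LockManager()
print(lm.acquire("T1", "A"))  # True: T1 gets the lock on A
print(lm.acquire("T2", "A"))  # False: T2 has to wait
lm.release("T1", "A")         # lock passes to the waiting T2
print(lm.owner["A"])          # T2
```

Note that a real lock manager also has to block the calling thread and handle shared modes; this sketch only shows the bookkeeping of who owns and who waits.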
So to give you a heads-up on what we're going to talk about today: we're going to talk about different types of locks first — the definitions, the terminology — and then we'll talk about one canonical, very widely used algorithm to achieve concurrency control called two-phase locking, using the different locks we talk about. Then we'll talk about several issues we need to deal with under the two-phase locking method, namely deadlock detection as well as prevention. And lastly, we'll give a little bit of a heads-up on hierarchical locking. Because I want to show a little demo today, we probably cannot finish hierarchical locking, but I want to give you a little heads-up, and we can continue next class. Before we start, there's one important distinction we need to make: there are actually two types of things that protect records in the database system. This is a little different from the definitions you will see, for example, in the C++ standard or in operating systems classes. Essentially, we distinguish two types of protection mechanisms, called locks and latches. Latches are actually what you have seen before in this class: the protection mechanism on the internal data structures of the database system. For example, if you have a B+ tree index, and multiple threads want to access the B+ tree index, then you would protect the records and nodes, et cetera, in this data structure that is internal to the database system — that's what we call latches. On the other hand, the concept we are going to talk about today is called locks. These are, again, protection mechanisms, but specifically for transactions, and they apply to the actual content in the database.
Just the tuples in the table, or the entire database, et cetera — they are protection mechanisms that protect the data content in the system, rather than some data structure used by the database system internally. Does that make sense? So one is on the data — the records, the tuples — those are locks; the other is on internal data structures like B+ trees or hash tables, et cetera. And of course, within this category of locks, there are different types that apply in different modes, different methods, et cetera. And lastly, there will be a lock manager — you can see it at the bottom — that coordinates everything, which is a component in the database system. So we start with a few definitions of different types of locks. You have probably seen similar concepts in other classes, or even in the C++ standard. At the most basic level, there are two types of locks. The first is called shared locks. Essentially, these are locks for reads on records — read-only access. So different threads can share the read lock on the same record, but they can only perform reads on that record. The other is called exclusive locks, and these locks allow writes or updates on a specific record or object. But on the other hand, as you can imagine, only one thread can hold an exclusive lock. And there's a compatibility matrix for these two types of locks, right? One read-only, the other allowing writes, okay? One called shared and one called exclusive. So the procedure for the lock manager to grant locks is: while transactions are executing, if a transaction realizes it needs to access a specific object or record — let's say record A — it first requests that the lock manager grant that lock to the transaction. Or in some cases it could be an upgrade, right? Upgrading from a shared lock to an exclusive lock — that also happens.
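The compatibility matrix for shared (S) and exclusive (X) locks can be written down directly. This is just an encoding of the standard rules; the helper `can_grant` is illustrative, assuming a requested mode is granted only if it is compatible with every mode already held on the record.

```python
# S/X compatibility: True means the two modes can coexist on one record.
COMPATIBLE = {
    ("S", "S"): True,   # many readers can share a record
    ("S", "X"): False,  # a held shared lock blocks a writer...
    ("X", "S"): False,  # ...and a held exclusive lock blocks readers
    ("X", "X"): False,  # only one writer at a time
}

def can_grant(requested, held_modes):
    """Grant only if the requested mode is compatible with all held modes."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant("S", ["S", "S"]))  # True: another reader may join
print(can_grant("X", ["S"]))       # False: upgrade must wait for readers
print(can_grant("X", []))          # True: nobody holds the lock
```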
And then the lock manager checks, hey, whether it has granted this lock to other transactions before, right? If not, it grants the lock. If yes — it has granted it before — then it has to block that request. And if the transaction successfully acquires that lock, it does whatever it needs to do to that record, and finally, after it's done, it can release the lock, right? And the lock manager is responsible for updating all the internal metadata, right? To keep track of which transaction is holding which lock, which transaction is releasing which lock, which lock is free and can be granted to other transactions, et cetera, alright? Okay, so coming back to the earlier example. I simplified the T2 transaction a little bit, just to give you a more concrete example of these two basic lock types. Here, transaction T1 does a read and a write on A, and then reads A again; transaction T2 only writes on A, right? Because transaction T1 needs to both read and write on A, it requests an exclusive lock from the lock manager, and the lock manager grants the lock, because nobody held that lock before, right? And then after transaction T1 is done with record A, it can just release this lock, right? Later on, transaction T2 acquires this lock, and the lock manager grants it, right? Because transaction T1 already released that lock. After T2 is done, it releases the lock. And finally, because T1 only needs to read that record, it doesn't need to acquire an exclusive lock again, right? It only needs to acquire a shared lock, and after everything finishes, it's done, right? So everything seems good so far, right? But there's actually a problem here, right? So what is the problem? The problem is that transaction T1 actually performed two reads on record A, right? It first read it, wrote it back, and then read it again later.
But between the two reads, transaction T2 came along and changed the value that transaction T1 had written before, right? This is an unrepeatable read — or, from T1's perspective, it's an unrepeatable read; from T2's perspective, it's a dirty write, right? Essentially, this transaction schedule is incorrect, right? There is no serial schedule that would produce a result, or an effect, equivalent to this schedule, right? So this is actually wrong. So what does it tell us? It tells us that simply locking records while you need to access them doesn't prevent cycles in the dependency graph, right? According to our conflict-serializability standard for a schedule of transactions, right? Even if you lock records while you access them, if you draw the conflict graph, or the dependency graph, there can still be cycles, and the schedule can still be wrong. We need something a little better, right? More sophisticated than that. Makes sense? Okay. Yeah, it just shows that this is wrong. Okay, so the first fundamental algorithm that we are going to talk about — actually invented a while ago — that helps us achieve this concurrency-control functionality and produce a correct result is called two-phase locking, right? The thing I would emphasize here is that, unlike the analysis we did last class, where we assumed we knew all the operations and which transactions commit or abort, two-phase locking is an algorithm that is implemented in real systems, right? And in a real system, you don't know the schedule ahead of time; you don't know which transactions commit or abort.
So under this mechanism, or protocol, we can only decide whether to grant a lock, or to ask a transaction to wait for a lock, while we are executing the transaction on the fly, right? It doesn't schedule things ahead of time, okay? So, almost like the name says, under the protocol of two-phase locking there are two phases, right? One phase is called the growing phase, and the other is called the shrinking phase. In the growing phase, you only acquire locks, right? — of course, via requests to the lock manager. And in the shrinking phase, you only release locks, right? In other words, once you enter the shrinking phase, you cannot acquire any new locks anymore, right? It will make more sense when we give you some examples, but here we just illustrate what the two-phase locking protocol is trying to do, right? Here we plot the number of locks held by a transaction over time. In the growing phase, the number of locks keeps increasing, right? And in the shrinking phase, the number of locks keeps decreasing, right? Once in the shrinking phase, you cannot acquire any new locks again, right? So, for example, if in the shrinking phase a transaction released some lock and then acquired a new lock, that would be considered invalid under the two-phase locking protocol, alright? So I'll give you a concrete example. Here — I think it's the same example as before — T1 does read, write, and read; T2 only writes, right? In this case, T1 first enters the growing phase and asks the lock manager to grant an exclusive lock, and it is granted. And T2 here — because T1 is still operating on the record, right, it cannot release the lock yet — so when T2 tries to acquire this exclusive lock on record A, it just gets denied. It has to wait.
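The two-phase rule above — no acquisition after the first release — is easy to check mechanically. A minimal sketch, where `events` is an assumed toy encoding of one transaction's lock history:

```python
def obeys_2pl(events):
    """events: list of ("lock" | "unlock", record) for ONE transaction.
    Returns True iff no lock is acquired after the first release,
    i.e. the growing phase strictly precedes the shrinking phase."""
    shrinking = False
    for action, _record in events:
        if action == "unlock":
            shrinking = True          # the transaction has started shrinking
        elif shrinking:
            return False              # acquiring after a release: violation
    return True

# All locks taken before any release: valid 2PL.
print(obeys_2pl([("lock", "A"), ("lock", "B"),
                 ("unlock", "A"), ("unlock", "B")]))   # True

# Releases A, then tries to lock B: invalid under 2PL.
print(obeys_2pl([("lock", "A"), ("unlock", "A"),
                 ("lock", "B")]))                      # False
```

The second case is exactly the "release some lock and then acquire a new lock in the shrinking phase" situation called out as invalid above.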
And then after a while, only when transaction T1 has finished all its access operations on record A can it tell the lock manager, hey, I can release my lock now, right? Then the lock is released. After that, transaction T2 acquires the exclusive lock on record A, and after T2 is done, it finishes, right? See, here in this example, by specifically defining the growing phase and the shrinking phase, we actually avoided the problem that happened before, right? Before, transaction T2 had a write sneaking in between the two reads from transaction T1; here that doesn't happen. This is conflict serializable, and it's considered a correct schedule. Make sense? Cool. So, so far so good. But there's actually one problem that can still exist that makes this two-phase locking algorithm inefficient. When we analyzed the transaction conflicts last class, we always assumed that we know which transactions abort and which transactions commit, and we sort of assumed there's a magical algorithm that eliminates the effects of transactions that have aborted, right? When we analyzed the conflicts last class, we only focused on transactions that have committed, and then analyzed where there would be conflicts. But in actuality, transactions may abort, right? Under this two-phase locking protocol — I mean, it can handle a transaction abort, but when a transaction aborts, it may have a cascading effect that affects other transactions and makes the system inefficient. So let me give you an example here, right? Again, the same setup: T1 and T2, and assume we have two records, A and B. Here, assume that T1 reads and writes on A, then T2 reads and writes on A, and T1 comes back and reads and writes on B, right?
In this case, under two-phase locking, T1 could start with the growing phase — lock on A, lock on B — and then, after the growing phase is done, release the lock on A. Actually, the read on B and the write on B don't really matter here; you can pretty much ignore the read and write on B in transaction T1, right? But once transaction T1 releases the lock on A — after it finishes accessing record A — transaction T2 can come along and read this record, right? And assume that transaction T2 reads that record and does some update based on it, and that after a while, transaction T1 aborts. Then the value read by transaction T2 is actually invalid, right? So the schedule on the left satisfies two-phase locking, right? There is a growing phase and a shrinking phase. It ensures that we don't have the conflicts we talked about before, right? But if transaction T1 aborts, transaction T2 also has to be aborted, right? Because otherwise, the record read by transaction T2 would be invalid, right? So essentially, as you see here, even though two-phase locking ensures that, at the end of the day, all the committed transactions satisfy conflict serializability — everything is correct — when an abort happens, if another transaction has read a record from the aborted transaction, that other transaction has to be aborted as well, in a cascading fashion, right? And this could go on and on — it could be five, ten transactions if there are lots of conflicts, right? So there can be a lot of wasted work if you just use this original version of two-phase locking. Make sense? Cool, cool. Yeah, so that's essentially what we just talked about here. So two-phase locking is actually a little bit conservative, right?
There are certain conflict serializable schedules that would not be allowed by two-phase locking — again, because two-phase locking doesn't know the entire schedule ahead of time, right? So it's a little conservative, but at the same time, under two-phase locking there may still be dirty reads, at least temporarily, which may make transactions abort in a cascading fashion, right? Which generates lots of wasted work. So the solution is a somewhat stronger version of two-phase locking. There are actually different variants of solutions, but the one we'll talk about today in this course is a pretty strong variant of two-phase locking called strong strict two-phase locking — some people also call it rigorous two-phase locking — which helps resolve the cascading-abort issue we just talked about. And there can be other potential issues too: if different transactions acquire locks in different orders — transaction one, for example, acquires locks on A and then B, while transaction two acquires locks on B and then A — then there can be deadlocks, right? And we also need mechanisms to prevent transactions from deadlocking, or to break deadlocks up when they happen. So let me first focus on this dirty read problem I just talked about. The solution I already mentioned is called strong strict two-phase locking, right? In this case, a transaction is only allowed to release locks after it has ended, right? So you can almost think of the shrinking phase as collapsing to a single point, right? The growing phase is the same, right? But you just release all the locks at once at the very end of the transaction — either when the transaction commits or when it aborts, right? So here, let me demonstrate.
Because of, I mean, historical reasons, we still call this protocol two-phase locking, but essentially the second phase is just a single point in time, right? When the transaction finishes, it releases everything — that's the entire second phase. A definition related to strong strict two-phase locking is the concept of a strict schedule, right? Essentially, a schedule is strict if a value written by a transaction is not read or overwritten by any other transaction until that transaction finishes, right? And you can sort of tell how strong strict two-phase locking guarantees that, right? Because once you acquire a lock, you never release it until the end of the transaction, right? So it just guarantees this strict property of a schedule. The advantages of strong strict two-phase locking are, first, that it does not incur cascading aborts — I will give an example — and second, that when a transaction aborts, you only need to restore the updated values for that single transaction, right? When a transaction aborts, there is no cascading work you have to do to restore values from multiple transactions; you only need to deal with the single transaction where the abort happened, okay? So let me give you an example here, right? Say I just want to move $100 from my account to my promoter's account, right? That's our transaction one. And the other transaction just sums up the total amount of dollars across all accounts, right? And the echo here — you can just think of that as printing the information out, right? So in a non-two-phase-locking example, what we can have is, again, back to the earlier idea: we just lock records while we need to access them, right?
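The "hold everything until the end" behavior of strong strict 2PL can be sketched at the transaction level. This is a toy illustration, not the project's API: the `Transaction` class and the shared `lock_table` dict are assumptions, and blocking is simplified to raising an error.

```python
class Transaction:
    """Toy strong strict 2PL: every lock is held until commit or abort,
    at which point all of them are released at once."""
    def __init__(self, lock_table):
        self.lock_table = lock_table   # shared dict: record -> owning txn
        self.held = []

    def lock(self, record):
        owner = self.lock_table.get(record)
        if owner is not None and owner is not self:
            raise RuntimeError("would block: lock held by another txn")
        self.lock_table[record] = self
        self.held.append(record)

    def finish(self):
        """Commit OR abort: the entire 'shrinking phase' is this one point."""
        for record in self.held:
            del self.lock_table[record]
        self.held.clear()

table = {}
t1 = Transaction(table)
t1.lock("A"); t1.lock("B")   # growing phase: locks only accumulate
t1.finish()                  # release everything at the end, all at once
print(table)                 # {}: nothing was released before the end
```

Because no lock is released before `finish()`, no other transaction can ever read one of t1's writes before t1 ends, which is exactly the strict-schedule property defined above.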
Here, for the initial state of the database, account A could have $1,000 and account B could also have $1,000, right? So again, in the non-two-phase-locking example, you just lock records while the transaction needs to access them. T1 first acquires an exclusive lock on A, right? And when T2 wants to acquire a shared lock on A, it has to wait, right? But after a while, T1 releases the lock, and then T2 can hold the lock on A and read the record. After that — yeah, T2 then wants to acquire a shared lock on B, and it is granted, and later on T1 has to be blocked, right? It waits for T2 to release the lock on B, and then finally T1 can finish as well, right? But in this case, because we didn't follow two-phase locking, while transaction T2 is reading record A, it actually reads a sort of half-updated value from T1, right? A has already been deducted by 100 by T1 at that point. So when T2 outputs the summation of the two values, it outputs 900 plus 1,000, which is incorrect, okay? Now under two-phase locking, there is a growing phase and then a shrinking phase, right? First T1 tries to acquire the lock on A — granted — and T2 has to wait in this case, right? Then T1 can actually acquire the lock on B ahead of time, right? Because it has to acquire all the locks in the growing phase. After that, when T2 tries to acquire the lock on B, it has to wait, right? Only after a while, when T1 finishes all its operations, can it release the locks — it can release the lock on A earlier, and then release the lock on B later, right?
And then T2 can finally acquire the lock on B, and after it finishes its operations, it can release all the locks, right? So here, when T2 sums up the values from A and B, it gets 2,000, right? So in the two-phase-locking case, this transaction schedule is correct, right? Finally, under strong strict two-phase locking — again the same example — T1, once it acquires locks on record A or record B, just never releases them, right? Until the very end of the transaction. And T2 here is similar, right? Even though it has to wait for transaction T1 to release all its locks, once it acquires locks, it never releases anything until the very end of the transaction, right at the point when it commits. So this is an example of strong strict two-phase locking, and there are no cascading aborts in this case, alright? Again, it's still correct, right? A plus B is still 2,000 in this example, okay? So coming back to our earlier diagram of the different categories of transaction schedules: let's say this big box represents all the possible schedules of a set of transactions; then the serial schedules are just a tiny portion of them, right? Beyond that, there are the conflict serializable schedules, right? — which allow you to swap non-conflicting operations. And beyond that, assuming you know a little bit of the semantics, there are the view serializable schedules, right? — which would still be correct, but are difficult to achieve. And then, on a different level, there is the set of schedules with no cascading aborts, right? Some of them may be serializable, some may not, right? It's a different set of transaction schedules.
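The transfer-and-sum example above can be replayed in a few lines, assuming A = B = 1000 as in the lecture. This is just arithmetic mimicking the two interleavings, not a real concurrency simulation.

```python
A, B = 1000, 1000   # initial balances

# Non-2PL interleaving: T1 debits A and releases the lock on A,
# T2 reads both accounts, and only then does T1 credit B.
a = A - 100          # T1: A -= 100, then releases the lock on A too early
bad_sum = a + B      # T2 sums A (already debited) and B (not yet credited)
b = B + 100          # T1: B += 100, but T2 has already read B
print(bad_sum)       # 1900: incorrect, $100 appears to have vanished

# 2PL (or strong strict 2PL) interleaving: T1 holds its locks until it
# has finished both updates, so T2 can only read the consistent state.
a2, b2 = A - 100, B + 100   # T1 completes the whole transfer first...
good_sum = a2 + b2          # ...and only then does T2 read both accounts
print(good_sum)             # 2000: correct
```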
The overlap of the no-cascading-aborts schedules and the serializable schedules is where strong strict 2PL lives, right? It can guarantee that a transaction schedule is correct, though it's a little more conservative than conflict serializability, right? And at the same time, it doesn't allow cascading aborts either, right? So it makes things a little more efficient. Makes sense? Any questions so far? Yes, please? [Student asks how, under regular 2PL, a transaction knows when it has entered the shrinking phase.] Yes, yes, we don't need to go back. Essentially, the question is: under regular two-phase locking, how do we know we've entered the shrinking phase, right? That's a very good question. Essentially, that's why in practice, most of the time people actually use strong strict two-phase locking, right? Exactly as you said, it's very difficult to know you've already entered the shrinking phase unless you get some sort of hint from the user — hey, I will not need any new locks anymore, right? Or: my transaction has ended, right? It's difficult to know. So yeah, that algorithm exists in textbooks — it was the first algorithm developed in this category — but in practice, most people just use strong strict two-phase locking, yeah. Any other questions? Yes, please? [Student asks a follow-up about when transactions are allowed to commit.] Sorry, I'm not entirely following what you're asking, yeah. [Student: if a transaction has written a value and holds a lock on it, then other transactions that want to read that value have to wait for T1 to commit before they can commit themselves.] Yes, yes, they have to wait for T1 to commit before they can commit, that's true.
Yes, yeah, because they don't know whether T1 will commit or not. If T1 aborts, they have to abort too under regular 2PL, yes. [Student: is there any benefit to regular 2PL, then?] Well, the benefit would be that you can potentially release locks earlier, right? You get more flexibility in your schedule. But in most cases, people use strong strict 2PL, yeah. Okay, any other questions, by the way? Cool. Alright, so we've talked about 2PL and strong strict 2PL — we've talked about how to prevent cascading aborts from dirty reads — and now let's talk a little bit about, hey, what if there are deadlocks, alright? Let me give you an example, right? Again, very simple here. Assume that transaction T1 acquires a lock on A first, right, because it needs to read or write on A, and T2 comes along and acquires a lock on B because it needs to read B. Then assume that later on, T2 needs to read record A, so it wants to acquire the lock on record A, but it gets denied, right? Because the lock is held by T1. And then assume that T1 later needs to read or write on record B, so it wants to acquire the lock on record B, and it also gets denied. In this case, that's just a deadlock, right? Both transactions will wait forever, which is bad, right? We have to address this, otherwise the software just doesn't work, right? So essentially, that's called a deadlock — more specifically, a deadlock means there's a cycle of transactions waiting for each other, so that nobody can release any lock and nobody can proceed, right? And at a high level, there are two types of approaches to deal with deadlocks, right? One is deadlock detection. It's a little bit optimistic: you just let transactions acquire locks in whichever way they want, right?
And then, if a deadlock does occur, you try to detect it and break things up when it happens, right? The other approach is called deadlock prevention, which is a bit pessimistic. Essentially, you assume deadlocks may happen often, so before a transaction acquires any lock, you check: if I grant this lock, is there a possibility of a deadlock in the waiting relationships among the transactions? If there is, you either kill yourself or you kill the other transaction; it depends on the scheme, but essentially you try to prevent deadlocks ahead of time. So, let's talk about deadlock detection first; I'll show you some demos related to it. To implement deadlock detection, you create what's called a wait-for graph, which is the standard terminology, to keep track of which locks each transaction is waiting to acquire. The wait-for graph is a directed graph: if a transaction Ti is waiting for transaction Tj to release a lock, there is a directed edge from Ti to Tj. You draw these edges for all the current transactions in your system, and if there's a cycle, there's a deadlock, okay? Another way to think about this is that, in the internals of the database system, there is a background thread: the lock manager keeps track of the metadata about which transaction holds which lock and who is waiting for whom, and the background thread wakes up periodically to look at these wait-for relationships between transactions and check whether there are cycles.
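The cycle check the background thread performs can be sketched as a simple depth-first search over the wait-for graph. This is a minimal illustrative version, not any real system's implementation; the graph is just a dict mapping each transaction to the set of transactions it waits for:

```python
from collections import defaultdict

def has_cycle(waits_for):
    """Detect a cycle in a wait-for graph given as {txn: set of txns it waits for}.
    A cycle in this graph means the corresponding transactions are deadlocked."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current DFS path / finished
    color = defaultdict(int)

    def dfs(t):
        color[t] = GRAY
        for u in waits_for.get(t, ()):
            if color[u] == GRAY:                  # back edge: cycle found
                return True
            if color[u] == WHITE and dfs(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in list(waits_for))
```

For the three-transaction example coming up, `has_cycle({"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}})` reports a deadlock, while dropping any one edge makes it return `False`.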
And if there is a cycle, the background thread does something to break it up, all right? So, in this example with T1, T2, and T3, let's look at the wait-for relationships. Transaction T1 is waiting for transaction T2, because T2 acquired the lock on B earlier, so there's an edge from T1 to T2. Similarly, transaction T2 is waiting for transaction T3 to release the lock on C, because T3 got that lock earlier. And finally, transaction T3 is waiting for transaction T1 to release the lock on A. So there is a deadlock: there's a cycle in the wait-for graph on the right, which we have to break up. So, as I hinted a little earlier, the way we handle this deadlock situation when it happens is that we select one transaction, called the victim, kill it, and roll it back, just to break the cycle. The victim transaction can then either restart automatically or simply abort and let the user handle whatever error message they get. The second case, the abort, is actually more common: you just tell the user and give them the option. And lastly, as I mentioned before, there is a background thread checking periodically, so there's a trade-off: if you check very often, you can detect deadlocks earlier and break things up earlier, but it takes time to build the graph and check for cycles, so it consumes more resources. And vice versa, right? So, let's talk about how to select the victim. In fact, there's no single best answer; selecting the victim is kind of intuitive, kind of like a black art.
There's no sophisticated algorithm that tells you the optimal way to select a victim, because the relationships between the transactions are complicated and it's difficult to know the best choice ahead of time. So people just use heuristics: simple rules that are easy to implement and efficient, even though they may not be optimal. For example, in many cases you can break cycles by age, and just kill the oldest transaction. The intuition there is that the oldest transaction in your system may hold the most locks, so if you kill it, you release many locks, which potentially gives better opportunities to resolve contention in the system. But on the other hand, you could argue that an old transaction has already done lots of work, and it's kind of a waste to kill a transaction that has done a ton of work. So another way to select the victim is to look at which transaction has progressed the most: if a transaction has made lots of progress, you don't kill it; instead, you kill a transaction that hasn't done much work, because it's cheaper to restart. Or you can look directly at the number of locks. The reasoning there is that if a transaction has already acquired lots of locks, it probably doesn't need much more time to finish; maybe it just needs the last one or two locks to complete the entire transaction. So maybe you should let that one continue and kill the other transaction, the one with less progress or fewer locks.
Oh, and then there's another scenario: if you don't use strong strict 2PL, killing one transaction may have cascading effects, right? For example, another transaction may have read a record written by my current uncommitted transaction; if I abort my transaction, that other transaction also has to abort. So maybe in that case we just don't pick that transaction as the victim. But again, these are all heuristics, each with some reasoning behind it, but none of them guarantees an optimal way to break things up. Lastly, the other thing to consider is that we don't want starvation either. So we also keep track of the number of times a transaction has been aborted or restarted; if we keep aborting and restarting the same transaction over and over, we should probably give that transaction some priority as well. Okay, I'll finish these slides and then we'll see a demo of this in action. This slide just says that when you select a transaction to be the victim, you actually have options for how to handle the situation. You can completely abort that transaction: roll everything back and either tell the user, "hey, something went wrong, deal with it," or just restart it automatically. Or you can try to roll the transaction back query by query, to find the minimum number of queries to undo within this transaction so that the wait-for graph no longer has a cycle. Then you roll back just up to that point and continue from there.
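The heuristics above can be sketched as a small policy function. This is purely illustrative: the field names (`ts`, `locks_held`, `work_done`) and policy names are made up for the sketch, not taken from any real lock manager:

```python
def pick_victim(cycle, policy="youngest"):
    """Pick a victim from the transactions forming a deadlock cycle.

    Each transaction is a dict with illustrative fields:
      ts         - start timestamp (larger = younger)
      locks_held - number of locks currently held
      work_done  - rough measure of progress made so far
    """
    if policy == "youngest":
        # Newest transaction likely did the least work; cheap to restart.
        return max(cycle, key=lambda t: t["ts"])
    if policy == "fewest_locks":
        # Few locks held suggests it is far from finishing.
        return min(cycle, key=lambda t: t["locks_held"])
    if policy == "least_work":
        # Directly minimize the work thrown away by the abort.
        return min(cycle, key=lambda t: t["work_done"])
    raise ValueError(f"unknown policy: {policy}")
```

A real system would also factor in the starvation counter mentioned above, e.g. by never victimizing a transaction that has already been restarted too many times.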
Hopefully, when things start again, the other transactions will have proceeded and finished earlier, and you won't hit the same cycle, the same deadlock situation, again. So those are the two options. So now, let me give you a demo of deadlock detection in action. Cool, let's see how this works. It's actually not very easy for me to see the screen from here, but that's fine. Okay, so this is MySQL. I have a MySQL database loaded, and it uses the deadlock detection mechanism to ensure that deadlocks can be broken up. In this MySQL database, I have created a table called transaction_demo, and there are two rows in that table; let me select them so you can see. There's a row with ID 1 and value 100, and a row with ID 2 and value 200, okay? So, let me make sure the deadlock detection mechanism is on. Here I'll use two commands; hopefully people in the back can see them. With the first command, we set InnoDB's deadlock detection to ON; it's on by default, but just to make sure. And with the second, we set the global lock wait timeout to 50 seconds. And here, I'm going to start two transactions. I set the session isolation level to SERIALIZABLE, and then begin a transaction; and I'll do the same thing here in the second terminal, beginning a transaction there as well. This laptop is actually going slow; next time I should bring another one. So, what we'll do here is have the first transaction...
...do an update on the first row, where ID equals 1, and then the second transaction do an update on the row where ID equals 2. Make sense? And then we come back to the first transaction and update row 2. Oh, actually, sorry: in the second terminal I hadn't selected the database. Let me commit this. Demos are not easy. Okay, let me just make sure I've committed everything and come back. I start the transaction again: set the isolation level, begin, and then, oh, actually, we had already done a SELECT in the second transaction, so while it is executing, it's still holding some locks, which blocks my transaction in the first terminal. After I commit the second one, you can see the first transaction also goes through, right? Makes sense? That's not exactly what I wanted to show, but it does show the effect of the locking. So let me quickly go back, make sure everything is committed, and start again. You can see the first transaction goes through successfully; it updates the row with ID 1. And then the second transaction also goes through successfully, updating the row with ID 2. Now, if we come back to the first transaction and issue a query on ID 2, it waits, right? Because the second transaction has already acquired that lock. So now this is what I actually wanted to show. We come back to the second transaction and do an update on ID 1. What happened here?
What happens is that the system automatically detects that there is a deadlock. Remember, we set the lock wait timeout to 50 seconds, and the deadlock was detected well before that. The moment the second transaction issued this query, it completed a cycle in the waiting relationship; the system detected that and killed the transaction directly, reporting "deadlock found" and suggesting a restart. And the first transaction, on top, goes through successfully. Cool. So now, let me give you an example on Postgres, to see how things work there. Here, similarly, it's the same table: two rows, one with ID 1 and value 100, and one with ID 2 and value 200. Let me select them again to make sure I'm on the correct database. Okay, I'm on the correct database. Nice. So here, let me again set the deadlock timeout; let's set it fairly short, so we don't need to wait too long to see the result. And then, a similar thing: for the first transaction, we begin the transaction, of course, and then update the row with ID 1. And similarly here, coming back to my commands, the second transaction accesses the row with ID 2 first. Then we come back to the first terminal and update row 2 again; it gets blocked here, because transaction two is accessing the same record and already holds the lock. Finally, we come back to the second transaction and try to update the row with ID 1, and it blocks too: now there's a deadlock situation. And then, after a few seconds...
Remember, we set the deadlock timeout to 10 seconds, so after 10 seconds it recognizes, hey, there's a deadlock. And what's interesting here is that it actually shows you which transactions are causing the deadlock: who is waiting on whom and what type of lock each is requesting. It shows a shared lock here; I believe that's because, when executing this query, it first tries to read the record, so it tries to acquire a shared lock first, although of course it eventually needs an exclusive lock. But because the other transaction holds an exclusive lock, it cannot even get the shared lock. In any case, it tells you which transaction is blocked on which, and why it was aborted, namely because of this deadlock. So the next thing I want to show you is how Postgres tracks all this information internally. There is a system view you can query to see the dependency information between transactions, to see how it is internally maintained for detection. To do that, let me just try to commit everything. Oh, it rolls back, because that transaction was aborted. Let me commit the other one as well. Committed, okay. No transaction in progress anymore. Another thing I just want to point out: when I tried to commit the aborted transaction here, it doesn't actually commit; it tells me it has already been rolled back. Just a thing to point out. So here, what I want to do is set the deadlock timeout to be much longer, say 300 seconds, arbitrarily long, so the deadlock will sit there and we can inspect it. Sorry, I accidentally copied the wrong command. Okay, timeout set to 300 seconds, and then we do the same thing again.
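What the two demo sessions do can be mimicked with two threads that grab locks in opposite orders. This little sketch models the timeout-based side of the story (what a lock wait timeout triggers), not the graph-based detector; the locks, events, and timeout values are all made up for the illustration, and the events just force the same interleaving as in the demo:

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()   # locks on "record A" / "record B"
a_held, b_held = threading.Event(), threading.Event()
outcome = {}

def t1():
    with lock_a:                       # T1 locks record A first
        a_held.set()
        b_held.wait()                  # wait until T2 has locked record B
        # T1 now wants B -> circular wait; a lock-wait timeout breaks it.
        if lock_b.acquire(timeout=0.5):
            lock_b.release()
            outcome["t1"] = "committed"
        else:
            outcome["t1"] = "aborted"  # timed out: abort, releasing A on exit

def t2():
    with lock_b:                       # T2 locks record B first
        b_held.set()
        a_held.wait()
        # T2 wants A; it waits long enough for T1's timeout to fire.
        if lock_a.acquire(timeout=5):
            lock_a.release()
            outcome["t2"] = "committed"
        else:
            outcome["t2"] = "aborted"

threads = [threading.Thread(target=t1), threading.Thread(target=t2)]
for th in threads: th.start()
for th in threads: th.join()
```

Here T1 plays the victim: its short timeout fires first, it releases A, and T2 then acquires A and commits, just like the surviving transaction in the demo.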
Actually, I think I can just find the earlier commands in my history. Begin the second transaction and have it access record 2 first, right? And then we begin the first transaction, and it accesses record 1 first. Then we let the first transaction access record 2, and the second transaction access record 1. Oh, it goes through. What did I do wrong this time? Honestly, I don't know; maybe I just messed up the commit history. I don't think we have time to go through this again, but essentially, Postgres internally has a view that specifically tracks the waiting relationships between transactions. You can query it to see which transaction is waiting on which, and that's what Postgres uses to build the dependency graph, detect deadlocks, and break things up. All right, cool. Next time I will probably use a different laptop to set this up; this one is not very easy to present on. Any questions so far on the deadlock detection demos? Then, for the rest of the time, let's talk about a different mechanism, called deadlock prevention, that also tries to address the deadlock issue, and then I'll give a little bit of a heads-up on hierarchical locking, all right? So, one way to handle deadlocks is, again, to let transactions acquire locks as they want and then break things up when there's a cycle, when there's a deadlock. But the other way is to check, before a transaction acquires any lock, whether there is a possibility that this lock acquisition will cause a deadlock. If there is, we do something accordingly right away.
Either we kill the requesting transaction, or we kill the transaction that already holds the lock, okay? That is called deadlock prevention, and it does not require the wait-for graph or the background checking that we talked about earlier for the deadlock detection mechanism. Makes sense? In the deadlock prevention mechanism, we assign priorities to transactions based on their timestamps: the older the transaction, meaning the smaller the timestamp, the higher the priority. We do this because we don't want starvation. As we'll see, we sometimes kill and restart transactions, but a restarted transaction always keeps its original timestamp. That way, at some point any given transaction becomes the oldest transaction in the system and has the highest priority, so it can proceed without being killed. And essentially, there are two different schemes to achieve deadlock prevention, depending on which transaction you kill when a conflict arises. The first scheme is called wait-die: essentially, we only allow older transactions to wait for younger transactions. If a younger transaction wants to acquire a lock that is already held by an older transaction, we don't allow it to wait; we kill the younger transaction right away. That's why it's called wait-die: an older requester waits, a younger requester dies. And then, on the other hand, there's the scheme called wound-wait. There, we only allow younger transactions to wait for older transactions; only that single, opposite direction of waiting.
So another way to look at it is that, under wound-wait, when an older transaction wants to acquire a lock already held by a younger transaction, we give priority to the older transaction and kill the younger one that holds the lock. So we only allow young to wait for old; if old would have to wait for young, if it's trying to acquire a lock already held by a younger transaction, we don't allow that waiting to happen; we kill the younger transaction right away. Does that make sense? Okay, so in this case, to give a little bit of a heads-up on why this works: what we achieve is that there is only one direction of waiting among all the transactions. We could draw the wait-for graph here, but even without drawing it, the potential wait-for graph is guaranteed to have no cycle, because all the edges point in one direction in that hypothetical graph. And we never actually need to compute it, because we don't need to detect deadlocks here. Make sense? Okay, so here's an example with T1 and T2. In the example above, assume T2 acquired the exclusive lock first, and then T1, the older transaction, wants to acquire that lock. First, under the wait-die scheme, because T1 is older, it is actually allowed to wait; in this scheme, older transactions are allowed to wait for younger ones. But if you use the wound-wait scheme, because T1 is older and older transactions are not allowed to wait for younger ones, we kill T2 right away, okay? The other example is the opposite direction: here T2, the younger transaction, is trying to acquire a lock already held by T1, the older transaction. So in this case, under wait-die...
We wouldn't allow a younger transaction to wait for an older one, so we abort T2 right away. But in the wound-wait case, we do allow a younger transaction to wait for an older one; that is the one allowed direction of edges, so T2 can wait, and then hopefully T1 releases the lock at some point and things continue, all right? Very easy. So, yeah, we've basically already explained why these schemes guarantee no deadlocks: because only one direction of waiting is allowed in the hypothetical wait-for graph of these transactions, even though we never need to generate that graph. And when a transaction restarts, what is its priority? It keeps its original timestamp, because we don't want starvation: we are guaranteed that if you restart enough times, at some point you become the oldest transaction in the system, so you have the highest priority and you won't be killed; you don't starve forever. Any questions on the deadlock prevention mechanism? Okay, no questions, cool. So for the remaining time, I'm going to give you a little bit of a heads-up on hierarchical locking, even though we don't really have time to finish it today. So here, what we observe is that, so far, the locks we've talked about have a one-to-one mapping from a lock to a record in the database; we talked about records A, B, C, and so on. That's the notion of a record that we've been using. But in actuality, a transaction may, for example, read an entire table, and that table may have a billion records. In that case, with the protocols we've talked about so far, the transaction would need to acquire one lock on each of the individual billion records.
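The wait-die and wound-wait rules above reduce to a two-line decision per conflict. Here is a minimal sketch of that decision, with made-up function and parameter names; timestamps are assigned at transaction start, and smaller means older, i.e. higher priority:

```python
def on_conflict(requester_ts, holder_ts, scheme):
    """Decide what happens when `requester` wants a lock `holder` already has.
    Smaller timestamp = older transaction = higher priority."""
    if scheme == "wait-die":
        # Only older may wait for younger; a younger requester dies.
        return "requester waits" if requester_ts < holder_ts else "requester dies"
    if scheme == "wound-wait":
        # An older requester wounds (kills) the younger holder;
        # a younger requester is allowed to wait.
        return "holder wounded" if requester_ts < holder_ts else "requester waits"
    raise ValueError(f"unknown scheme: {scheme}")
```

In both schemes, every wait edge points from an older transaction to a younger one (wait-die) or from a younger one to an older one (wound-wait), so no cycle can ever form, and a killed transaction restarts with its original timestamp so it cannot starve.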
So that would be very time-consuming, and it's actually much more expensive than acquiring latches when you scan, for example, a B+ tree. Because for locks, you can see that no matter whether it's deadlock detection or deadlock prevention, you have to go to the lock manager and check who is waiting for the lock on this record, and whether it's already held or not. In some cases you need to generate the dependency graph; in other cases you have to determine which transactions already hold the lock and whom to kill, et cetera. So acquiring a lock is much more expensive than acquiring a latch, and if a transaction needs to read a billion records, acquiring a billion locks will be very, very expensive, right? So another way to do this is to acquire locks at a higher level. If I know that a transaction needs to read this entire table, instead of acquiring locks on all the individual records, why not have one giant lock that locks the entire table, whether for read or for write? Then we only need to acquire that one single lock on the entire table, without going through each individual record, which saves a lot of lock acquisition time. But the problem is also obvious: if you only have table-level or database-level locks, then every transaction has to lock the entire table or the entire database. The simplest lock would be on the entire database; you just lock the whole thing. But then you're basically back to the original case where we only allow one transaction to execute at a time, right? So, given this, we can see there's a trade-off between coarse-grained locks and fine-grained locks.
The trade-off is between the flexibility, the different schedules you allow, and the actual lock acquisition overhead. To balance that trade-off and find a sweet spot, we introduce the concept of hierarchical locking: locks at different levels of granularity that cooperate with each other. And that's what most systems, at least mature systems, do. So again, to reiterate the concept: when a transaction acquires a lock, it can decide the granularity of that lock; it has the opportunity to decide whether it's the entire database, an entire table, or an individual tuple. And what we want is for the transaction to acquire the fewest locks possible to accomplish the operation it needs to do, but at the same time we want the locks to be as fine-grained as possible, so that this transaction doesn't block other transactions it doesn't need to block. So this is a trade-off between parallelism and overhead. To illustrate this a little: say we have a transaction T1 that comes along and wants to access all the tuples in an entire table. Here, instead of directly locking all the tuples, it can, as the diagram shows, acquire one big lock on table A, so that it already has exclusive access. Or, alternatively, it can choose to lock all the tuples in the table, which is obviously more expensive; but in the circumstance where T1 only needs to access one or two tuples, tuple-level locks also allow more flexible scheduling for other transactions. Similarly, it could even acquire locks at the level of individual attributes. So how do we coordinate these locks across the entire system, if we have the opportunity to acquire locks at different levels, and especially in different categories...
...either shared or exclusive? There are lots of options we can choose from here. So the way we make locks at different levels coordinate with each other is called intention locks. An intention lock is kind of like a heads-up lock: a lock you acquire to tell other transactions what you intend to do further down. Essentially, an intention lock allows a higher-level node, in the database object hierarchy we talked about and that I showed you earlier, to be locked in shared or exclusive intention mode without having to lock all of its descendants. On the other hand, if a node is locked in an intention mode, then some transaction is doing, or may do, explicit locking at a lower level of the tree, right? So let me give you the specific definitions. There are three types of intention locks, okay? The first type is called intention shared (IS). It's saying: my transaction may acquire, or has already acquired, a shared lock at a lower level below this node, on specific descendants; the other descendants at that lower level are not locked by me, and nothing is held exclusively. The second is called intention exclusive (IX). It indicates that there is, or will be, explicit locking at a lower level below this node with exclusive locks. It gives other transactions a heads-up that, hey, this node is not entirely locked, but some children of this node are locked in exclusive mode. So this tells other transactions: even though this node is not entirely locked, shared or exclusive, there is some node down below that is locked, shared or exclusively, by my transaction. Why is this useful?
For example, if one transaction does not need a lock on the entire big node, but only holds an intention exclusive lock there because it has an exclusive lock on a specific leaf at a lower level, then other transactions will know that they cannot acquire a shared or exclusive lock on that higher-level node. So this gives other transactions a hint about what kind of higher-level lock they may request. Does that make sense? I will give you some examples, hopefully, in the next class, but here I'll just give you the definitions and then show which locks are compatible with each other. And for the last definition, there is a type called shared plus intention exclusive (SIX). It means my current transaction has acquired a shared lock on the entire node; assuming the node is a table, this type of lock tells you the current transaction has a shared lock on the entire table, but in the meantime, down below this node, there are one or a few leaves on which it has acquired exclusive locks. So it tells other transactions: hey, even though I'm only reading this entire table, you may still be able to read some other leaves below this table, but you cannot acquire a shared or exclusive lock on the entire table anymore. On the other hand, if I hold a shared intention exclusive lock on a table, what intention locks can other transactions still hold on it? They can still hold intention shared, but they cannot hold intention exclusive anymore, right?
So again, back to the example: if one transaction holds a shared intention exclusive lock, that means my transaction has shared access to all the tuples, but I also have exclusive access to a few tuples down below. This is compatible with intention shared, because intention shared does not need access to the entire table; it only needs shared access to a few leaves below the table, so it is compatible with the SIX lock on the whole table. But it is not compatible with intention exclusive, because the shared intention exclusive lock includes a shared lock on the entire table; that means no leaf below this table can be locked exclusively by anyone else anymore, since there is already shared access on everything. So this is the graph, the compatibility matrix, for all these types of locks. We can see that for the last type, the exclusive lock, nothing is compatible: if you have an exclusive lock on a specific tuple, then nothing else can be acquired, either on that tuple or on the higher-level nodes above it, say the table or the database. But some of the other intention locks, for example intention shared, are actually compatible with most of the other lock types on the table. So that's it for today. In the next class, I'll give you more examples of the specific protocols for how to use these intention locks in combination with the two basic lock types we talked about earlier, and I'll show you exactly how this performs.
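The compatibility matrix just described can be written down directly. This is the standard multigranularity matrix for the five modes discussed (IS, IX, S, SIX, X), encoded symmetrically; the table and function names are just for the sketch:

```python
# Standard compatibility matrix for multigranularity lock modes.
# Entries are symmetric: only one orientation of each pair is stored.
COMPAT = {
    ("IS", "IS"): True,  ("IS", "IX"): True,   ("IS", "S"): True,
    ("IS", "SIX"): True, ("IS", "X"): False,
    ("IX", "IX"): True,  ("IX", "S"): False,   ("IX", "SIX"): False,
    ("IX", "X"): False,
    ("S", "S"): True,    ("S", "SIX"): False,  ("S", "X"): False,
    ("SIX", "SIX"): False, ("SIX", "X"): False,
    ("X", "X"): False,
}

def compatible(held, requested):
    """True if `requested` mode can be granted while `held` mode is in place."""
    return COMPAT.get((held, requested), COMPAT.get((requested, held)))
```

This matches the discussion above: SIX on a table is compatible with another transaction's IS (readers of a few leaves are fine) but not with IX (no one else may exclusively lock any leaf under it), and X is compatible with nothing.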