Okay, all right, today is two-phase locking. But before we get into that, a quick overview of what's coming up. Project 3 is due on November 16th — we bumped the due date to Wednesday the 16th, so you get three extra days — and we're having the Q&A session tonight at 8 p.m., as I announced on Piazza. Homework 3 is out and will still be due on the Sunday before, the 13th. The last lecture of the class is on December 8th, and that will be a live call-in show. We'll announce how to participate: you can come to lecture, and there will be questions you can submit ahead of time. A lot of you asked things like "what about Redis?", so rather than try to sprinkle everything throughout the semester, we'll have one lecture where people can come and ask any questions they want, and it will also allow people from outside CMU to call in. Any questions about project 3? Hopefully everyone has started; again, there's a web version available that runs in your browser that you can use to test things. All right. So last class, we talked about concurrency control. We spent most of the time, in the context of ACID, talking about isolation. We talked about how we want to have a schedule of transactions where we can interleave their operations such that the end state of the database is equivalent to one where we executed the transactions in serial order, one after another. And we had two notions of serializability: conflict serializability and view serializability.
For conflict serializability, we showed that if we use the simple swapping method, or if we look at the dependency graph (the precedence graph) and ensure there are no cycles between transactions, then we know the schedule is conflict serializable. View serializability is more complicated because it requires understanding the semantics of what the transaction or application actually wants to do with the data, and that's not something we can verify. So going forward we'll ignore view serializability; when we talk about serializability, we mean conflict serializability unless we're explicit. Last class was all about looking at static schedules — here's a list of transactions and all the read and write operations they're going to do. And I said this is good for pencil and paper, because we can understand what it actually means for something to be serializable and the implications of that, but almost no system would actually work the way we showed, because most of the time you don't know the read/write set of a transaction ahead of time. So today's class and the next class are about how, at runtime — as queries show up from the application's transactions and we have no idea what's coming in the future — we can actually ensure that we maintain serializability. Today's class covers one solution, where we use locks to protect the database objects. Again, as I mentioned last class, for these algorithms we don't care what the database objects are: they could be rows, they could be tables, they could be pages, they could even be single attributes. In practice, we'll see what the actual granularity is.
But for the core algorithms, the fundamentals, it doesn't matter what a database object is. So let's see how we can introduce locks into schedules and see how this helps us. Here we have two transactions, T1 and T2. T1 is going to do a read on A, a write on A, and another read on A. T2 is going to do a read on A, then a write on A. And here I have explicit lock commands, where we can acquire and release a lock that corresponds to the object we're trying to read or write. So when the transaction starts, it makes a lock request. To do this, it calls the lock manager. This is a centralized authority, or component, in our database system that keeps track of all the locks we have, who's holding them, and who's waiting for them. For this example, assume there are no other transactions running right now. So when T1 issues the lock request, it goes to the lock manager. The lock manager says: you want a lock on A? Nobody has that — it's granted to you. Then you can proceed with the read on A, the write on A, and so forth. Now there's a context switch: T2 starts and tries to acquire the lock on A. When it goes to the lock manager, the lock manager denies that request because T1 already holds the lock, so T2 has to stall and wait. And again, depending on our system architecture and process model, this could literally mean stalling the worker thread that's waiting, or, if it's non-preemptive scheduling, we just note that the worker for T2 should not run again yet, and the worker thread can do other things.
It doesn't matter for our purposes here. Then the context switches back to T1: it does the write on A, the read on A, then unlocks. It sends the release request to the lock manager, and at this point the lock manager grants the lock to T2, which can do its read on A and write on A and then release it. So this is the high-level idea of what we're going to talk about today. But obviously it's more complicated than this, because this alone won't work in practice, as I'll show. The core idea is that this lock manager is the place we go to request and acquire locks. So in today's lecture, we'll start off talking about lock types. Then the core algorithm we're going to use for making sure we acquire locks in the right way: two-phase locking. Then we'll show how to handle deadlocks, and we'll finish up with hierarchical locking, which you will need for project 4. Okay. So I showed this table earlier in the semester when we talked about concurrency control in indexes, and I said we were going to mostly focus on the right side of the table — latches. Now we're here to talk about locks. Again, this is why we have the distinction between the notion of a latch and a lock. In the OS world, they would say a latch is really a lock, but in databases our locks are higher-level constructs used by transactions, which are issued by applications when they call BEGIN, and they protect the database contents for the length of the transaction. That's not entirely true in some cases, but we can assume it for now. There are going to be a bunch of different lock modes.
In our database system, we'll have to explicitly handle deadlocks between transactions, whereas with latching we just said we have to be good programmers and make sure we don't write code that deadlocks. We'll have different techniques to handle that. The information about which locks are held, and who holds them, lives in the centralized lock manager. With latching we said that was too expensive: if you're just trying to acquire a latch on a single B+tree node, you don't want to do a lookup in some lock table and ask whether you can acquire it. But with database objects, the idea is that updating the object is expensive enough that it's worth the cost of going through this lock manager, and there are many ways to mitigate that overhead. So for locks — again, these higher-level logical locks — there are basically two types: a shared lock and an exclusive lock. We call them shared and exclusive to distinguish them from a read latch and a write latch, but the high-level idea is the same. If you have a shared lock, somebody else can acquire the same shared lock on that object. But an exclusive lock is incompatible with all the other lock types, so only one transaction can hold it at a time. For the first half of this lecture, we'll assume there are only these two types. In reality there are way more lock types; some of these we'll cover, some we won't. This slide shows the top five major relational databases — as you can see, they each have their own compatibility matrices.
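The two-mode compatibility matrix described above — shared is compatible only with shared, exclusive is compatible with nothing — can be sketched as a simple lookup. This is a toy illustration, not any real system's lock manager code:

```python
# Compatibility matrix for the two basic lock modes:
# "S" (shared) and "X" (exclusive). Illustrative sketch only.
COMPATIBLE = {
    ("S", "S"): True,   # many readers can share the same object
    ("S", "X"): False,  # a writer must wait for readers
    ("X", "S"): False,  # a reader must wait for the writer
    ("X", "X"): False,  # only one writer at a time
}

def is_compatible(held_mode: str, requested_mode: str) -> bool:
    """Can a lock in `requested_mode` be granted while another
    transaction holds a lock in `held_mode` on the same object?"""
    return COMPATIBLE[(held_mode, requested_mode)]
```

The real systems on the slide extend this matrix with intention and update modes, but the shape of the check is the same.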
They have a bunch of shared, exclusive, and intention locks; we'll cover those at the end. I'm just trying to show that there's more out there than these two lock types. Okay. We'll also see that we get lock granularity at tables, pages, tuples, and attributes — at different levels. And then there's multi-versioning, which we'll cover next week, and which complicates a lot of this as well. We'll get to that later. So as I've shown so far, the way this works is that transactions request locks from our lock manager, or request upgrades: if a transaction already holds a lock on an object, it can ask for a stronger lock on the same object. The lock manager is responsible for granting or blocking these requests. The transaction then releases locks back to the lock manager, and the lock manager decides who's waiting for the lock that just got released and who should run next. Then, depending on how you handle deadlocks, either there's a background thread in the lock manager that looks for deadlocks and picks a transaction to kill, or you use an ordering protocol to decide, at the moment you try to acquire a lock, whether there could be a deadlock, and kill a transaction right then. Again, we'll cover how to handle that later on. The main point I want to make here is that the lock manager has a global view of all the active transactions and what locks they hold. We need this global view because the lock manager is responsible for taking corrective action when a deadlock occurs, meaning it can decide, when two transactions are waiting for the same lock, who should actually get it.
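The background deadlock detector mentioned above is usually described as cycle detection over a "waits-for" graph, where an edge Ti → Tj means Ti is waiting for a lock held by Tj. A minimal sketch of that check (the function name and graph representation are illustrative assumptions, not a real system's API):

```python
# Detect a deadlock by finding a cycle in the waits-for graph.
# waits_for maps each transaction id to the set of transactions
# it is currently blocked on.
def find_deadlock(waits_for):
    visiting, done = set(), set()

    def dfs(txn):
        if txn in visiting:   # back edge: cycle found
            return True
        if txn in done:
            return False
        visiting.add(txn)
        for blocker in waits_for.get(txn, set()):
            if dfs(blocker):
                return True
        visiting.remove(txn)
        done.add(txn)
        return False

    return any(dfs(t) for t in waits_for)
```

If this returns True, the lock manager would pick one transaction in the cycle as the victim and abort it.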
Who should actually get it, and with what priorities. This allows it to do more sophisticated ordering or scheduling to make sure we're using all the resources available to the database at all times. And again, we assume the cost of updating a database object is much higher than updating a single B+tree node or some other internal data structure; that's why it's worth having a worker go through this centralized lock manager, as opposed to keeping things in place as you do with latches. All right, let's go back to an example. T1 is going to do a read on A, a write on A, followed by another read on A, and T2 is going to do just a write on A. Now we introduce the different lock types. T1 starts off acquiring an exclusive lock on A. No other transaction is running right now, so it gets it. Then it does the read on A and the write on A, and then it unlocks A. Now T2 starts running: it acquires the exclusive lock on A, does the write, and unlocks it. But then we come back over here, and T1 gets a shared lock on A and reads A. So what happened here? A lost update? Not quite — close. An unrepeatable read: that's the anomaly. We read A and got one value, then we wrote it, and when we read it again we expected to get that value back, but instead we got this other transaction's value. This shows that just using locks doesn't mean we automatically generate serializable schedules. We need a way to control how transactions acquire locks to avoid these kinds of problems. That's what the two-phase locking protocol does. This was, I think, the first provably correct serializable concurrency control protocol, developed at IBM in the 1970s for System R.
It's a pessimistic protocol: you assume transactions are going to have conflicts or anomalies, and therefore you require them to acquire locks on anything they want to touch before they can operate on it. Next class will be optimistic concurrency control, where you say, "I think things are going to be okay," so you don't acquire locks up front, but you check things when the transaction goes to commit. Two-phase locking makes sure you have acquired the locks before you can do anything to the objects. And again, unlike last class, where we had to have the complete schedule and all the read/write operations ahead of time, with two-phase locking we don't need to know all the queries ahead of time. We can call BEGIN and then, as queries are issued, acquire locks on the fly as we go along. I'll also say that while I'm showing explicit exclusive- and shared-lock requests in the schedule back here, 99% of the time you don't actually write your application like this — you don't issue explicit lock requests. Yes? His question is: why can't we just require every transaction to only release locks when it commits? That's strong strict two-phase locking. Yes, that's one way to do this. Right. So two-phase locking obviously has two phases: the growing phase and the shrinking phase. In the growing phase, the transaction requests the locks it needs before it does anything to a given object. Again, it goes to the centralized lock manager, and the lock manager is allowed to grant or deny requests as the transaction acquires them. In some cases, the transaction just has to stall and wait until it gets the locks it actually needs. In other cases, the lock manager says this transaction cannot continue.
You have to abort and roll back all your changes. Remember, I said last class that at any time while your transaction is running, the database system could decide you can't keep proceeding: it forces you to abort and rolls back all your changes, and if you want to reissue those queries, you have to do it in the context of a new transaction. We'll see examples of this in Postgres and MySQL. The second phase is called the shrinking phase: as soon as the transaction releases or downgrades any lock it previously acquired in the growing phase, it enters the shrinking phase. From that point forward, it can't acquire any new locks. It can release locks and downgrade locks, but it can't upgrade them or get new ones. And if it tries, the database system will abort it and roll it back. But like I said, typically you don't write explicit lock commands in transactions — the database system handles this for you automatically. All right, so visually it looks like this. Time goes forward on the x-axis, and the y-axis is just the number of locks I'm holding — I'm not saying what the locks are on; tuples or anything else, it doesn't matter. So I'm acquiring locks, but as soon as I decide to release one, I enter the shrinking phase. And so I can't release a bunch of locks and then say, "oh, I need other ones," and try to acquire them. That would be a violation of two-phase locking, and the database system will prevent you from doing that. Yes? The question is: what if you have two transactions running at the same time, both in the growing phase, and they each need locks held by the other? You have a deadlock. We will handle that. All right, so let's go back to the example we had before, where we had the unrepeatable read problem, but now with two-phase locking. T1 starts.
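The growing/shrinking rule itself can be sketched in a few lines — a hypothetical per-transaction guard, not how any particular system implements it:

```python
# Toy enforcement of the two-phase rule: once a transaction releases
# any lock, it may never acquire another one. Illustrative only.
class TwoPhaseViolation(Exception):
    pass

class Transaction2PL:
    def __init__(self):
        self.held = set()
        self.shrinking = False  # flips on the first release

    def acquire(self, obj):
        if self.shrinking:
            raise TwoPhaseViolation("cannot acquire a lock after releasing one")
        self.held.add(obj)

    def release(self, obj):
        self.shrinking = True   # we are now in the shrinking phase
        self.held.discard(obj)
```

So `acquire("A")`, `acquire("B")`, `release("A")` is fine, but a subsequent `acquire("C")` raises, mirroring the diagram on the slide.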
It gets the exclusive lock on A — granted — and does the read on A and the write on A. Then there's a context switch. T2 starts running and tries to get the exclusive lock on A, but it's denied because that lock is held by T1. Then T1 runs again, does the read on A, and unlocks A. That releases the lock to T2, which can do the write, unlock, and then commit. Pretty straightforward, right? Now, at this point — I'm not showing this — you could downgrade the lock on A for T1, because it's only going to read afterwards. But again, we're showing a complete schedule here; the application may not know it's going to do that. It also may not want to send an extra "downgrade this lock for me" request, because that's another round trip over the network. That's why it's typically done automatically. Yes? He brings up a good point here. Assume I don't know anything, and the first thing I do is read A. So why am I getting an exclusive lock on A rather than a shared lock? For simplicity, I'm just showing you the exclusive lock. You're absolutely right: in a real system, you would do the read on A, and at that point you would acquire the shared lock; then when you try to write, you upgrade it to an exclusive lock. Now, there is a way to give hints in SQL — SELECT FOR UPDATE, which I'll show at the end. You can do a SELECT but also tell the database system, "by the way, I'm going to read this tuple, but I'm going to come back and write to it right afterwards, so don't get a shared lock, get an exclusive lock." That way you avoid the upgrade cost. Yes? He brings up another good point: if the system doesn't know the complete schedule, how would the shrinking phase actually work? If you don't know that you're never going to acquire new locks, how can you say, "I'm going to start shrinking"?
In practice, most systems run with strong strict two-phase locking, as you said before: you release all the locks at the very end. But you could in theory do this. We'll talk about isolation levels next class. Most systems also, by default, run at a lower isolation level, not serializable. So it would do the read on A — say I get a shared lock, do the read on A, and then immediately release the shared lock — which violates 2PL. They don't run exactly as shown here. The follow-up question: doesn't that result in correctness issues? Yes. But would you know? Some people know, some people don't; most people don't until it's too late. There are many stories of that. For now, this is what I'm saying: I don't want to introduce isolation levels just yet, because first I want us to understand what it means to have correct execution in the context of serializability. Once we understand that, then we can dial it down and see how we can — "cheat" is not the right word — dabble in lower isolation levels, and maybe that's okay. For some applications, that's okay; for others, like your bank account, probably not. Okay. So as I said, 2PL on its own is sufficient to guarantee conflict serializability, because we know the schedule it generates, after we execute all the transactions and queries, will have a precedence graph with no cycles, and therefore we know it's serializable. But it is subject to another problem — not exactly an anomaly, because it's not a correctness issue, but a performance issue — called cascading aborts. Say T1 is going to do a read on A, a write on A, then a read on B and a write on B, and T2 is going to do a read on A and a write on A. At some later point, T1 decides that it wants to abort.
The issue is that if we do the write on A and then unlock A — which we're allowed to do under two-phase locking, because we've already acquired the exclusive locks on A and B and we're not going to acquire any new locks — that unlock allows T2 to start running, and it reads that write. That's okay in itself, because the result would still be equivalent to a serial ordering. But the problem is that T2 read something from T1, and then T1 aborts. So now we have to roll back this other transaction too. This is what's called a cascading abort, and it's allowed under two-phase locking. I will say that if T2 tries to commit, we have to keep track of the read/write sets: we have to know that T2 read something from T1, and since T1 hasn't committed yet, we can't commit T2. It just sits in a pending state waiting for T1 to commit. There's a bunch of scheduling in the background for that, making sure you don't commit until the transaction that wrote the data you read has committed, because we don't want to leak anything to the outside world. But of course the problem is that we let T2 run under 2PL, it did a bunch of work, and now we have to roll it back. So this is all wasted work. Say writing A or B was very expensive: we burned a bunch of cycles, a bunch of disk I/O, and potentially a bunch of money doing work we eventually had to undo. So 2PL has a couple of problems; some of them we can fix, some we can't.
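The commit-dependency bookkeeping described here — T2 may not commit until T1 does, and must abort if T1 aborts — can be sketched roughly as follows. All class and method names are invented for illustration; this is not how any specific system organizes it:

```python
# Track which transactions read another transaction's uncommitted
# writes, so the system can delay commits and cascade aborts.
class CommitTracker:
    def __init__(self):
        self.read_from = {}   # txn -> set of txns whose writes it read

    def record_read(self, reader, writer):
        self.read_from.setdefault(reader, set()).add(writer)

    def can_commit(self, txn, committed):
        """A txn may commit only if everyone it read from has committed."""
        return self.read_from.get(txn, set()) <= committed

    def cascading_aborts(self, aborted_txn):
        """All transactions that must also abort when aborted_txn aborts."""
        doomed, changed = {aborted_txn}, True
        while changed:
            changed = False
            for reader, writers in self.read_from.items():
                if reader not in doomed and writers & doomed:
                    doomed.add(reader)
                    changed = True
        return doomed - {aborted_txn}
```

In the lecture's example, `record_read("T2", "T1")` followed by T1 aborting would put T2 in the cascade.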
First, there are schedules that are not allowed under 2PL but are technically still serializable and, in theory, should be allowed. Because the protocol is — "pessimistic" isn't quite the word I want — more restrictive about which transaction is allowed to read or write at a given time, we're not always going to be able to achieve all possible serializable schedules. But it's good enough in most cases, and we accept this because most of the time we prefer correctness over performance: it's okay to give up some performance, meaning a better, more parallel interleaving, in exchange for making sure we don't have any anomalies. Second, under 2PL we can still have dirty reads — we hold transactions at commit until we know the transaction that wrote the data they read has committed, but that means we can have a cascading abort, and we can prevent that with strong strict two-phase locking. And third, we're still going to have deadlocks that we have to handle. We'll get to that next. Yes. The question is: is the issue that 2PL can have cascading aborts? Yes. And your follow-up is about hiding the work that T2 did until we know that T1 commits? Yes, we have to do that. Does 2PL hide the work, or does it avoid cascading aborts? 2PL would allow T2 to see what T1 did, but if T2 tried to commit — this isn't exactly 2PL, it's the database system itself, but I suppose it falls on 2PL — the database system would say: T2 is committing, I know it read something from T1, and I don't know whether T1 has committed yet, so let me wait and find out what happens with T1. T1 then aborts, and the system says: who was waiting on T1? T2 was, so go ahead and abort T2 as well. Yes.
The question is: is a cascading abort considered a solution to dirty reads? Yes — to avoid exposing dirty reads, you have to wait for the writer to see what happens. In my example here it's one read and one write on a single tuple, but if I wrote a billion tuples, I'd have to wait to find out what happens to that transaction, and if it aborts, roll back all one billion changes. For small changes like this, who cares; for larger ones, it's a problem. All right, so let's see how we handle this dirty read problem, and that's strong strict two-phase locking. I think the textbook might call this rigorous two-phase locking. The basic idea is that in the shrinking phase — I won't say it goes away, but you can't incrementally release locks. You hold all the locks until you call commit; technically, that's the shrinking phase, and you release all the locks at once. There's a lesser version of strong strict, called just strict two-phase locking, where you're allowed to release the shared locks early but you hold the exclusive locks until the end. For simplicity, we'll hold all the locks. So again, I'm acquiring locks, acquiring locks — whether or not there's really a shrinking phase anymore, whatever — and when I call commit, all my locks get released. And now you avoid the problem I showed before, because, going back here, I wouldn't be able to unlock A at that point; I would have to hold it to the very end. So T2 wouldn't be able to run, because it couldn't acquire the lock on A, and therefore it wouldn't see the dirty write from T1. Yes? The question is — and he's correct — doesn't this make the system a lot slower? Absolutely, yes. But again, it's a trade-off.
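A toy sketch of the strong strict rule on top of an exclusive-only lock table: every lock is held until commit, then all of them are released at once. The names and the single-lock-mode simplification are mine, not any real system's design:

```python
# Strong strict 2PL, exclusive locks only: a transaction never
# releases anything until it commits. Illustrative sketch.
class StrongStrict2PL:
    def __init__(self):
        self.lock_table = {}   # object -> transaction that owns it

    def acquire(self, txn, obj):
        """Grant the lock if free or already ours; False means 'block'."""
        owner = self.lock_table.get(obj)
        if owner is None:
            self.lock_table[obj] = txn
            return True
        return owner == txn

    def commit(self, txn):
        """Release every lock the transaction holds, all at once."""
        for obj in [o for o, t in self.lock_table.items() if t == txn]:
            del self.lock_table[obj]
```

With this rule, T2's request for A is refused for T1's entire lifetime, so T2 can never observe T1's uncommitted write.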
If I have a lot of conflicts in my transactions — they're reading the things that the other transactions wrote — then I'm just burning cycles doing work and undoing it over and over again. Whereas if I'm more restrictive and let fewer things run at once, I don't do a bunch of wasted work. Next question: would we know, in a database system, what the cut-off point is where we'd want strong strict versus regular two-phase locking? As far as I know, no system does it automatically. You can't control this in PostgreSQL or MySQL or the other open source systems. And this is a good example of what separates the enterprise systems from the open source ones: Oracle and DB2 have a bunch of parameters where you can change these behaviors — how deadlock detection and prevention works, how two-phase locking works, and so forth. I think SQL Server gives you strong strict by default. And as I said, you don't really know when you can unlock things under serializable isolation, because you don't know what queries are coming afterwards. Yes. The question is: when I commit, am I releasing the locks in one single atomic update, or can I do it incrementally? You could do it incrementally; it doesn't matter. It's not a correctness issue, because the transaction that acquired the locks is committing — it's not going to read or write anything else — so I can just start releasing the locks and let everyone else read my changes. Now, I don't want to get too deep into durability yet, but you'd want to do this carefully: in some systems you may have to flush the log to disk, so that all your changes are durable, and then release the locks — but of course now you're waiting on a disk flush before you do that.
This is where you can cut corners: maybe I release the locks first, because I'm assuming I'm not going to crash between the time I release the locks and the time I flush the log to disk, and there's not going to be another transaction that reads my change, commits, and exposes something to the outside world inside that little race window. For absolute correctness, you'd want to flush to disk, then release the locks. I think everyone releases the locks first. We'll see this in other settings, especially distributed systems, where the protocol says you're supposed to do a bunch of round trips and make sure things are flushed to disk — nobody does that, for performance reasons, because the likelihood of a crash is low and people are willing to take that risk. All right, so the term "strict" actually has a real definition; it's not just an adjective I'm using. We say a schedule is strict if a value written by a transaction is not read or overwritten by any other transaction until that transaction finishes. If we do this, then we can guarantee that we won't have cascading aborts. And for transactions that do get aborted, because there are no cascading aborts, it's much easier for the system to roll back changes: we don't have to go look at a dependency graph of who read what and who wrote what. An aborting transaction knows that nobody else has modified any of the data it modified, so it just reverses back to the previous version; it doesn't have to walk a version chain and figure out which version is the right one. Again, we'll cover multi-versioning next week; it complicates a lot of things, but for our purposes, if the schedule is strict, this all gets a lot easier. So let's look at an example now. We have two transactions: T1 is going to move $100 out of my account into my bookie's account.
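The definition of a strict schedule just given can be checked mechanically. The schedule encoding below — a list of (transaction, operation, object) tuples, with "C" for commit/finish — is invented for illustration:

```python
# A schedule is strict if no transaction reads or overwrites a value
# written by another transaction before that writer finishes.
def is_strict(schedule):
    uncommitted_writes = {}   # obj -> txn with an outstanding write on it
    for txn, op, obj in schedule:
        if op == "C":
            # txn finished: its writes are no longer dirty
            uncommitted_writes = {o: t for o, t in uncommitted_writes.items()
                                  if t != txn}
        elif op in ("R", "W"):
            writer = uncommitted_writes.get(obj)
            if writer is not None and writer != txn:
                return False  # touched another txn's unfinished write
            if op == "W":
                uncommitted_writes[obj] = txn
    return True
```

For example, `W1(A), C1, R2(A)` is strict, while `W1(A), R2(A), C1` is not — T2 read T1's write before T1 finished.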
And then T2 is going to compute the total balance across all the accounts and return it to the application. We have only two accounts, A and B, and assume the ECHO is just returning the sum of A and B; we still have to do the reads and writes on A and B. If we do this without 2PL — without the protocol where we acquire locks in the growing phase and release locks in the shrinking phase — and instead let transactions acquire and release locks any way they want, you end up with a situation like this: T1 starts, gets the exclusive lock on A, reads A, takes $100 out, then unlocks A; later it acquires the exclusive lock on B, reads it, adds $100, and writes it back out. But in between, in this interleaving, T2 acquires a shared lock on A to do its read — it eventually gets it, reads the dirty write from T1, and then stalls waiting for B — and we end up with an output that's off by $100, because T2 read the state of the database after the update on A but before the update on B. That was possible because T1 was allowed to release the lock on A early. So that's bad. But if we do two-phase locking — just regular two-phase locking — then when T2 tries to acquire the shared lock on A, it's denied. After T1 gets the exclusive lock on B, it releases A; T2 then tries to get the shared lock on B, which is denied as well, so it waits. But now you're guaranteed to see the changes to A and B either entirely before or entirely after T1 ran — in this case after — so we end up with the correct sum. Yes. Right, so the question is: how does it know what locks it needs to acquire? Does it analyze the query plan before it runs?
No, because at the time you're generating the query plan, you don't know what data is actually there. You'd potentially have a WHERE clause, and you don't know what tuples are gonna satisfy that WHERE clause. There is a type of locking called predicate locking that does do what you're asking about, but that's in theory; nobody actually does it. It's from some paper from the 1980s, but nobody implements it because it's too hard. Okay, so the next question is: how do you know the point where you can start releasing, like how do I know here that I could unlock A? Again, what I'm saying is that in most systems you call BEGIN and start issuing queries; you don't actually call lock and unlock explicitly. So the database system has no way to know that I'm never gonna read A after this point and that it could therefore unlock it, right? There are some systems, Fauna is one of them, DynamoDB is another, that do what are called reconnaissance transactions: you run the transaction once without actually making any changes, just reading what it's supposed to read. Then you issue the transaction again automatically; now you know the whole read/write set, and you can put the locks and unlocks in the right places. As far as I know, DynamoDB does this and Fauna does this; I don't think any other system does. He says Cassandra will do it next year; we'll see. Cassandra had their own attempt, compare-and-swap transactions, and it wasn't correct, five or six years ago. I saw that a guy from Apple gave a talk on this; we can talk about Cassandra later. But where am I going with this? Fauna and DynamoDB do it now, Cassandra's doing it in the future, and Apple's throwing a lot of money at Cassandra, so I think they'll pull it off. Apple runs the largest Cassandra cluster in the world; it's massive.
So transactions are a good idea, and they're going to add this. All right, so again, that was regular two-phase locking. Let's see this with strong strict 2PL. It just means that T1 acquires the exclusive lock on A at the beginning and then never releases it until the end. So basically T2 will try to get the shared lock on A and get denied, because T1 is going to hold that lock for the entire length of the transaction. Then when T1 commits, that releases the lock on A, and we produce the same correct result. Now, just to show you in this example, you can ignore whether this is the exact timing of when things would actually finish, but you can see that in the regular 2PL schedule there was actually an opportunity to read A before T1 committed, right? So there is a chance to get some parallelism there, whereas with strong strict 2PL it basically becomes the serial ordering. Yes? The question is: in the regular 2PL case, why are we allowed to read A before T1 commits? Again, this is PowerPoint, not an exact ordering, but at that point A is unlocked, so you're allowed to read it, right? And internally there's a transaction table that says, here are my active transactions and here is their status. So when T2 goes to commit, it would say, okay, I read data that came from T1; is T1 still around? Has it committed? If yes, then I can commit. There's additional metadata you have to track for transactions, of what data they read, to determine whether you're allowed to commit. We're ignoring that for now. So say this is the commit block here, right? T1 is calling commit while T2 is still blocked, right?
So, at this point, if T1 calls commit and it gets aborted, then the locks get released after it rolls back the changes it made to A, and then T2 starts running and doesn't see any of T1's changes. Or, T1 succeeds and commits, and then T2 does see the change, right? If I just move this down a little bit, then maybe it's more clear; the exact ordering of the lines is imprecise because it's PowerPoint. Yep, there's latches to protect the lock table; there's other concurrency control stuff going on, but the high-level idea is what I care about. Okay, any other questions about two-phase locking? So I showed this diagram last class, and I tried to fix the colors because someone said it gave them seizures or something. This is the universe of all possible schedules we could have for our transactions. And we said that there was a smaller subset of this: the serial orderings for a set of transactions. Around that would be conflict serializable, and around that would be view serializable. But now we're introducing some new regions, because we have this notion of schedules with no cascading aborts. Some of that region will be serializable, some of it will not. And within that, we have strong strict two-phase locking, because it avoids cascading aborts. And by definition, if a schedule is strong strict two-phase locking, then it's conflict serializable. If it's conflict serializable, then by definition it's view serializable, right? And obviously, as you go down closer to the serial region here, there's less parallelism available in the database system, right? Because at the end of the day, if it's serial, then you are executing the transactions one after another, in one worker thread at a time. Yes?
If you ask for serializable execution and the system supports two-phase locking, you're very likely to get this. But as I said last class, you can ask for serializable isolation in Oracle and you don't get the real thing; you get snapshot isolation, a different isolation level. Yes? So the statement is: let's save that question for the end of the semester. Again, for some things it's fine, you're like, who cares? The Amazon shopping cart famously went without this; it's how they built DynamoDB. It didn't do transactions, it didn't do any of this. Because it's your shopping cart, who cares, right? If it's your bank account, I think you would care, right? Or that Bitcoin exchange that got destroyed, they care. Well, they didn't care, and that's why they don't exist anymore, right? His statement is that at a bank with a million customers, to have these kinds of problems it would require multiple people trying to do the same thing at the same time. That's how those Bitcoin exchanges got destroyed, right? People exploited something because they weren't running real transactions. Again, we could talk about this more: the NoSQL movement from the 2000s and early 2010s was all about, let's not do transactions, let's not do SQL, let's not do joins, those things are slow, we care about performance. And he just said right there, Cassandra's adding transactions and this kind of stuff. Cassandra was in the vanguard of the NoSQL movement, the we-don't-need-transactions crowd. Turns out transactions are a good idea. I showed that quote from Google last time: it's better to have people deal with potentially slower performance from transactions, and have some really smart people, i.e. you guys, work on the actual database system and make that thing faster, rather than have some random JavaScript programmer try to figure out eventual consistency stuff, right?
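To tie the two-phase locking discussion together, here's a toy sketch of what a strong strict 2PL lock manager does. The class and method names are mine, and it punts on blocking (a conflicting request just returns False, where a real system would put the transaction on a wait queue) and on the latches a real lock table needs.

```python
# Sketch: a toy strong strict 2PL lock manager. Locks are held in
# shared ("S") or exclusive ("X") mode and only released all at once
# at commit/abort, never in the middle of the transaction.

class LockManager:
    def __init__(self):
        self.table = {}  # obj -> (mode, set of txns holding it)

    def acquire(self, txn, obj, mode):
        """Return True if granted; False means the txn must wait."""
        held = self.table.get(obj)
        if held is None:
            self.table[obj] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)          # shared locks are compatible
            return True
        if holders == {txn}:          # sole holder: allow upgrade
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.table[obj] = (new_mode, holders)
            return True
        return False                  # conflict: caller must block

    def release_all(self, txn):
        """Strong strict 2PL: drop every lock at commit/abort."""
        for obj in list(self.table):
            _, holders = self.table[obj]
            holders.discard(txn)
            if not holders:
                del self.table[obj]

lm = LockManager()
print(lm.acquire("T1", "A", "X"))  # True
print(lm.acquire("T2", "A", "S"))  # False: T2 must wait for T1's commit
lm.release_all("T1")               # T1 commits, releases everything
print(lm.acquire("T2", "A", "S"))  # True
```

This is exactly the behavior in the example above: T2's shared-lock request on A is denied until T1 finishes.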
I keep picking on JavaScript, but you know what I mean. Okay, so we saw how to handle cascading aborts with strong strict 2PL; now let's do deadlocks. There are gonna be two approaches here: deadlock detection and deadlock prevention. So look at an obvious example. T1 wants to do a read on A, T2 wants to do a read on B. But then both of them are gonna try to acquire the lock on the other object. T1 gets the lock on A, that's fine. T2 gets the lock on B, that's fine. Now T2 tries to get the lock on A, but T1 holds that, so it has to wait. T1 tries to get the lock on B, but that's being held by T2, so it has to wait, right? It's the classic deadlock. So the two approaches to handle this: the first is deadlock detection, where we have a separate background thread that runs and checks, hey, do we have a deadlock? If yes, kill one or more of the transactions to try to break it. And then deadlock prevention is: as transactions try to acquire locks, there's some way to decide the ordering of who is allowed to acquire them, and we abort transactions accordingly, right? And as I said before, this is a classic example of where the enterprise systems are gonna be way more sophisticated, have way more features, and be better than the open source ones, because they maintain much more internal statistics about what transactions are doing and what they're likely to do in the future, and then try to pick the transaction to kill that will minimize the amount of rollback we have to do and maximize concurrency, right? So with deadlock detection, the lock manager is gonna build this wait-for graph that keeps track of what locks are being held by what transactions and what locks they're waiting to acquire from other transactions.
The nodes in the graph are transactions, just like in the precedence graph before, and we have an edge from one node to another if that transaction is waiting for the other transaction to release a lock. And there's this background thread that wakes up every so often, checks the lock manager to build this wait-for graph, and if it detects a cycle, it decides: okay, what transaction do I need to kill to break that cycle? So here's a really simple example with three transactions. We don't really care what operations they're doing; we just look at their lock requests. In this case, T1 wants an exclusive lock on B, but T2 already has a share lock on B, and those lock modes are incompatible; likewise, if someone holds an exclusive lock, nobody else can acquire an exclusive lock or a share lock. So we have an edge from T1 to T2. T2 wants an exclusive lock on C, but T3 has a share lock on C, so we have an edge from T2 to T3. And of course T3 is trying to get an exclusive lock on A while T1 holds a share lock on it, so there's an edge from T3 to T1. So we have a cycle here, right? The cycle doesn't need to be between just two transactions; it can go through two or more transactions, right? There's a bunch of different cycle detection algorithms out there; I'm not aware of one being clearly better than another, and it doesn't really matter, simple is often good enough, right? So when we detect a deadlock, the database has to say, okay, I gotta kill a transaction to break it, right? And the victim transaction will either restart entirely, going back to the beginning and re-executing on its own. That's rare; that's only if you're running with stored procedures that the system can restart the transaction itself.
Most of the time, the transaction just gets aborted: the client gets a notification or an exception saying your transaction was aborted, and then it's up to the client code to rerun the transaction if it wants to run it again. Yes? So the statement is: isn't this aborting thing passing something back to the client and requiring them to figure out what to do? Isn't the idea of having the database that it handles this for you, so you don't have to depend on the client code? I think what he's getting at is: I'm saying in this case, if I abort your transaction, the client gets a notification and we tell the client, hey, whatever you just ran, you gotta run it again. And yes, but it's unavoidable, because there could be application logic on the client side in between these queries, and the database doesn't know anything about that, so it can't decide whether it's okay to rerun what you told it to run before. For example, say there's a SELECT query; you get back a result, and now there's an if clause in the application code, in Python, PHP, whatever, that says if the value is this, then run this query, else run that query. But when you rerun the transaction, the value that SELECT reads back may have changed, and therefore the if branch may go down a different path. The database doesn't know any of that, so it can't rerun it for you; the client has to do this. The way you could handle this is through stored procedures, which we'll cover at the end of the semester. Basically, think of it like an RPC: you take the application code, extract it out, and embed it directly inside the database system. Now it's a function call, so the database can rerun that application logic in its entirety.
And if there's an if clause in the stored procedure, it'll get executed just as before. But most applications do not use stored procedures. Therefore, and I'm not saying this is a disadvantage of the database system, it's the right thing to do, you have to have some retry code that says, okay, if my transaction gets aborted, I can rerun it. Most people don't do that. Most of the time you get a random error on the website that says, couldn't do this, try again, right? If they write crappy code, you'll see that. Though underneath the covers, they may retry like three times and then give up. It all gets handled on the client side. So there's gonna be a trade-off in how aggressive we are in checking for deadlocks, right? It's a separate thread: it's building this wait-for graph and checking for deadlocks instead of running queries. We could have the deadlock detector run every minute, but that means if there's a deadlock, we may have to wait up to 60 seconds before things get resolved. But if we run the deadlock detector every nanosecond or microsecond, then yes, our deadlocks get killed right away, but now we're burning a thread just checking for this thing over and over again. So how aggressive you are depends on the likelihood that you're gonna have deadlocks in your workload, in your application, and this is something that you can configure and control, as we'll see in the demo. Now, how you actually pick a victim depends on a bunch of different things, and there's not one formula I can give you and say this is the right thing to do, because it depends on the workload, right? So an easy thing to do is just kill the transaction that has the highest timestamp, whatever is the newest one.
But obviously that could be problematic, because we could have a transaction that is old but hasn't changed anything, and a transaction that's brand new but has changed a billion tuples, and if we kill the newest one, we're rolling back all of those changes, and that's expensive. So we also need to consider how much work they've done so far since they started. We can keep track of how many items they've already locked; acquiring locks is not free. So if one transaction has already acquired 20 locks and needs one more, and another transaction has only acquired one lock, then we might be better off killing the one that only has one lock. Then there's also keeping track of cascading aborts: if a bunch of other transactions have read the victim's changes and are waiting to commit, then aborting that one transaction forces us to roll back a bunch of others, and that's problematic as well. So again, you see why this is super complicated, and why the enterprise systems do a much better job here than the open source ones, right? The other thing we need to do is make sure that if a transaction gets restarted over and over again, at some point it gets priority and is allowed to run, so we avoid starvation. So, any questions about deadlock detection? Again: separate background thread, builds a wait-for graph, looks for a cycle, and then kills things. And related to this is how far you actually wanna roll back. Most of the time you're gonna do complete rollbacks, but there are what are called savepoints; I can demo that maybe next class. When I run my transaction, I can set a savepoint and say, okay, if I have to roll back, roll back to this point here, instead of having to go all the way back to the beginning. But again, this is something you have to explicitly put in your application code.
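The detector's core job, finding a cycle in the wait-for graph, is a plain graph problem. Here's a small depth-first-search sketch (mine, with made-up function names) over the three-transaction example from the slide, where an edge Ti to Tj means Ti is waiting for a lock Tj holds.

```python
# Sketch: cycle detection on a wait-for graph with a simple DFS.
# Edge ("T1", "T2") means T1 is waiting for a lock that T2 holds.

def find_cycle(edges):
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    def dfs(node, path, on_path):
        if node in on_path:
            return path[path.index(node):]     # found a cycle
        on_path.add(node)
        path.append(node)
        for nxt in graph.get(node, []):
            cycle = dfs(nxt, path, on_path)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        return None

    for start in list(graph):
        cycle = dfs(start, [], set())
        if cycle:
            return cycle
    return None

# The lecture's example: T1 waits for T2, T2 for T3, T3 for T1
print(find_cycle([("T1", "T2"), ("T2", "T3"), ("T3", "T1")]))
# -> ['T1', 'T2', 'T3']: the victim must come from this cycle
print(find_cycle([("T1", "T2"), ("T2", "T3")]))  # None: no deadlock
```

A real detector would also apply the victim-selection heuristics above (age, locks held, work done, restart count) to pick which transaction in the returned cycle to kill.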
The database doesn't wanna do this for you automatically. Okay, so let's do a quick demo in Postgres and MySQL. Let's start with MySQL first. So, believe it or not, MySQL follows the textbook better than Postgres, at least for this. All right, so I have a simple table called transaction demo, and it has two tuples: ID one and ID two, with the values 100 and 200. And I've got a second session in the bottom window. So the first thing I'm gonna do is tell MySQL to turn on deadlock detection. It's on by default; I just wanna make sure that it's on. The SET command basically tells MySQL to set a configuration parameter. And InnoDB is the default storage engine you get in MySQL. MySQL is meant to be this pluggable system where you could replace the B+tree storage engine with log-structured storage and so forth, but most people using MySQL are using InnoDB; it's the default one, and it's pretty good. And then we're gonna set the lock wait timeout for transactions to be 50 seconds. All right, so we're gonna start a transaction at the top and a transaction at the bottom. Actually, wait, I haven't started yet. This is me declaring what isolation level I want, because by default you don't get serializable; so we're gonna ask for serializable isolation. So now I start a transaction at the top, start a transaction at the bottom. The transaction at the top is gonna update the first tuple and just add one to it, right? So it acquires the exclusive lock on that tuple and makes the change; no problem there. The session at the bottom is gonna do an update on the same table, oops, sorry, that's weird, okay, it's gonna do an update on ID equals two. And again, it acquires the exclusive lock on that tuple. The top session has the lock on tuple one, the bottom session has the lock on tuple two, right? No problems there. So now the top one is gonna try to do an update on ID equals two.
What's gonna happen here? It blocks, right? Because the bottom transaction holds the lock on tuple two. So now the bottom one does an update on tuple one; it has to wait, because the top one already holds that lock. At this point, MySQL kicks in and says, aha, I do have a deadlock, and it kills this transaction. And the one up here, as soon as the bottom one got killed, all its locks got released, so the top one immediately got the lock on ID equals two, the second tuple, and then it could run. You can also see that the execution time of this query was eight seconds, because it waited eight seconds to acquire the lock: it stalled, and once the lock got released, it was able to acquire it, right? So at this point I get the message: deadlock found when trying to get lock; try restarting transaction. I think I'm still inside the transaction, though. Right, I'm still inside the transaction, but you see, this transaction added one to ID equals two, but after it got aborted, that change got rolled back, right? And if I run that query at the top here... oh, you know what it is? Maybe it's because of this, let me do that. Sorry, all right, there we go, awesome. So the question is: why am I able to run this SELECT here, when I shouldn't be able to get the shared locks on those tuples from up above? Because I'm no longer in a transaction, and any query I issue runs as its own one-query transaction. And you remember before, when I started the transaction, I said it's serializable, so this one should be serializable as well. The answer is multi-versioning. I can see the versions of the tuples that existed, and had been committed, when my transaction started. That other transaction hasn't committed yet, and with multi-versioning the writers don't block the readers, so I can still read the data.
This is what I'm saying: multi-versioning complicates things. If it were single-versioned, then yes, you'd have to acquire a share lock and this wouldn't work. So for the session up above, we see all its changes; for the one below, we don't see its changes yet. Now when I commit, and I run this query here, now I see the changes, right? We will cover this next week. Yes, basically they're keeping versions, and there can be multiple version numbers; we'll cover that, yes. The question is: is there a reason why serializable is not the default option? Yeah, because it's slower. And for most people, it's one of those things where it probably doesn't matter, right? Yes. The question is how it was able to detect the deadlock before the timeout. Right, I set the lock wait timeout, but deadlock detection was still on. So watch: I can turn off deadlock detection, right? So let's do that. Set this off at the top, set this off again down here. This SET command applies per session, per connection, right? So I can run the same experiment as before. Actually, let me set the timeout to be 30 seconds; I should be fast enough to pull this off. The timeout basically means that if I try to acquire a lock and I can't get it after 30 seconds, I give up, right? So now, I start this transaction up here, and one down here. I try to do an update on tuple one; that's allowed, that can happen. I do an update on tuple two down below; that's allowed to happen. The top one tries to update tuple two now; it stalls. The bottom one tries to update tuple one; it stalls. And after 30 seconds, it'll say, okay, well, you guys can't acquire these locks; the deadlock detector is turned off, so someone's gotta die, right? In this case, I actually don't know what MySQL will do here. It may kill both of them, or it may kill the top one first because it's gonna time out first, and then the other one, in theory, should then get the lock.
So let's see. So the statement is, they both stalled. Yeah, I don't know. All right, the first question is: why is deadlock detection set on a per-session basis? You'd think it would be a global thing. Actually, I don't know the answer to that; I'd have to double-check. It may be the case that it is a global flag. Sometimes the knobs are per session, sometimes they're global, and I have root permissions here, so I might be setting it globally, I don't know. All right, so this was unexpected: they both timed out, but when the first one timed out, it didn't automatically release the lock that the other one was waiting for. Did it maybe not roll back? Yeah, it may be that only when I roll back does the lock actually get released. But at this point, it's too late. All right, so let's look at Postgres. Again, same thing: a table with two tuples, all right? We're gonna set the deadlock timeout to 10 seconds for both of these sessions. All right, and here's another example of how the SQL syntax can vary: in Postgres I say BEGIN TRANSACTION and then set the isolation level with a separate statement, which MySQL does not support in that form. Oh, I'm already in a transaction. All right, so now we have two transactions running, same as before. The top one updates tuple one, the bottom one updates tuple two; the top one then updates tuple two and stalls; the bottom one tries to update tuple one and stalls. And rather than a continuously running detector like MySQL's, Postgres relies on this deadlock timeout. So in this case, it killed the first one, and the moment it broke the deadlock, the other one was able to acquire the lock. And look at this: it says deadlock detected.
It may be the case that the timeout works like: if I can't acquire the lock right away, then after the timeout, go run the deadlock detector and figure out what actually happened. I think that's what it's doing there. The other thing to point out, where's my laser pointer, is that it says it was waiting to acquire a share lock. Again, this is where the textbook says one thing and real systems do something different: what we've been calling a share lock roughly corresponds to an access share lock in Postgres, and Postgres's own "share" lock is a stronger mode. It's basically trying to acquire a read lock on the tuple, then determining that it needs to update it, and then upgrading to the stronger mode. The great thing about Postgres is that you can actually go see where the deadlock is: you can run queries against the lock manager's tables and see who's waiting for what lock. So if I can get this to fit, this is the same demo as before, but now I have a third panel. So we do the same thing; first I gotta roll back my transactions. I think Postgres supports both ABORT and ROLLBACK. Okay, so we'll start a new transaction, and start a new transaction down here. This one updates tuple one, this one updates tuple two. The top one then tries to update tuple two and stalls; the bottom one tries to update tuple one and stalls. But now, in this other terminal here, I'll query it before it times out. Oh, I didn't make it, sorry. What's that? Yeah, let me do that, sorry, let me copy and paste. So we're gonna run this query; I didn't write it, I got it from the internet. It basically queries the lock table and shows you who's waiting for what. Oh, I should also show you that we can get the process ID of the worker in Postgres using this command here, it's not a function, sorry. So this one is, what is it, 865, and this one here is 674.
And we'll see those when we query the lock table, because it'll tell us the process IDs. All right, so let's set the timeout to, oh, 10 seconds, that's not enough, let's do 30 seconds, and 30 seconds down here. Okay, so we'll start a transaction, start a transaction... I think I'm already in a transaction. Oh, all right, well that's good. So start a transaction down here. Now I'm gonna update tuple one, update tuple two, update tuple two: stall; update tuple one: stall. Now over here, I run this query, and Postgres tells you: here's the process ID of the worker that's blocked, here's what it's blocked on, here's the other process ID, and here's the query that blocked it, right? So this is basically what the lock manager sees: this worker is trying to acquire this lock, and it's being held by this other worker thread. And at some point after 30 seconds, it times out and the locks get released. Now if I run this query again, you see there's nothing waiting in the lock table. Is this clear? All right, we'll play with Postgres and MySQL more next class when we do isolation levels; you'll start to visually see how the anomalies occur. All right, so let's talk about deadlock prevention. I would say deadlock detection is, as far as I know, more common than deadlock prevention. In the commercial systems, like SQL Server, I think you can turn prevention on; I don't think it's on by default, but I might be wrong about that. So the key to deadlock prevention is that we don't require a wait-for graph, and we don't require a background thread to look for deadlocks. The moment a transaction tries to acquire a lock that's being held by another transaction, we have to make a decision right then: do we allow it to acquire the lock, allow it to wait for the lock, or do we shoot it in the head, steal its wallet, and kill it right there, right?
So there are two approaches: wait-die and wound-wait. The first thing both of them do is assign transaction IDs to all the transactions when they show up. For simplicity, we'll just say it's some timestamp, whether that's wall-clock time or a simple counter; we'll cover that when we talk about distributed databases. For our purposes, assume it's a counter: every time a transaction shows up, you increment the counter by one and assign that value as the transaction's ID. So in wait-die, the old is allowed to wait for the young. If a requesting transaction shows up and it has a higher priority than the transaction holding the lock it's trying to acquire, meaning it's older, then it's allowed to wait. Otherwise, the requesting transaction has to abort, right? So we kill off the young right away. In wound-wait, the young is allowed to wait for the old. When the requesting transaction shows up, if it has a higher priority than the holding transaction, then it's allowed to wound the holding transaction: kill it, steal its locks, and start running. Otherwise, it waits. The high-level idea is that we pick one policy or the other, so we determine priority in exactly the same way for every single transaction. We never have an old transaction waiting for a young transaction in one place and a young transaction waiting for an old one somewhere else, in two different directions. It's like latch crabbing: you only go in one direction, and you avoid deadlocks. So, a really simple example: we have two transactions, T1 and T2, and T1's BEGIN starts before T2's BEGIN, so T1 has a smaller timestamp than T2; assume they're one and two. So when T1 tries to get the exclusive lock on A, which is being held by T2: if it's wait-die, then T1 is allowed to wait, right? Because T1 has the higher priority, it can wait for T2.
Or if it's wound-wait, then T1 is allowed to kill T2 and steal its locks, right? And then there's no deadlock, because you just take their locks. In the second example, T1 has the higher priority, because T1's timestamp is less than T2's, and T1 holds the lock on A. T2 comes along and wants that lock on A. If it's wait-die, T2 immediately aborts, because its priority is lower than T1's: I'm not gonna get the lock, so I kill myself. Or, if it's wound-wait, T2 is allowed to wait, right? Again, we're just guaranteeing that the precedence always goes in the same direction. Yes? The question is: does it always have to be timestamp-based, or is there another way to determine priority for deadlock prevention? So the timestamp could be a logical counter, or it could be wall-clock time, though then you have to deal with leap years and all that crap. In the distributed case, there are hybrid clocks, a logical counter plus some kind of wall-clock time. But a timestamp is the right thing; I can't think of another way you'd do this. In this scheme, I can't consider what locks are being held by a transaction or how much work it's done; it's gotta be based entirely on the timestamp. So in this example, with wait-die, T2 would abort, but maybe T2 has done a bunch of work and now we gotta roll all that back; you can't take that into account in deadlock prevention. In deadlock detection, you can. That's typically why detection is the one most people implement. Okay, so why do these guarantee no deadlocks? As I said before, waiting only ever happens in one direction, old for young or young for old, and otherwise we abort things.
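The two policies boil down to one comparison at conflict time. Here's a minimal sketch (my function name and return values, using the smaller-timestamp-equals-older convention from the lecture) that plays out both examples above.

```python
# Sketch: the wait-die and wound-wait decisions. The transaction's
# start timestamp is its priority: smaller = older = higher priority.

def on_conflict(policy, requester_ts, holder_ts):
    """What does the requesting txn do? 'wait', 'die' (abort itself),
    or 'wound' (abort the holder and take its lock)."""
    if policy == "wait-die":
        # Old may wait for young; young aborts itself.
        return "wait" if requester_ts < holder_ts else "die"
    if policy == "wound-wait":
        # Old wounds young; young waits for old.
        return "wound" if requester_ts < holder_ts else "wait"
    raise ValueError(policy)

# Example 1: T1 (ts=1, older) requests a lock that T2 (ts=2) holds
print(on_conflict("wait-die", 1, 2))    # wait
print(on_conflict("wound-wait", 1, 2))  # wound -- T2 gets killed
# Example 2: T2 (younger) requests a lock that T1 holds
print(on_conflict("wait-die", 2, 1))    # die -- T2 aborts itself
print(on_conflict("wound-wait", 2, 1))  # wait
```

Either way, an older transaction never ends up waiting behind a younger one (or vice versa, depending on the policy), so no cycle can form.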
And then when a transaction restarts, we wanna make sure we give it the same timestamp, so that when it comes back around the second or third time, at some point it'll have the highest priority and it'll be allowed to run, right? And that avoids starvation. Okay, so everything we've talked about so far assumed a one-to-one mapping between one lock and one object, but I didn't say what an object actually was yet. And obviously in a real system — and we showed this with Postgres and MySQL — we're acquiring locks on tuples, right? But again, acquiring a lock is not cheap, because we've gotta go to the lock manager; that's gonna be a hash table, and we have to maintain latches to protect that data structure, right? It's not a magic wand. It's not like a latch, where we do a compare-and-swap on a memory address and you either acquire it or you spin. It's a lot more complicated. So if we now have a transaction that wants to do a lot of things — like update a billion tuples — and we've gotta go acquire a billion locks, that's gonna be really expensive. So we need a way to have larger granularities for our locks, so that one higher-level lock can implicitly acquire a bunch of smaller locks. That'd be way more efficient, and it's gonna allow us to have better parallelism as well. This is what a lock hierarchy, or lock granularity, allows us to do, right? The goal is gonna be that we want a transaction to acquire the fewest number of locks it needs to do whatever it needs to do, while also potentially maximizing parallelism. So there's gonna be this trade-off: if I have really fine-grained locks, then I can have better parallelism, but I've gotta go to the lock manager a lot. If I have coarse-grained locks, then it's super cheap to go get that one coarse-grained lock, but now I'm preventing other schedules or execution orderings that could have occurred.
Like MongoDB, when it first came out, they had a single database lock, right? So only one thread could update the database at a time. Obviously that's simple to implement, and it'd be efficient to go acquire that lock, but it limits parallelism. There's a bunch of reasons why they did that, which we can take offline. So the database lock hierarchy is gonna look sort of like this. At the very top you have the database, the database has tables, tables have pages, pages have tuples, and tuples have attributes, right? So now if my transaction T1 comes along here and I wanna update all the tuples in table one, then maybe I'll acquire a lock on table one, and that implicitly acquires the locks down below, right? That's one request to the lock manager to go get the lock on the table, rather than all these individual locks down here. So in practice, table-level locks and tuple-level locks are very, very common. If somebody's implementing two-phase locking, chances are they're supporting those two granularities. Also very common, but not as common as tuple- and table-level locks, are page-level locks. Postgres doesn't support them, but MySQL does, and other systems do. So it's common, but not everyone does this. Database locks are rare. Some systems will let you do this, and you'd wanna use it for things like big bulk changes — take a lock on the whole database. Attribute-level locking is super rare; I only know of one or two systems that support it. Yugabyte announced they support this now — like, I can get a shared lock on a tuple and then lock individual columns in my table. Most systems don't support that, though. So, yes? The question is: in this example here, when I get a lock on the table, does that implicitly acquire the locks on everything below? Yes. So you don't actually have to go to the lock manager for 2.1, 2.2, 2.3, right?
Yes, the same as— So the question is, does everyone have to follow this hierarchy? Yes. You'd have to make sure that as you go down the tree, you can acquire the locks at the right level of the tree. Yes... that's fine, we'll handle this, yes. So obviously, having just shared and exclusive locks would really limit parallelism, because if I wanted to acquire a lock on a tuple, I'd have to traverse down that lock hierarchy and figure out: can I get all the locks down to it? So we can use intention locks as a hint to say, here's what's occurring below in the lock hierarchy tree. And that lets you determine whether, you know, you're gonna have problems when you get to the bottom, or whether you'll be able to acquire everything you need, right? And this handles the problem that he brought up, right? And the intention locks are gonna be more compatible in some cases than regular shared and exclusive locks. When I showed those screenshots of the documentation for Oracle and DB2 at the beginning, this is the extra stuff they're adding that I didn't talk about — they're adding these different types of intention locks, in addition to DDL locks and maybe access locks as well. If a system has lock hierarchies, it will support something like this. So there are three types: intention-shared, intention-exclusive, and shared-intention-exclusive. Intention-shared and intention-exclusive just mean that at a lower level in the tree — at some point in the hierarchy — someone's gonna have explicit shared or exclusive locks, right? So if I wanna take an exclusive lock on an entire page, but someone has a shared lock on a tuple on that page, you would know that, because the intention lock tells you what's happening.
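To make that top-down traversal concrete, here's a hypothetical sketch (the `locks_for` helper and the level names are invented for illustration): intention locks are requested at every level above the target, then the explicit lock at the target level itself.

```python
# Sketch: locks acquired top-down through the hierarchy. An explicit
# S or X lock at the target level is preceded by the matching intention
# lock (IS or IX) at every level above it.

HIERARCHY = ["database", "table", "page", "tuple"]

def locks_for(target_level: str, mode: str) -> list:
    """Return the (level, lock mode) pairs to request, top-down.
    mode is "S" (shared) or "X" (exclusive)."""
    intention = "IS" if mode == "S" else "IX"
    locks = []
    for level in HIERARCHY:
        if level == target_level:
            locks.append((level, mode))  # explicit lock at the target
            break
        locks.append((level, intention))  # intention lock above it
    return locks
```

For example, an exclusive lock on a single tuple requests `[("database", "IX"), ("table", "IX"), ("page", "IX"), ("tuple", "X")]`, while a shared lock on a whole table stops early with `[("database", "IS"), ("table", "S")]` — no per-tuple locks needed.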
Shared-intention-exclusive means that you're taking a shared lock on everything below you in the tree, but at some lower node there are also gonna be individual exclusive locks. Why is that allowed? Because someone might have an intention lock up above to say so, right? And, well, yeah — it's the next slide, right? Incompatible requests would not be allowed. So if someone's updating something down below and I'm trying to lock the whole thing, if I try to take shared-intention-exclusive, it would be denied, right? And this would be denied as well — a shared lock would be incompatible with shared-intention-exclusive. And again, we're doing all this with two-phase locking. So in my example from before where I was showing you, you know, lock A, lock B — now you would say lock A in one of these modes for this part of the hierarchy. All right, we only have two minutes left, so I'm gonna stop here. Let me quickly talk about locking in practice, and then I'll come back to intention locks next class, because you'll need this for project four. So I'll spend more time on Thursday. As I said before, most of the time you don't acquire locks manually, but in some cases you can, and you can actually provide hints to say: hey, I'm gonna read something, but I'm gonna write it right afterwards — go ahead and put it in exclusive mode rather than shared mode. This is where the SQL standard says one thing and everyone does something completely different. So to lock an entire table, there's a bunch of different ways to do it. Thankfully, Postgres, Oracle, and DB2 have a LOCK TABLE command that takes a particular mode. MySQL has the same thing, but the syntax is a bit different from the other ones. And then SQL Server has this other random thing as well, right? LOCK TABLE is not that common.
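The compatibility rules being described here (for example, a shared lock being incompatible with shared-intention-exclusive) follow the standard matrix for these five modes. A sketch of it as a lookup table:

```python
# The standard lock compatibility matrix for the five modes.
# COMPATIBLE[held][requested] is True when a new request in `requested`
# mode can be granted while another transaction holds `held` on the
# same object. Note the matrix is symmetric.

COMPATIBLE = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def compatible(held: str, requested: str) -> bool:
    """Can `requested` be granted alongside an existing `held` lock?"""
    return COMPATIBLE[held][requested]
```

So `compatible("SIX", "S")` is `False`, matching the example in the lecture, while two transactions that only intend to write different tuples under the same table can coexist: `compatible("IX", "IX")` is `True`.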
Typically you do this for, again, making big bulk changes to things. What is more common, and does make a huge, huge difference, is SELECT ... FOR UPDATE. I forget whether this is actually in the SQL standard or not, but the idea here is that I can do a read and specify what kind of lock I want it to take. So say I'm gonna read a tuple and then immediately update it in my transaction — and again, there's a round trip: I do a select, go back to the application, then do an update. Rather than getting a shared lock and later trying to upgrade to an exclusive lock — where, between the time I do the read and the time I try to do the update, someone else could potentially acquire a lock in between — I can put an exclusive lock on it when I do the read, so that when I go do the update, I already have the lock that I want, right? And I avoid issues. So you just add this FOR UPDATE thing at the end. There are a bunch of other tricks you can do with FOR UPDATE — like telling it to skip tuples you can't acquire the locks for — there's a bunch of things I can show next class. This is super common in applications, and I highly recommend you write code that uses it. All right, so two-phase locking: you can do this on every single database. It's gonna allow us to interleave transactions without knowing their queries ahead of time. We showed how to do deadlock detection and deadlock prevention, and we'll go deeper into lock hierarchies next class, okay? All right guys, see ya.
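To recap the SELECT ... FOR UPDATE motivation above, here's a deterministic sketch (pure Python standing in for the database — no real locking, the interleaving is written out by hand): without the hint, both transactions read before either writes back, and one update is lost; with the exclusive lock taken at read time, T2's read is forced to happen after T1's write.

```python
# Sketch of the lost-update problem that SELECT ... FOR UPDATE avoids.
# Two transactions each read a balance and write back balance + 10.

def interleaved_reads(balance: int) -> int:
    """Plain SELECT: both transactions read before either writes."""
    r1 = balance         # T1 reads (shared lock, then released)
    r2 = balance         # T2 reads before T1 writes back
    balance = r1 + 10    # T1 writes
    balance = r2 + 10    # T2 writes, clobbering T1's update
    return balance       # only one +10 survives

def select_for_update(balance: int) -> int:
    """FOR UPDATE: T1 holds the exclusive lock from its read through
    its write, so T2's read cannot happen in between."""
    r1 = balance         # T1 reads, already holding the exclusive lock
    balance = r1 + 10    # T1 writes and commits
    r2 = balance         # only now can T2 read
    balance = r2 + 10    # T2 writes
    return balance       # both updates applied
```

Starting from 100, the unlocked interleaving ends at 110 (one update lost), while the FOR UPDATE version ends at the correct 120.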