Thank you, DJ. That's great. So did you have any takers for your tiki bar thing and the other performances you had? Okay, okay, that's great. All right, so we are going to get started. I need to clean up a couple of things from last lecture. Remember this slide where we were looking at interleaving when thinking about schedules? We had this schedule, and it is actually a correct schedule, even though there is interleaving: T1 makes its change to a, then T2 changes a by giving 6% interest, T1 then comes back and makes its change to b, putting the hundred dollars from a into b, so the sum of a and b is consistent. So now when T2 goes and adds 6% interest to b, this is correct. Even though there was interleaving, that interleaving was safe, and you ended up with one of the two legitimate outcomes for this pair of transactions, because you had the right final value in the database. Okay, so even though at first this interleaving seems like it shouldn't work out — if this write of b had happened before this read of b, then we would be off — does that make sense? The final value will still be the same, because this is the same as having run T1 followed by T2, and that's the whole point: this schedule is okay. All right, so the main point is: some interleaving is okay, some interleaving is not, and the whole purpose of what we talk about today is to determine — is there a mechanism to make everything safe? Okay. The next slide was something that is not safe. This is essentially: T1 starts to make its change to a, then T2 makes its changes to both a and b, and then T1 makes the remaining change it needed to make in its transaction, to b. So a bad value of a was seen by T2, and the final result for the values of a and b doesn't correspond to either of the serial schedules.
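To see concretely why that first interleaving is safe, here is a small worked sketch. The starting balances are my own assumption for illustration; the $100 transfer and 6% interest come from the example above.

```python
# T1 transfers $100 from a to b; T2 adds 6% interest to both accounts.
# Starting balances of 1000 each are assumed for illustration.

def serial_t1_then_t2(a, b):
    a -= 100           # T1: debit a
    b += 100           # T1: credit b
    a *= 1.06          # T2: interest on a
    b *= 1.06          # T2: interest on b
    return a, b

def interleaved(a, b):
    a -= 100           # T1: debit a
    a *= 1.06          # T2: interest on a -- interleaved between T1's two steps
    b += 100           # T1: credit b
    b *= 1.06          # T2: interest on b
    return a, b

# The interleaved schedule produces exactly the serial T1-then-T2 state,
# so this particular interleaving is safe.
assert serial_t1_then_t2(1000, 1000) == interleaved(1000, 1000)
```

The reason it works out is that every individual conflict (on a, and on b) happens in the same T1-before-T2 order, so the schedule is equivalent to the serial schedule T1 followed by T2.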
Yep. We talked about that last class, but let me go back to it, because it's important. The whole definition of a serial schedule is that if I've got these two transactions T1 and T2, I can have a schedule in which T1 runs followed by T2 and get a certain result, or vice versa. So here is the diagram, which we will come back to in a different form in about 40 minutes. Let me just go to that slide with the serial schedule — skip, skip, skip — yeah, that's it. Okay, so this is the notion of correctness. The question is: with T1 and T2 there are two possible states the database could end up in, and what we are saying is that both of them are correct, because we are operating under the serializable assumption — these schedules are conflict serializable. If we said strictly serializable — that there's only one order, T1 followed by T2, because T1 started first — that would be a more restrictive form of concurrency control, and we wouldn't allow as much parallelism with transactions overlapping each other. So we are doing something that seems funny: we are saying it's okay if T1 came first but the database state looks like T2 ran all of it followed by T1 — that order is okay too. We are accepting that little discrepancy because we want to allow more parallelization. If we didn't want to, we could come up with a new concurrency control protocol — a simple one, not a good idea, but one that matches the intuitive notion: if T1 arrives before T2, T1 locks the entire database, and then T2 runs. That would be a brute-force mechanism for getting the isolation property; everything would seem logical, but you wouldn't allow enough parallelism in the system. Okay, and as I mentioned last time, Spanner has a notion for when this feels odd. Right, because we are saying a
schedule in which T2 finishes and changes the state of the database, and then T1 runs, is a fine final state for the database — and that seems odd, because T1 arrives first in all the examples we have. So Spanner has this additional notion of strict serializability, which says: if transactions arrive in a certain order, we will actually honor that order while still allowing parallelism. The protocol is really complicated, and I flashed that slide at the end to say: if you're interested, go take a look at that paper; there's all kinds of interesting stuff. CockroachDB is Spanner-like but doesn't require the atomic clocks that Spanner requires — Google actually keeps the clocks across its data centers synced, and that sync lets them figure out how long to hold back a transaction so they can tell which transactions started before it in real time, so that commits respect real-time order. Okay, it's complicated — take the advanced database class for that. But the confusion you're having, the questions you're asking — doesn't it feel weird? It does feel weird, because we are operating under this more relaxed notion of conflict serializability. We are saying we'll allow these schedules because the performance we gain is high enough, and the change in semantics is reasonable enough, that we think applications are going to be fine with it. Yeah — so the question is about the universe of solutions. If there's a universe of allowed schedules, the possible final states will mirror it, because conflict serializability gives a certain set of potential final states for the database: with two transactions, either of the two serial orders; with three, the permutations of the three. So it also embodies the possible final states of the database under concurrent actions — they mirror each other, if I understand your question correctly. It's about what you can say about the properties of the final values in the database, and they are exactly the definition of this.
They're one and the same. Yep. All right, other questions on this before we move on to the topic for today? We'll come back to this diagram in a different form. So let's close this out and bring up the next deck. All right, so today we'll talk about two-phase locking, and the part we will worry about is: how do we make all of these mechanisms actually happen? We talked about dependency graphs last time. But before we jump into today's topic — there was a request to push back the date for project three, because we were a few days late getting it to you, so we moved it back by seven days. Hopefully you don't need all seven days, but if you do, you have them. As I mentioned in the last class, there are lots of moving parts in the project, so don't wait until two days before the deadline to start — it will seem harder than it is if you wait that long. Okay, knock out the simple things first. Homework four got pushed out earlier today, and the due date is as posted. Okay, so let's jump into the topic for today. We will concern ourselves with the conflict serializable class of schedules that we want to allow. Remember, we also talked about view serializability, where, if I knew a little bit about the application semantics, I could allow a few more schedules; that tends to be very hard to enforce in practice. No one does it, but the textbooks cover it so that you know what it is — it's good to understand. We'll focus on conflict serializability and how to make it work. So last class we said: if you start to draw the dependency graph and you see a cycle, we are in trouble. Right, but we were drawing the graph after the schedule was done — so isn't it too late? How do you make sure these cycles never form, a.k.a. give us the notion of correctness we are looking for? When is it safe to interleave actions? That's what we'll talk about, and there are multiple ways to do it.
We are going to talk about locks as the mechanism for that today. If you remember, last class we said there are two broad classes of algorithms: one is pessimistic and the other is optimistic. Locks are pessimistic: they try to stop a cycle from forming very early on. When we talk about optimistic concurrency control protocols in the next class, we'll see they do the opposite — they say, let everything go, and at the end I will check whether something was unsafe; I know how to back out of the unsafe parts. Okay, and locking says: if two things are trying to conflict with each other, they will probably do bad things, so let me stop them at first touch. So that's what we will talk about today: lock-based, pessimistic concurrency control mechanisms. What does that mean? You've played around with latches, which you used in your very first C++ assignment. At that time you were protecting a data structure in memory: you put in the latch mechanism so that if two threads tried to make changes that would have left the data structure in a corrupt state, that couldn't happen. Locks do the same thing, but for database objects. The way you would do things with locks: imagine I have a schedule reading and writing objects. Here you see two transactions, T1 and T2, and what we'll do is put locks before we read or write an object — we will acquire the lock before we make that read or write operation. Okay, and locks will come in different forms.
We'll get to that in a second. For now, just assume before we read a we acquire the lock, and as you see, transaction T1, a bit further down, after it is done with all the changes it needed to make, unlocks — very much like what we've been doing with latches on the data structures you were trying to protect. Similarly, T2 will follow that same protocol. So effectively there's this lock manager, which maintains a data structure and gets requests saying: here's the object I want to lock — and, as you'll see in a little bit, also the mode in which I want to lock it, exclusive or shared — and here's who I am, I'm transaction T, please lock this object for me. We won't get into the details of what that lock manager looks like, but it's often organized as a hash table on the object ID being locked, and it keeps track of all the lock holders, plus requesters waiting to become lock holders, which are put on a pending queue. That's all I'll say about the lock data structure — how to maintain it and do it efficiently is a super interesting topic, and if you are interested in those details we can talk offline. Okay, so it's a structure that keeps track of the locks, and at some point — for example, here — that second lock request is to the same object a, and a is already held by transaction T1. So the lock manager will say: whoops, you made that request,
I can't grant you that request. So transaction T2 gets blocked until that later point in time when the unlock request from T1 comes in, the lock opens up, and T2 gets its request granted. Same thing as you have with latches — locks are doing this for database objects. Okay. That said, for now you can just assume these objects a, b, and c are records; towards the end of the class we'll talk about locking at different granularities and so on. The transactions proceed, and the lock manager does its thing. All right, so today we'll talk about these different lock types; we'll talk about a protocol called two-phase locking; then we'll talk about some of the bad things that can happen when you do locking, called deadlocks, and how we get around them; and then, time permitting, we'll get to hierarchical locking, where we lock at different levels in the tree — the entire database, a table, a page, a record, or even a column in a record. Before we go on: if you've taken an operating systems class, you'll probably find the terminology here confusing. When database folks talk about locks, they're talking about things that protect database objects — objects that sit on disk and have a lifespan beyond their time in memory. We call the things that protect in-memory structures latches. Okay, this terminology is super important, so let's go through the differences. For the longest time, the database folks could never talk to the operating systems folks — you'd get into a conversation in the '80s and '90s and say "I'm locking stuff," and for them locking was the same as latching, so the terminology was just way off. Even if you become an operating systems person, try to use the better terms, because they're very different. So: locks, in our database context —
they try to separate the user transactions, T1 and T2 — to keep concurrent transactions from stepping on each other's toes. Latches try to keep the threads executing on a common in-memory data structure from stepping on each other's toes. Similar ideas, but at very different levels. Locks protect the database contents; latches protect some in-memory data structure. The duration of a lock is the entire transaction, whereas the duration of a latch is just the critical section that the programmer writes. Now you're starting to see some of the differences that emerge as we go further down the list. The modes: locks often have dozens of different modes — we'll talk about that briefly today. Latches are typically simple reads and writes, though latches have also started to become more sophisticated in many programming environments, with modes beyond plain reads and writes; we'll talk about that in the context of locks towards the end of this lecture. Locks end up needing some mechanism for when the locks you've acquired get you into trouble and no progress can be made — that's called deadlock in the locking scenario, and we'll talk about mechanisms to deal with it. With latches, the way you avoid the situation where one thread has grabbed a resource, another thread has grabbed a different resource, and they each ask for the opposite resource and get stuck, is to say: I, as a programmer, will write the code so that it never happens.
So it's the programmer's responsibility. Database folks are nicer: we actually put mechanisms into the system so that if deadlock happens, we can catch it and come out of it. Typically, when you're using latches in some C++ code, you will say: I'll always follow a sequence — action A, then B, then C — so that you can never end up in the deadlock situation. But it's the programmer's responsibility to enforce that. Okay, locks have these built-in deadlock mechanisms that let you get out of that trouble if you get into it, and the mechanisms for deadlocks, as we'll see, are waits-for graphs and timeouts; with latches, programmers have to handle it themselves. The state for locks — who holds what, what's being requested by whom — is kept in the lock manager. For latches, when you're writing your C++ code, it's some protected data structure: you might have a variable you're using to keep track of that latch object. So, latches are in memory, they protect threads from getting into trouble, and a lot of programmer discipline is needed to stay safe. Locks are for things that live on disk, so they get held for a longer amount of time — if you remember the chart we had for how much time it takes to reach an object on disk versus in memory, there were many orders of magnitude between them. So locks are held for longer amounts of time, and they have a lot more mechanism associated with them, like deadlock detection. Question? Very good question: what's the difference between the lock manager and a latch-protected data structure? The lock manager I mentioned, this hash table, will itself be written and protected by a latch. So the lock manager is protected by a latch — which is all consistent: the lock manager sits in memory, it's an in-memory data structure, and multiple transactions are trying to access it.
So you'll protect the lock manager with a latch, which is the in-memory mechanism — they're compatible. These are not incompatible concepts; in a database you'll actually see latches inside lock managers. The lock manager is a great example, because it is itself latched — and you probably don't want to put a single latch on the entire lock table; you'll put latches on chunks of it so that one request doesn't block everyone, because otherwise the lock manager would become the bottleneck. They are protecting very different things: the latch is protecting the in-memory lock table itself, while the lock is protecting record a and record b — data that lives in the buffer pool, may get evicted back to disk, and yet the lock is still held until the transaction is done. Okay, and there's a related question: when do these locks get acquired? A latch you acquire explicitly — you, as the programmer writing the code, say acquire this latch in this mode. Locks, as you'll see, get acquired implicitly by the database system, and only in very rare cases explicitly, by a programmer specifying in SQL: grab this lock. Okay, and the simplest way to understand it, if you're locking at the page level: imagine the call to the buffer pool you wrote took one more parameter saying which transaction you are — and, by the way, whether you want a read lock or an exclusive lock — in which case the buffer manager's call for the page can itself make the request to the lock manager on behalf of the transaction's work. That's at the page level; at the record level it gets more complicated. But that's how a database is written: at the appropriate points where data access happens, the locking request is built in, so you as a database programmer don't have to put lock requests all over your code. The right abstraction takes care of making that request. Okay, and the page-level stuff is the easiest to
understand, because you've written a buffer pool, so you can see how that would work. Okay, record-level locking would go a little bit lower, at the access path for the record, when you get down to the file. And as you're writing all these insert calls and other things in this project, you can start to think: if I had a transaction management system — which is, by the way, coming as the next project — where would I put it? It would be at one of these levels. Okay, other questions? All right, so we've already started to allude to the fact that locks come in at least two flavors — there are many more. A shared lock says: I'm reading a; if someone else wants to read a, I don't care. So a shared lock is compatible with another shared lock, and this is the compatibility matrix. As we'll see, there are sometimes a lot more lock types besides shared and exclusive, so this compatibility matrix can get bigger. All this says is: a shared lock is compatible with a shared lock; if someone has an exclusive lock on an object, you can't get another exclusive lock on that object — and you can't get a shared lock on it either. An exclusive lock basically blocks everyone else while they wait for it. Yep — we'll come to that, just hold on to it. It sounds bizarre that there are other kinds of locks, but all database systems have at least a dozen different lock types, to allow even more parallelism than what we will allow here. So first we'll just work with these simple locks and make everything safe, and then we'll very briefly talk about the multiple lock types — there's a vast amount of material there, so again, this is a plug for the advanced database class, where we can go into it. It's all to allow more parallelism. So let's work with just these simple locks and try to make things safe first. Okay — that's a great question.
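Before we look at richer mode sets, the two-mode world just described can be sketched end to end: a toy lock manager organized as a hash table on the object ID, using the S/X compatibility matrix, with a pending queue for blocked requests. All names here are mine, and a real lock manager is far more elaborate (and internally latched); this is only a sketch of the idea.

```python
from collections import defaultdict, deque

# Two-mode compatibility matrix: S is compatible only with S.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockManager:
    """Toy lock manager: hash table object_id -> granted holders + pending queue."""
    def __init__(self):
        self.granted = defaultdict(dict)    # obj -> {txn: mode}
        self.pending = defaultdict(deque)   # obj -> queue of (txn, mode)

    def lock(self, txn, obj, mode):
        holders = self.granted[obj]
        if all(COMPATIBLE[(held, mode)] for t, held in holders.items() if t != txn):
            holders[txn] = mode
            return True                     # granted
        self.pending[obj].append((txn, mode))
        return False                        # caller blocks

    def unlock(self, txn, obj):
        self.granted[obj].pop(txn, None)
        q = self.pending[obj]               # wake now-compatible requests, FIFO
        while q:
            t, m = q[0]
            if all(COMPATIBLE[(held, m)] for held in self.granted[obj].values()):
                q.popleft()
                self.granted[obj][t] = m
            else:
                break

lm = LockManager()
assert lm.lock("T1", "a", "X")              # granted
blocked = not lm.lock("T2", "a", "X")       # X is incompatible with X: T2 blocks
assert blocked
assert lm.lock("T2", "b", "S") and lm.lock("T1", "b", "S")  # S/S: both granted
lm.unlock("T1", "a")                        # T2's pending X on a is now granted
assert lm.granted["a"] == {"T2": "X"}
```

This mirrors the scenario from the slides: T2's exclusive request on a is queued until T1's unlock arrives, while the two shared requests on b coexist happily.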
Why so many locks? There are systems that have over 50 different lock types, with correspondingly large compatibility matrices — and the point of all of them is to allow more parallelism. I'll get to the intuition for that, starting with exclusive locks, in a little bit. It may seem like there shouldn't be a need for additional lock types, but here's some evidence: the different systems shown here all publish some information about the lock types they have, and what you saw flash up on screen — maybe I'll just go back to it and let it come up again — is the compatibility matrix of what has been disclosed. As you can see, there are tons and tons of different lock types; it's not a two-by-two matrix, it's a much, much bigger matrix. Okay, and you can actually find this information for Postgres pretty readily — I have another screenshot coming in the last slide, but the manual is there and you can see some of those lock modes in it. All of that is to allow more parallelism while keeping everything we talk about today safe, right? But the first question is: how do we work with locks correctly? So let's start on that topic first, staying with just two lock modes. All right, so we will have transactions request locks, and there is also this notion of lock upgrades, which we are basically going to ignore in this class — again, the advanced database class covers it. Sometimes — take this buffer pool, if you're locking at the page level — you request a page for a SQL query, and you know, from the application, that the query just scans all the records. Then maybe the next query in that same transaction says: I'm going to update some of the records I saw. Right? So: grab all the student records — oh, and for the students that have a B, make it a B+, right?
So it is only going to update some of the records that it scanned in the first query. The first query — it's all part of the same transaction — may just issue requests to fetch pages into the buffer pool and, if you're doing page-level locking, just grab read locks on them. But then, for some of those pages — the ones where the records of interest are present — you want to turn the lock into a write lock, an exclusive lock. That's called a lock upgrade. When the query is running, you don't know what's going to happen next; when you are executing the operation that reads the page, you know what you want — I want to read it, because I'm a SELECT query. The next one is an UPDATE query: now it needs to update stuff, right? So there's this notion of lock upgrades: I, as a transaction, may have asked for a read lock — and by the way, I'm not asking for it directly; someone is asking on my behalf, I'm just disclosing what my operation is. I'm reading a file, and the buffer pool, in this case, could be asking for read locks at the page level. And then the update part of the transaction says: I'm getting this page from the buffer pool, but, by the way, I'm going to change stuff on it — so grab an exclusive lock at the page level.
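That read-then-upgrade sequence can be sketched as follows. This is a hypothetical toy tracking only one transaction's page locks; a real upgrade also has to check what other transactions hold on the page, and can block or deadlock doing so.

```python
# Toy record of a single transaction's page locks, illustrating an S -> X upgrade.
page_locks = {}                      # page_id -> mode held by this transaction

def acquire(page_id, mode):
    """Grant the requested mode, upgrading S to X; never downgrade X to S."""
    held = page_locks.get(page_id)
    if held == "X" or held == mode:
        return                       # already holding something strong enough
    page_locks[page_id] = mode       # fresh grant, or upgrade S -> X

acquire(7, "S")                      # SELECT scans the page: shared lock
acquire(7, "X")                      # UPDATE touches a record on it: upgrade
acquire(7, "S")                      # later read: X already covers it
assert page_locks[7] == "X"
```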
So within the same transaction, a call might first come for that same object a with a read lock request, and later it may get upgraded to an exclusive lock. Those upgrades happen, and everything we talk about works with that scheme. There are a few more complications there that we will largely ignore; I just wanted you to know that upgrades happen, and that the lock requests are made on behalf of the queries, at some appropriate abstraction layer in the database engine. Okay, the database programmer is not writing explicit lock calls everywhere. I'm lying a little: once you get into these dozen different lock modes — in the B+ tree code, for example — you start to see the B+ tree programmer put in a bunch of explicit lock calls. So set that aside for now; when you get into multiple lock modes, lock calls start to show up in specific places, where someone says: an exclusive lock is too strict here, put in this weaker form of exclusive lock, because I know exactly what I'm doing. And you already have some intuition for that type of thing: when you did latch coupling in the B+ tree — it was latches at that time — you could see how you could let go of something at some point and be okay with being, quote-unquote, unsafe, because semantically all you care about is the structure of the tree. So you didn't need to hold on to things for
the whole time. It's tricks like that that get played, and that's how you get this explosion of lock modes. Okay, but again, for now we'll just talk about locks in the regular sense: lock upgrades happen; the lock manager will either grant a request or not; we'll play around with just these shared and exclusive locks. When a request comes in, the lock manager updates its hash table structure — which is protected by a latch — to keep track of all the requests that have been granted and all the requests that are pending. All right, so now let's go back to our example. We now have two different types of locks. Transaction T1 is not making just a lock request: it says I have an X-lock request, because I'm making changes to the object — I'm an update query — and then it unlocks. Similarly, transaction T2 makes an X-lock request. The second request from transaction T1 is a shared lock request for a, because all it is doing is reading it. So now the protocol is slightly different: we don't just say lock and unlock, we also say which type, and then read a — shared locks are compatible with shared locks — and the lock manager makes all the grant and block decisions. Okay, so pretty straightforward, no magic so far. All right — yep, question? We will do that in a second, we will. So, right now this is not correct. If what you're thinking is "did locks alone make it work?" — that's exactly the point: they don't. If you just follow what I told you — which is not yet a protocol, just an idea: lock stuff — it is not correct, and we will make it correct in a second. So here is exactly the scenario: we have these lock requests, but see, this unlock for object a happened in T1, then a got written by T2, and then T1 reads back what T2 wrote. So obviously, if you draw the dependency graph, we have formed a cycle, right?
There's a write-write conflict and then a write-read conflict, and we have a schedule that is not conflict serializable. So this is a bad schedule; we don't want to allow it — it leaves the database in an incorrect state. Okay, so we want to stop that. So locks alone are not enough, but they are necessary. We need a little bit more, and that is this thing called two-phase locking. Okay, and this was a breakthrough when the whole theory of two-phase locking came about — Jim Gray and a bunch of IBM folks invented it. Jim Gray got a Turing Award, and a large part of that was based on this one paper in which he lays out the theory of two-phase locking, strict two-phase locking — which we'll talk about in a little bit — lock hierarchies, and all of that fun stuff. So two-phase locking is a protocol we need to start following to make locks actually work — to give us the serializable schedule semantics that we want. Okay. So we'll do the following: we will acquire locks.
There was no problem with that so far — we must acquire locks before we make changes to objects. The problem was that when we acquired the locks, we let go of them too soon, and others could start seeing incorrect data. The problem was not that we had the locks in the wrong place; the problem is that this unlock of a happened too soon, and transaction T2 could start to pick up changes from a transaction that had not yet committed. That's what two-phase locking is going to solve. It says: we'll break up all requests from a transaction into two phases. One is a growing phase, in which it can keep making requests to the lock manager — I, as the lock manager, keep track: transaction T1 has made a request, is making more requests, and so on. The minute the transaction makes its first unlock request, I throw a flag in my lock manager and say: T1 has left the growing phase and is now in the shrinking phase; it's starting to release locks. And the two-phase protocol says: once you go into the shrinking phase, you are not allowed to ask for any more locks. If you do ask for more locks, the lock manager will reject the request; you have to abort the transaction and start all over. Okay. So why does it work? First, let's see what this means if I
look at the life of a single transaction, which is what is shown here: a single transaction, with time on the x-axis and the number of locks held on the y-axis. As you can see, there's the first phase where it is growing: it's acquiring locks, acquiring locks — maybe the read requests have come in from the select query — and it reaches a certain point, then it stays at that set of locks for a while, and then starts dropping them: the shrinking phase, over here. And once it shrinks, it cannot go back up and start acquiring new locks. Okay, so in this picture, the point at which it has grabbed all the locks it needs is called the lock point, and it's this magical point — I'll come back to what it means in terms of the dependency graph that is getting induced behind the scenes. Is everyone with me so far on what the two-phase locking protocol is? Right — we can go up and then start coming down, so the curve has to look like a mountain with no valleys. It cannot have a valley: it cannot be that I reach the lock point, start dropping locks, and then start going up again. That would cause trouble — that is a 2PL violation, a two-phase locking violation. And the intuition, which I hope you can work out yourself, for why two-phase locking works is the following: if I've got two transactions — I'll go back to this slide — if I've got two transactions, this is one transaction, and the other transaction will have its own curve, right?
The transaction that reaches its lock point first is the one that will come first in the final serial schedule. Whoever gets all the locks they need for the objects they conflict on — it's like, whoever reached the top of the mountain first is before the other — that determines the database's final state in terms of the equivalent serializable schedule. Okay, so that's the intuition, and that's why it works. Okay — question? Yeah, that's a great question. The question is: how are we going to ensure T1 reaches the lock point first? We are not going to. Here's what happens: T1 and T2 start doing their work, and, as we'll see when we look at the protocol in a little bit, whichever of T1 or T2 reaches its lock point first, that will be the order. Imagine T1 and T2 are running in two threads and keep context-switching or something like that; if T2 reaches its lock point first, the final state of the database will look like T2 followed by T1. So we can allow any arbitrary interleaving, and as long as we follow the two-phase protocol, it doesn't matter — whoever reaches the lock point first is ahead in that serial schedule. Okay, and that's the beauty of it: we don't have to do anything else with the timing. Let the threads all run with each other.
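The two-phase rule itself is easy to state in code: once a transaction has released any lock, every later lock request must be refused and the transaction aborted. Here is a minimal sketch — the class and its names are mine, and a real system would integrate this with the lock manager rather than track it per transaction object:

```python
class TwoPLTransaction:
    """Enforces the two-phase rule: no lock acquisition after the first unlock."""
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False        # flips at the first unlock

    def lock(self, obj):
        if self.shrinking:
            # 2PL violation: the "valley" case -- must abort and restart
            raise RuntimeError(f"{self.name}: lock request after shrinking began")
        self.locks.add(obj)           # growing phase

    def unlock(self, obj):
        self.shrinking = True         # we have passed the lock point
        self.locks.discard(obj)

t1 = TwoPLTransaction("T1")
t1.lock("a")
t1.lock("b")                          # growing: the curve goes up
t1.unlock("a")                        # shrinking begins
try:
    t1.lock("c")                      # would create a valley -- rejected
except RuntimeError as e:
    print(e)
```

The `shrinking` flag is exactly the flag the lecture describes the lock manager throwing when it sees a transaction's first unlock.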
Let there be hundreds of threads; they can all compete for the same objects, they can all go after the same database, and as long as they all follow the lock protocol and two-phase locking, life is going to be good. Okay, question? Yeah — we'll get to that: that induces a deadlock, and we'll talk about it. Two-phase locking won't prevent deadlocks, and we have mechanisms to deal with them, in a second. So hold that question: if two transactions acquire in opposite orders, there's a case where two-phase locking can still end up in a situation where no progress is made. That's called a deadlock, and we'll break it. Okay — good thinking, good question. Yeah — so the question is, couldn't you acquire the objects in a preset order? You could, but it may get difficult. Imagine I've got a table R, and one transaction is doing a file scan on it while the other is accessing it through an index. I don't know what order the accesses will come in from the index — it could be unclustered — so I can't always impose an order; because we want all these access paths for efficiency, we can't induce the order in a strict way. Yeah, and that's where, with latches, we said the programmer has to do it — but if you ever write a complex C++ application with latches, you'll find it's very hard to get all parts of the code to follow the same order through the data structures, and all kinds of crazy bugs pop up. Okay, great question — I like how you're thinking. Yep — that is correct. The question was: can we say that as long as everyone follows two-phase locking, the final schedule is conflict serializable? Yes. And it's also true that we are not guaranteeing anything about deadlocks — deadlocks can still happen, and we'll find a way to break them. Very good, exactly right. Yep. Sorry — which part is not shrinking? This? That's just a flat line: all the locks are being held for a little while. Is that what you mean? Yeah. So: I scanned my file, I acquired all my locks, and now
— I'm just adding everything up because I'm computing an aggregate, so the curve is flat; then I start to drop the locks. The point is that the lock point is when you reach the peak. After that you may plateau, because you're holding the locks while doing work, but the lock point is the moment you reach the peak — it's a very precise definition of when that event happened. Okay. Yes, absolutely, that will still be fine: as long as I climb to a peak and then only drop, it's fine. And no, it's not bad in some way — in fact, we'll see that we'll need a version of exactly that to make a certain problem here go away. There's still one tiny problem in two-phase locking, so wait for two slides. Other questions? Because if you get this intuition, you have the foundation to understand fifty different lock modes and all kinds of other crazy stuff; but if you don't get this intuition, there's no hope of getting to something more sophisticated. All right, so a two-phase violation is a valley — peaks are fine, no valleys. So now let's go back. This is going to be trivial — it looks like all of you got it. You acquire the lock and it gets granted; now when the second request for the X lock comes in, it gets queued, so it won't be granted, and you start to see the issue. What's happening in the dependency graph? That was the question that was asked, right? The first X lock belongs to transaction T1; on the second X lock the lock manager says: I can't let you go any further. It's basically saying: if I let you go forward, an arc would form from T1 to T2, and I'm not even going to let that happen. If I disallow arcs from forming, there's no cycle to be formed. And so it's a pessimistic approach, right?
It's pessimistic because maybe just that one arc would have been okay — the cycle hasn't been completed — and we'll talk about the optimistic stuff later. But that's the intuition, connecting back to the dependency graph: the reason locking works is that it's breaking these arcs as they're getting formed. Question about the example on the right: say, right after X-lock A, there's an X-lock B. So T2 holds the X lock on B, but its request for the X lock on A won't be granted, so it's going to be waiting. That's right. Yes, T1 isn't using B, so theoretically, if the B step is independent, T2 could be using it — yeah, hold on for the deadlock stuff. But if you did that and T1 wanted to use B later, that could be a problem. So your question is: if T2 wanted A and B, could it say, I tried to get A but it's locked — can I keep making progress with B? It can, and there are sophisticated protocols that do that. Many of these different locking modes will ask: do you really want an X lock right away, or do you just want to check that you can get it, and so on. We'll get to more sophisticated protocols — most of them in the advanced class, but we'll allude to some of those techniques toward the end of this lecture. Yeah, absolutely, you can do all kinds of that, and we'll talk about the lattice of lock modes, where you might say: if I'm scanning a file with a billion rows, acquiring a billion locks is very expensive — more expensive than reading the records. Can I do something better? So hold on to that. Okay, great. All right, so basically all this is saying is why locking works: we are breaking these arcs from the dependency graph that we talked about in the last class.
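To connect this back to the dependency graph concretely, here is a small Python sketch (the helper names are made up) that builds the precedence graph from a schedule of (transaction, operation, object) steps and checks it for a cycle; the bad interleaving from the earlier slide comes out non-serializable.

```python
# Hypothetical helper names; a schedule is a list of
# (transaction, operation, object) steps, operations "R" or "W".
def precedence_graph(schedule):
    """Add an arc Ti -> Tj whenever a step of Ti conflicts with a later
    step of Tj: same object, different transactions, at least one write."""
    arcs = set()
    for i, (t1, op1, obj1) in enumerate(schedule):
        for t2, op2, obj2 in schedule[i + 1:]:
            if t1 != t2 and obj1 == obj2 and "W" in (op1, op2):
                arcs.add((t1, t2))
    return arcs

def has_cycle(arcs):
    """Depth-first search for a cycle in the arc set."""
    graph = {}
    for a, b in arcs:
        graph.setdefault(a, set()).add(b)
    def visit(node, path):
        if node in path:
            return True
        return any(visit(n, path | {node}) for n in graph.get(node, ()))
    return any(visit(n, set()) for n in graph)

# The bad interleaving: T2 reads A between T1's two writes.
bad = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "B"), ("T1", "W", "B")]
print(has_cycle(precedence_graph(bad)))    # True: not conflict serializable
```

A serial schedule run through the same functions produces arcs in only one direction, so no cycle — which is exactly what the lock manager enforces by refusing to let the second arc form.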
Okay, so two-phase locking works, but it has one problem, and that problem is called cascading aborts. Remember, databases start a transaction with a BEGIN statement, and the transaction can end with a COMMIT, which makes all its changes permanent, or with an ABORT, which says: whoops, undo everything. That's the all-or-nothing component of transactions. The point is that aborts can happen — a transaction can get aborted — and in the presence of aborts, you can get into a situation where, even though I'm following the two-phase locking protocol, I've dug myself into a hole I can't get out of. What's that hole? Let's illustrate it with an example. Here's a schedule in which transaction T1 starts, following the two-phase protocol; T2 is doing the same. T1 acquires its locks, then unlocks A because it doesn't need A anymore. It has reached its lock point, and after that unlock two-phase locking says it can't acquire any more locks — but it can still continue to do work on B, because it hasn't unlocked B. T2 gets the lock for A, because it was released, and starts to do its work. But later on T1 says: oh no, something's wrong, I need to abort. They're all following two-phase locking, but T2 has read a value of A that now has to be undone — it read a dirty value. That's a dirty read. Now we can't unwind ourselves from this, even though we were following two-phase locking. When the abort call comes to the database, what are you going to do? You now have to undo T2 as well. And what if T2 had already committed and already paid out that $25? You can't undo that. So we can't let T2 commit if it has a dependency on T1 and T1 hasn't committed. So how can we make two-phase locking safe against this type of problem?
It's already safe for serializability purposes, but it still requires us to deal with this cascading abort problem, because transactions can abort. Does that make sense to everyone? Okay. So the way to make it safe is to do something a little bit more with two-phase locking. This schedule, which plain 2PL permits, is one we want to make inadmissible in a new protocol, and that protocol is called strong strict two-phase locking — the textbook also calls it rigorous two-phase locking. If you understand this lock-point business we talked about, go back to the graph: can you think about what you would do to its shape to make this problem go away? This was the correct protocol we've been following so far, but cascading aborts happen. What could you do to the shape of this curve to stop cascading aborts? What was the root cause of the cascading? Yep, correct: release all the locks at once. And when do you release them? Exactly. So what if we said: you climb the mountain, there's a plateau, and then it's a cliff. That means you never unlock early, so no one can ever see an object a transaction has grabbed before its outcome is decided. Now, if you had come up with that in the 1970s, you would have won an award. It was very hard at that time to get your head around all of this. It becomes easy because I've shown you the graph — that's always the case: when someone explains something to you simply, you say, oh, it's trivial, but without that framing it took a long time to conceptualize. Finding the right abstraction is the art of coming up with breakthroughs.
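The climb–plateau–cliff rule can be sketched in a few lines of Python (a toy with a single exclusive lock per object; the names `StrictTransaction` and `finish` are invented for illustration): nothing is released until commit or abort, when everything drops at once, so no other transaction can ever see uncommitted state.

```python
# A toy strict 2PL sketch with one exclusive lock per object;
# the class and method names are invented for illustration.
class StrictTransaction:
    """Locks are released only all together at commit or abort (the
    cliff), so no other transaction ever sees uncommitted state."""
    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table   # shared dict: object -> holder
        self.held = set()

    def lock(self, obj):
        holder = self.lock_table.get(obj)
        if holder is not None and holder is not self:
            raise RuntimeError(
                f"{self.name} must wait: {obj} held by {holder.name}")
        self.lock_table[obj] = self
        self.held.add(obj)

    def finish(self):                  # commit and abort both land here
        for obj in self.held:          # drop everything at the cliff
            del self.lock_table[obj]
        self.held.clear()

table = {}
t1 = StrictTransaction("T1", table)
t1.lock("A")                           # T1 holds A until it finishes
t2 = StrictTransaction("T2", table)
try:
    t2.lock("A")                       # blocked: T1 hasn't finished
except RuntimeError as e:
    print(e)
t1.finish()                            # cliff: all locks dropped at once
t2.lock("A")                           # now granted
```

Compare this with plain 2PL: the only change is that `unlock` on individual objects disappears, replaced by `finish`, and the dirty-read window closes with it.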
So this is what we do. The problem we ran into is that the unlock of A allowed the other transaction — which was itself following two-phase locking — to come in and start to see bad state. But now, with strong strict two-phase locking, you climb the mountain, it's a plateau all the way through, and at the end you drop all the locks simultaneously. And just a little tidbit: obviously you won't literally drop all the locks simultaneously — they get dropped one at a time, since multiple operations are happening on the hash table. So that's a bit of semantics. Even though the release takes a little time, you can commit the transaction at that cliff: as we'll discuss with durability, you write the commit log record and then you start to drop the locks. There are mechanisms of a different type to make that cliff behave like the sharp edge in the picture, but ignore that for now. Obviously if you grabbed a billion locks you can't make a billion operations to the hash table instantaneously; it will take some CPU time. But now, hopefully, this intuition makes sense: whoever reaches the lock point first is ahead in the dependency graph that you draw, and no one can see changes made by a transaction until it's committed or aborted. So no dirty reads get passed across, and a later abort doesn't force you to unwind a transaction whose fate might already have been determined. Questions? Yeah — if I understand your question correctly: the transaction that reaches the lock point first, can you make it commit right there? Why the plateau? The plateau is because I've gotten all my data locked — all the data I need — and now I'm processing it, computing an aggregate or a join or something like that. That time is still going to be taken for doing the work the query requires, so I can't just drop the locks right there.
I have to wait until all of that work is done. Does that make sense, or did I misunderstand your question? I thought your question was: why do we have a plateau — why can't we just drop everything at the lock point? Because the plateau is where the work on all the things we have locked is actually getting done. So could the plateau be zero in some cases? Yes: if, say, this were an update query and the last record it locked at the lock point was the last record it needed to update, then it updates it and drops everything — the plateau would be very small. Yeah, that makes sense. It's also possible that I grab a lock early and then work for a long time before I grab the rest of the locks: I read a table, do some work with it, then decide I want to do something else based on the contents of what I read. So it's not just peaks and plateaus — all we require is that the growing phase is monotonically non-decreasing, then a plateau, then the drop. In plain two-phase locking it's a monotonically non-decreasing curve followed by a monotonically non-increasing one; there can be plateaus on both sides. Yeah, but releasing early causes cascading aborts — that was the whole point. Yeah, go ahead. I see what you're saying — very good. You're saying: there wasn't really a problem here; what if you let this transaction keep going, but don't let it commit until the other one's outcome is known? That's exactly what some protocols do. Microsoft's Hekaton system — and I worked on that with a whole bunch of really smart people at Microsoft — will internally keep track of these dependencies and won't let transaction T2 commit, because it has taken a dependency on an uncommitted transaction. So even if you say commit here, it'll say: no, no, I have to hold you, because you've read something whose fate I need to know before I let you go. Absolutely.
So the more advanced protocols will do stuff like that. That's great — you're already designing the next generation of protocols, which is good. Yep. Other questions? Yeah, so the question is: the longer I hold the locks, am I not reducing parallelism? Absolutely. That's why there are more lock modes — you hold the locks, but weaker ones, so other transactions can still get in. So wait for that; we may not get to it today, but that's okay, we'll get to it. All right, let's keep moving. Hopefully this makes sense to everyone, and we now have a solution that works and gets us out of cascading aborts. So let's take a look at this example really quickly, just to make sure everything is solidified. Consider two transactions, T1 and T2 — he tells me he has a bookie, so he keeps transferring money from his account to his bookie's account, and then we need to go compute the total across both those accounts. Those are the two transactions. First, a non-2PL example — and this is again to make sure we're all on the same page: you acquire the S lock, and you end up in a situation with an incorrect value. By now I think everyone gets that that's the type of bad thing that can happen. Two-phase locking will use the locks and reorder things so that we end up in a correct state. It still doesn't avoid cascading aborts, which we just talked about, but you can now take this schedule, put it in a dependency graph yourself, and see that we broke the cycle — we didn't let the cycle form. Okay. And strong strict two-phase locking in this case would require that A and B both be held until the end, so you are effectively running T1 after T2 — there's no parallelism in this case. Okay, all right.
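The bookie example can be played out in a few lines of Python (the account names and amounts are made up): without locking, the summing transaction can slip in between the two halves of the transfer and observe a total that matches neither serial order.

```python
# Account names and amounts are made up; the point is only the shape of
# the interleaving: T1 transfers 25, T2 sums both accounts.
accounts = {"checking": 100, "bookie": 0}

def bad_interleaving():
    """T2's read slips between T1's two writes: the observed total
    matches neither T1-then-T2 nor T2-then-T1."""
    accounts.update(checking=100, bookie=0)
    accounts["checking"] -= 25                         # T1, first half
    total = accounts["checking"] + accounts["bookie"]  # T2 runs here: 75
    accounts["bookie"] += 25                           # T1, second half
    return total

def run_serial():
    """Either serial order preserves the invariant total."""
    accounts.update(checking=100, bookie=0)
    accounts["checking"] -= 25
    accounts["bookie"] += 25
    return accounts["checking"] + accounts["bookie"]

print(bad_interleaving())   # 75: money appears to have vanished
print(run_serial())         # 100
```

Under 2PL, T2's shared-lock requests on both accounts would force its sum either entirely before or entirely after T1's transfer, so only the 100 outcome is possible.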
So, going back to the picture of schedule classes: serial schedules, and a bigger superset around them. We accept the superset because we want to allow more concurrent use of the database and make it more efficient. View serializability is theoretical, but good to know what it does. The cascading-aborts picture cuts across this space: avoiding cascading aborts removes some complex serializable schedules from consideration, but makes things safer, so we've restricted that space a little more. Obviously a serial schedule is not going to have cascading aborts, since you know the outcome of one transaction before the next begins, so it's covered. When we do strict two-phase locking we are saying: we'll have a little less parallelism, but we're doing that because we want to avoid these cascading aborts. Okay, all right. Questions on this? All right. So now let's make sure we have everything together. We talked about strong strict two-phase locking. As many of you have already asked: what about situations where transactions T1 and T2 go after two objects A and B in a different order? That's exactly the example here. Transaction T1 locks A; transaction T2 locks B. T1 then asks for B, and the lock manager says: whoa, I need to put you on the pending queue — you need to wait. T2 later comes back and asks for A. T2 can't get A because T1 has it, and vice versa. So this is a deadlock: no progress is going to be made. We followed strict two-phase locking and can still get ourselves into this trouble. Okay, so we need a way out of this, and for that there are mechanisms to deal with deadlocks. The first one is called deadlock detection: periodically, you run some background thread to go look through the lock manager and ask, do I see a deadlock cycle?
We'll talk about that in a second — and if so, I'm going to kill one of the transactions and break the cycle. Okay, note this is a cycle for deadlocks, different from the cycle in the dependency graph. And then there's deadlock prevention, which in some sense you can think of as pessimistic: it's going to say, I'm going to stop you as early as possible. But again, this is for deadlocks, not for the dependency-graph stuff — similar concepts, but different applications, different problems they're trying to solve. So for detection we create these things called waits-for graphs. They look like dependency graphs — similar ideas — but they're for deadlocks. Okay. Waits-for graphs are constructed from the lock manager's state of what's been locked, not from the read/write history that the dependency graph is built on. So they're different — don't get those confused. They're also related if you think about it a little, and we can have a separate conversation about that, but the concepts are distinct. This graph gets built from the lock manager: the lock manager knows which transaction has been granted which type of lock on which object, and who's waiting for it, and from that information you can put this thing together. So here's a schedule of three transactions: T1 locks A and then tries to lock B; T2 locks B and then tries to lock C; T3 locks C and then tries to lock A. It's just an example with three transactions — previously we saw a deadlock with two, but they can happen across three. As you can see, this graph is drawn slightly differently: from the lock manager we go backward, asking who is waiting for what, and that's what becomes an arc here. The dependency graph pointed one way in time; this graph points the other way.
So say T1 is waiting for something from T2, T2 is waiting for something from T3, and the deadlock completes when that last request comes in. At this point no progress will be made across these three transactions; the system will just wait forever, unless we break it. Okay. Questions — does that make sense? Any cycle in the waits-for graph is a deadlock, and a nice empirical observation is that most cycles tend to be of length two. Deadlock detection itself is a complex, computationally expensive algorithm to run, so this is a question of how frequently you should run this background thread to go through all the locks in the lock manager and compute the graph. The lock manager could have millions of entries in it, so computing this graph can be very expensive — but there are cheaper algorithms that just find cycles of length two, which can be done a lot faster. (And wow, balloons went up over here, because of this whole new camera thing.) So you can have the background thread run more frequently to find just the cycles of length two, and less frequently for the bigger cycles. Okay, that's just a tidbit, in case you get super interested and excited about deadlock detection. The main point is that building this graph, when you have a large lock manager table with lots of entries, is expensive, so you can't just run it all the time — you're going to have to run it periodically. That's the trade-off between the frequency of checking and how long transactions wait. You can check all the time: you'll find the deadlock quickly, but you're spending a lot of cycles just finding deadlocks. If you check, say, every day, then a deadlock that's been formed for a day sits there unnoticed.
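The three-transaction waits-for graph just described can be checked for a cycle with a short Python sketch (the function name and graph encoding are invented for illustration):

```python
# A waits-for graph derived from lock-manager state: an arc Ti -> Tj
# means Ti is waiting for a lock Tj currently holds. Any cycle is a
# deadlock. The example mirrors the three-transaction schedule above.
def find_cycle(waits_for):
    """Return one deadlock cycle as a list of transactions, or None."""
    def walk(node, path):
        if node in path:
            return path[path.index(node):]   # close the cycle
        for nxt in waits_for.get(node, ()):
            cycle = walk(nxt, path + [node])
            if cycle:
                return cycle
        return None
    for start in waits_for:
        cycle = walk(start, [])
        if cycle:
            return cycle
    return None

# T1 waits for T2 (wants B), T2 waits for T3 (wants C), T3 waits for T1 (wants A)
waits_for = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}
print(find_cycle(waits_for))                  # ['T1', 'T2', 'T3']
print(find_cycle({"T1": ["T2"], "T2": []}))   # None: waiting, but no deadlock
```

The cheap length-two special case mentioned above amounts to checking, for each arc Ti → Tj, whether the reverse arc Tj → Ti also exists — no graph traversal needed.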
You won't find it for a long time. Yeah — so the question is: can I maintain this graph dynamically, updating it every time a request is made to the lock manager? You could, but that adds cost. You're trying to make the lock manager go really fast — acquire a lock, let it go — and if you also need to update a graph on every request, that's a trade-off you're making: it could double or triple the time of each lock request. What you'd rather do is reduce the chance of these conflicts in the first place by stopping them early — that's what hierarchical locking will do, as we'll see in a little bit. There are all kinds of methods, but there's a real cost to dynamically keeping that graph in memory all the time. And sometimes, in a distributed system, that state is not in one place: there's a lock manager at each node, and for deadlock detection someone has to bring all of it together at one node and then do this analysis. You don't even have a global graph until you bring it together. And most modern database systems — Snowflake, Databricks, the offerings from Microsoft, Amazon, and Google — are distributed, so the state is not in one place, which makes it even more expensive. Other questions? Okay, so deadlock detection: let's say we found the cycle. Now we have to choose which node to kill — which transaction to abort. Remember, last time we talked about how a transaction may start and sometimes the database may abort the transaction for you. As an application programmer you write your code and have to be ready for what happens if it gets aborted, so in good applications you'll say: if aborted, retry five times or whatever before you give up and throw an application error. Why would the database abort you? A deadlock is the common reason.
Some of the application code was running, the database detects a deadlock, and one of those SQL queries has to get aborted — the abort is not because the programmer wrote an explicit abort call, but because the database has to abort something to break the deadlock. And how would you choose the victim? There are multiple ways. You could say: lowest timestamp. Or you could abort the transaction that has done the least amount of work. You might have some definition of work, saying one transaction fired ten queries between its begin and end statements while the other fired only two, so kill the one that has done less; or go by the number of items locked in the lock table, because that may be a proxy for the amount of work that's been done. Basically, when you abort a transaction you're losing all that work — it will have to get redone — so you're trying to optimize for that. But the balancing consideration is that if you always abort the transaction that has done the least amount of work or made the least progress, you could end up starving some transaction that can never make progress, because older transactions are always ahead of it. So you want to avoid starvation.
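One plausible victim-selection policy from the list above can be sketched in two lines of Python (the lock counts are made up; lock count stands in as the proxy for work done):

```python
# A sketch of least-work victim selection: abort the transaction in the
# deadlock cycle holding the fewest locks, using lock count as a proxy
# for the amount of work that would be lost. Numbers are invented.
def pick_victim(cycle, locks_held):
    """cycle: transaction ids in the deadlock; locks_held: id -> count."""
    return min(cycle, key=lambda t: locks_held[t])

locks_held = {"T1": 12, "T2": 3, "T3": 7}
print(pick_victim(["T1", "T2", "T3"], locks_held))   # T2: least work lost
```

The starvation caveat applies exactly here: if T2 keeps losing this comparison on every restart, it never finishes — which motivates keeping its original timestamp, as discussed next.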
So what you end up doing — if you're using a timestamp-based protocol, which is pretty common — is that when you abort a transaction and it gets restarted, you keep its original timestamp. That way the transaction gets higher and higher priority as it ages, so it has some chance of finishing at some point. If you gave it a new timestamp each time, it might never get run. Okay, all right. Now, when a transaction has to be killed and undone, there are two different ways to do it. One is to abort the entire transaction. But in SQL you're also allowed partial rollback. Say you're writing application code with ten SQL statements: the first confirms your flight, the next confirms your hotel, and so on — and you reach the ninth step and realize the last query is the one that caused trouble. You may want to say: don't abort everything, just roll back to the previous savepoint. You can explicitly place savepoint calls — you as the programmer have to do that — and then say: if it aborts, just roll back to that savepoint and retry just that last step. So you have to put these savepoints in explicitly, but there's a mechanism to tell the database: if you need to abort me, just abort me back to this level. Okay, all right. The other side of dealing with deadlocks is deadlock prevention. What it's going to do is stop the deadlock at the first touch. Okay, so here's how it works: imagine two transactions, T1 and T2, have come into the system, and T1 is older than T2. The first protocol, called wait-die, says: if the transaction that is requesting has higher priority —
— that is, if it's older than the transaction that already holds the lock — then the requesting transaction is allowed to wait. So if I'm an older transaction, I'm allowed to wait for the younger transaction that has the lock; if I'm younger, I die. That's wait-die. Wound-wait is a little brutal if you think of the old-and-young analogy: an old transaction arrives, a young one has the lock, and the old one can abort it involuntarily and take away its locks. It's called wound-wait because I wound the holder — I'm older, so I get the locks they had; if I'm younger, I wait. Okay, both of them will prevent the cycle from ever forming: the minute someone would start to wait and an arc would start to form in the wrong direction, you say, whoops, kill it. So it's a little aggressive, but it doesn't require maintaining the waits-for graphs, and it doesn't require walking the lock table and so on. Hence it's simpler in that way, but maybe more wasteful. Okay, and if you want to look at an example, here it is. In both these examples we go by timestamp, so T1 is older than T2. In the first situation, T1 is waiting for T2 because T2 already had the lock on A. Under wait-die, T1 is allowed to wait — it's older, so it can wait. But under wound-wait, T1 will be nasty: it'll kill T2 and make progress. Okay, the other way around: T2 comes in and wants to make a request for something the older transaction has, and the situation flips. Under wait-die, T2 will abort — it's like young people can't wait for old people — and wound-wait says: yeah, fine, if you're younger you can wait. Okay, so you follow exactly one of those protocols; you can't mix and match, right? Otherwise it won't work.
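The four cases just walked through fit in two one-line functions; here is a Python sketch (return strings are invented labels, and smaller timestamp means older):

```python
# Both schemes compare timestamps (older = smaller = higher priority)
# at the moment a requester would otherwise start waiting, so a
# dangerous waits-for arc is killed before any cycle can close.
def wait_die(requester_ts, holder_ts):
    """Older requester waits; younger requester dies (aborts)."""
    return "wait" if requester_ts < holder_ts else "abort requester"

def wound_wait(requester_ts, holder_ts):
    """Older requester wounds (aborts) the holder; younger one waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"

# T1 has timestamp 1 (older), T2 has timestamp 2 (younger):
print(wait_die(1, 2))     # wait: old may wait on young
print(wait_die(2, 1))     # abort requester: young may not wait on old
print(wound_wait(1, 2))   # abort holder: old takes the lock by force
print(wound_wait(2, 1))   # wait: young waits for old
```

Note that in each scheme waiting is only ever permitted in one direction of age, which is exactly why no waits-for cycle can close.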
So you just follow one of the protocols. Essentially, each scheme fixes, once and for all, the direction in which arcs are allowed to point in the waits-for graph, and as soon as a disallowed arc would appear, it stops it. That doesn't mean a cycle had formed — but if you never allow arcs to form in both directions, no cycles will form, right? Yep. Don't mix and match in the same system; you have to follow only one, or it won't work — you'll get into a mess. You can't say, for this part I'll do one and for that part the other. Yep. That's a great question — is one better than the other? There isn't a very good answer; it depends on the application, the workload, and the scenario. People have done research into which ways to break deadlocks are best, so it's an open question, but there isn't, as far as I know, a proof that one dominates the other. If you find one, let me know. Okay, so we've talked about why these guarantees work, why these systems work. Hopefully you're getting the sense that when we deal with transactions, we like to visualize things as graphs, cycles are bad, and across both the dependency graphs and the waits-for graphs we are basically finding ways to keep these cycles from forming. Okay, all right. So far, all the schemes we've talked about have a one-to-one mapping from database objects to locks, and if you want to update a billion records, as we mentioned toward the beginning of class, you'd be acquiring a billion locks, and that's very expensive — making changes in the hash table is pretty expensive. Can we do better? So there is this notion called hierarchical locking, and it's based on a very simple observation: a database object — let's think about a record — doesn't sit by itself.
We put records into pages, we put pages into tables, tables into databases — so we have a hierarchical structure for how we've organized the data. And in fact, that hierarchical structure is also how we access the data: in the execution code you wrote for your operators, you open a file and start scanning the records. First you start with the database, then the table, then the pages, then the records — you're following that hierarchical structure. So the question is: can we acquire locks at different granularities? I could have a page-level lock, a record-level lock, a column-level lock, but also a table-level lock or a database-level lock, and allow mixing and matching of all of those. Oh, and by the way, let's also have different types of locks, some more relaxed. That sounds bizarre, but that's the breakthrough this paper had — the work that IBM and Jim Gray did was so exciting it totally changed the field. So here's how it looks. I have a hierarchy: databases have tables, tables have pages, pages have records, and records have columns, or attributes. So now here's what I'm going to allow: if a transaction T1 wants to acquire a table-level lock, it locks the table object and implicitly everything below it is locked. Okay, so now, instead of individually locking a billion records, I'll just lock the table, because I'm scanning the whole table — much cheaper — and I can keep everything we talked about correct. We'll talk about this protocol at a very high level; there's a whole paper on it, and it takes a full lecture to go through properly — that's going to be covered in the database systems class — but hopefully you get the intuition for how it works. So database systems will actually allow things like that.
They'll say: the locking mechanism can internally lock at the database level, which is pretty rare. We talked about how in the early days of MongoDB that's the only lock they allowed — at the database level — because it was easy to implement. And of course they changed it: it's easy to implement and you get correctness, but no parallelism, so performance suffers. But database systems now have this hierarchy of locking. A table-level lock is very common for systems to implement. Page-level is also very common, and it's very easy to place, as you can imagine: the first time you access a page in the buffer pool, the buffer manager can make the request to the lock table. The table-level request can be made when you open the file. If you're doing record-level locking, the request can be made as you're scanning the records in the page. And very rarely, systems also do attribute-level locking. A system that supports attribute-level locking can obviously get a lot more parallelism: if I'm doing record-level locking and two transactions each want to update a different column of the same record, they don't really conflict — but record-level locking will make them go one after the other. Right, so the further down you go, the more parallelism you get, but you'll also do a lot more locking. So what you want is to allow all of that: can you mix and match, and get the best of both worlds? And the answer is yes — but to do all of that correctly, you're going to have to introduce a new notion. Yep — very good, excellent question: if I have a column store, a decomposed storage model, can I lock an entire column? Yes. You just adjust the structure of the hierarchy: tables decompose into columns, below columns you might have pages, and within those, the column values.
So absolutely — as long as you have a structural hierarchy it works; for a column store it would just look different. That's a great observation. Is it common or rare? It's pretty common, because column stores get used in analytics systems — that's how you build an analytics system. Now, the next question you probably have is: how common is it to do updates in a column store? The answer is more and more common, because people are now trying to build these things called HTAP systems. Typically people will say: I've got an analytics system and I've got a transaction system — but now I'm maintaining two systems. An HTAP system says: I want to allow transactions and analytics in one system, and do both efficiently. But analytic workloads want column stores, whereas transactional systems want row stores, so there's a huge amount of research going on into how to make that work. As long as you have a storage hierarchy, you can make the theory of this thing work — that's the beauty of this technique: it works with any hierarchical data structure. But how to make HTAP systems work, where I might have both a row store and a column store, is something being researched intensely in the community right now. Just to make sure I've got the question: locking entire columns in OLAP systems specifically? Yes, that's what you would do — if it's not a read-only database. If it's a read-only database, none of what we're talking about matters: there are OLAP systems where you designate the database as read-only, with no updates, and then none of this applies. Great. Other questions? Okay — and we won't finish everything today, but that's okay.
Let's see how much we can get through. I keep taking questions because this material takes a little time to digest. To make all these hierarchies work, if we just kept shared and exclusive locks we wouldn't get much performance, because if someone locks the entire database there's no parallelism, so it would backfire on us. So we need these things called intention locks, which at first seem weird, and which connect to the question that was asked earlier: are there more than two types of locks? Yes, there are lots of different types of locks. The simplest one is going to be something called intention-to-share, and what it says is: look, I'm T1, and I'm traversing table R. I don't want to grab a shared lock on the whole table, because that may be too much; I only want to look at some records in it. Maybe I'm just accessing an index and only the records that match, and I don't know how that will end up: maybe the predicate range is such that I need to access every record, or it's so narrow that I'll access zero records, or only one record in the table. I don't know that yet, but I have to acquire some lock as I come down this hierarchy first. I could acquire the shared lock on the table, but that could be really bad if very few records actually need to be read. So here's what I'm going to do.
I'm going to say: I intend to grab a shared lock somewhere down below. So I will grab an IS lock, an intention-to-share lock, and the beauty of an IS lock is that it's compatible with an IX lock, which is the intention to get an exclusive lock somewhere down below. So the main idea is: I can grab an IS lock, and other readers can of course still go through. I'm just saying I intend to share down below; when I get below, I'll actually grab a shared lock on the thing that I'm reading, but at the top level I'm not blocking anyone. I have this tension: if I just do, let's say, tuple-level locking, life is good with strong strict two-phase locking, but if I have to scan a lot of records, my locking overhead is too high. So I'm trying to balance the locking overhead against how much parallelism I allow. The intention locks come in different flavors. The three main ones are: IS, the intention to grab a shared lock somewhere below; IX, the intention to grab an exclusive lock somewhere below; and SIX, shared plus intention-exclusive, which says: I will read everything below, I know that already, but some things I may actually update. Maybe I don't have any index built on, let's say, a student table, and I want to scan everything and take everyone who's got a B minus and move them to a B. I don't know which records I'm going to grab an exclusive lock on, but I know I'm going to read everything. So I say: please don't make me grab shared locks down below; assume I've shared-locked everything with just one lock. And let's say there's only one student I need to update out of the hundred students; I will only grab one exclusive lock later, for that one student I need to change. So it's very efficient: two locks, as opposed to having to grab a hundred-plus locks. Okay, so how does that work?
We need a compatibility matrix before we can proceed, and that compatibility matrix has a couple of interesting points that I'd like to draw your attention to. One is that an IS lock is compatible with an IX lock. What that is saying is: yes, you intend to share, to do some reads below this level in the hierarchy, and you will grab a proper shared lock on the stuff you actually want to read; and if someone else wants to grab an exclusive lock down below, at this level they're just an IX lock, and IS and IX locks are compatible. You'll see that with an example. And then the SIX lock is compatible with the IS lock, because a SIX lock is like saying I've got a shared lock on everything below, and of course a read is compatible with a read, including with an intention to read, so that is also compatible. Okay, let's see, we've got five minutes; that should be enough to go through some of the core parts here. Let's take this example: get the balance from Andy's shady offshore bank account (I don't think he really has one, it's just an example, so don't tell people it's real) and then increase the bookie's account balance by 1%. Let's look at what that looks like. We've got a table of all this bank account information, and we'll start with the first transaction reading that record. At that point we might only want to read that one record, so what we might say is: I declare an IS lock on the table, and then, assuming there's only one Andy, I'm only going to read that one tuple and basically grab a shared lock on it. So two locks.
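The matrix being referred to is the standard multigranularity compatibility matrix; written out as a table (this follows the usual textbook formulation, so treat it as a sketch rather than any particular system's implementation):

```python
MODES = ["IS", "IX", "S", "SIX", "X"]

# COMPAT[held][requested] is True when a request in mode `requested` can be
# granted while another transaction holds mode `held` on the same node.
COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

# The two points called out above:
print(COMPAT["IS"]["IX"])   # True: intention-to-read coexists with intention-to-write
print(COMPAT["SIX"]["IS"])  # True: SIX already holds S on everything below,
                            # and a read is compatible with an intention to read
```

Note the matrix is symmetric, and X conflicts with everything, which matches the intuition that an exclusive lock at a node shuts down all access beneath it.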
I'm done, and even if I had to scan the entire table, I'd only grab shared locks on what I need. Now, the parallelism that is allowed here is that transaction T2 can be working concurrently while those locks are being held: it can grab an IX lock, and as long as it does not want to update that same record, Andy's record, that transaction is allowed to proceed. Yeah, so the question is: if you know you're modifying the whole table, can you grab an exclusive lock? There are some situations where you would do that. For example, maybe, and that's again a maybe, if I'm trying to make a schema change to the table, that means I'm going to change everything, and for that you would say, I need to grab an exclusive lock on the table. So if you know you're going to make changes to everything below, then you stop at that level of the hierarchy and grab the lock there; nothing in hierarchical locking says I couldn't have grabbed an X lock on table R. It depends on what I'm trying to do. Okay, yep? Yeah, so you're saying: should transaction T2 have grabbed a read lock on the tuple and then upgraded it to an X lock later? Sure, but that would prevent someone else from doing an update. Look at what happened here: right now we allowed an update to one record to happen at the same time as a read of another record. If I had grabbed an S lock on the whole table, no updates could happen on any of the records. So as long as the transactions are touching different parts of the same table, we said it's okay, and this hierarchical locking allows that. Yes, so the protocol is: you go down the hierarchy of the database structure, and you can issue an IS lock, an IX lock, a SIX lock, any of the lock modes we talked about. But if you take an intention lock, then down below you must grab a proper lock: if you take an IS or IX lock, you must grab an S or an X lock below, right?
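The protocol just described, an intention lock on the table followed by a proper lock on the tuple, can be sketched with a toy lock manager. Everything here (the names, the reduced matrix, the `acquire` function) is hypothetical, and a real system would also handle queuing, lock upgrades, and deadlocks:

```python
# Reduced slice of the multigranularity compatibility matrix.
COMPAT = {
    ("IS", "IS"): True, ("IS", "IX"): True,
    ("IX", "IS"): True, ("IX", "IX"): True,
    ("S", "S"): True,
    ("S", "X"): False, ("X", "S"): False, ("X", "X"): False,
}

held = {}  # object name -> list of (transaction, mode) currently granted

def acquire(txn, obj, mode):
    """Grant the lock if it is compatible with every current holder."""
    for _, other_mode in held.get(obj, []):
        if not COMPAT.get((other_mode, mode), False):
            return False  # in a real system the transaction would wait
    held.setdefault(obj, []).append((txn, mode))
    return True

# T1 reads Andy's tuple: IS on the table, then S on the tuple.
assert acquire("T1", "table", "IS")
assert acquire("T1", "tuple:andy", "S")

# T2 updates a different tuple: IX on the table is compatible with T1's IS.
assert acquire("T2", "table", "IX")
assert acquire("T2", "tuple:bookie", "X")

# But T2 could not X-lock Andy's tuple while T1 holds S on it.
assert not acquire("T2", "tuple:andy", "X")
```

The key point the example shows: the conflict is detected at the tuple where the transactions actually collide, while the table-level intention locks let everything else proceed in parallel.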
Yeah, the benefit is, imagine this query. Let's go to this example, and that will make it clear. Imagine I've got these transactions T1, T2, and T3, and we'll stop after this slide and pick it up in the next class. T1 wants to scan all the records and update only one. Now imagine the table has a billion rows in it. Here's what we can do: T1 comes in and grabs a SIX lock. The SIX lock says, implicitly, down below go ahead and read anything; you've got free permission to do that. But for the record you want to update, ask for an X lock, and if you get it, go ahead and do your work. Only two locks, as opposed to a billion locks being acquired. And another transaction can come in that wants to read a single record. It is allowed to go through at the same time, because it will acquire an IS lock, which is compatible with a SIX lock. So more parallelism is allowed and fewer locks are being issued; it's efficient, because at that same table the read request is allowed to go through. A third transaction comes in that scans all the records. If it wants to grab an S lock on table R, it will wait, because T1 has a SIX lock there, which is incompatible with an S lock; it waits before it can go further down, and when T1 is done, its request gets granted. Now, you could have said that this scanning transaction could instead have grabbed an IS lock and then grabbed the billion read locks, reading until it got to that last record. Even that would be allowed, if the transaction wanted to say: I don't mind paying the cost for the locks, because I want that parallelism. But you probably wouldn't do that, because the locking overhead would be enormous. Essentially, what you saw here, even with just transactions T1 and T2, is that very few lock requests were made, and you still allowed this level of parallelism. So you really get a much higher-performance database system than you would the other way; if you didn't have the intention locks and hierarchies, you would not get this level of parallelism. All right, I'm going to stop here. I can take questions offline, and then I'll pick up from this slide in the next class. Shubham, hit it.