All right, so let's get started. So today we're going to finish up our discussion of transaction scheduling from last time. There's just a little bit more to do. We're going to look at a simple, high-level test for serializability. We're also going to look at, basically, techniques for getting distributed databases to be consistent and to support atomic transactions. So those will include strict two-phase locking and also two-phase commit. And again, we have some slides that are due to Mike Franklin from an earlier offering. All right, so again, with transactions, our goal is to have these four properties, the ACID properties, atomicity, consistency, isolation, and durability, which we discussed last time. So transactions are the fundamental tool that provides these properties globally in the database. And the idea of a transaction is to group together a set of reads and writes so that they can be executed atomically. So you can guarantee that the database state moves forward to something that's consistent and meaningful, or it stays where it is. And it can't be in some inconsistent state in between. So the canonical examples are financial examples where you're trying to move money from one account to another. And you want to make sure that the money is completely moved and it doesn't disappear in the middle somewhere. So that series of reads and writes has to be executed altogether or not at all. And we talked a bit last time about locks, how locks are used in a similar way in operating systems to guarantee that particular transactions have control of the resources they need for the time they need them, and then they can release them. And generally speaking, in databases, you want to use reader-writer style locks where the reads are usually non-exclusive and the writes are exclusive. So transactions are a conceptually simple way to get these guarantees: if you think of a transaction as a series of operations that finishes and then the next transaction begins, it's very simple to show that the properties that you want are gonna be satisfied. The problem, though, is that with large transactions, you might lock large areas of the database, which prevents other people from having responsive interactions with the database. Or if you have small transactions, you may find that while some part of the database is locked, the network is really dominating the time; a transaction doing small data updates is often gonna mostly involve moving the data around. So both of those are bad. So we'd like to support concurrent accesses, and as much concurrency as possible, without violating the transaction semantics. And as we discussed last time, here are two transactions that you wanna execute; a serial schedule simply puts one before the other. And both orderings will give you consistent results because the transactions give you consistency. But if you have a more general schedule, it might be more efficient, because you're able to hide some of the reads: while one transaction might be waiting on this read, the other one might initiate another read, and they can generally complete faster if they're interleaved. But you wanna make sure that this schedule with interleaving is somehow gonna be equivalent to one of the idealized schedules, the serial schedules that were on the previous page. So how do we figure out if one of these general schedules is equivalent to a serial schedule?
We looked at some ways of doing that last time. We'll just quickly review that and look at a slightly more abstract way of doing the same thing. So if you recall, a serial schedule is some scheduling of reads and writes that doesn't interleave the writes or reads from two different transactions. So essentially you're running one after the other. Equivalent schedules are different orderings of reads and writes that produce the same outcome as some other ordering. And in particular, we're usually interested in serializable schedules, which produce the same output as a serial schedule. So the equivalence here means that whatever the state of the database was before all the operations happened, it's gonna be the same state afterwards with the two different styles of schedule. So, okay, that should be fairly intuitive from last time. And so serializability is a high-level, fairly complex notion, but we broke it down and looked at conflict serializability, which is a simpler notion, which only requires you to look at pair-wise interactions between reads and writes. So two operations are conflicting if they belong to different transactions, but they act on the same data item, and at least one of them is a write. So pairs of reads don't ever conflict, and other operations only conflict if they're on the same piece of data. And two schedules are conflict equivalent now if you can produce one from the other by swapping non-conflicting operations in the different transactions. So by swapping the non-conflicting ones, that implies that the conflicting ones are staying in the same order, the same relative order. So now we can easily define a general schedule as conflict serializable if you can take whatever ordering of operations is in that schedule and swap them one at a time, just swap the orderings between transactions but keep the ordering within each transaction the same, and transform the result into a serial schedule, one after the other. All right, and just to review this to make it very concrete, there's really just two types of conflict. There's the read-write conflict, where you have a read of something that's being written. So in this ordering, the read's gonna give you whatever the value is here before the write. If you tried to swap it, it's gonna get a different value, so the outcome's different. So you can't, in any reliable way, swap the ordering of those two things. And the other conflict is a pure write conflict, because both these writes are writing the same item X; if the second transaction's write is the last update, then its value B is gonna get written, and if T1 executes its write last, then A will get written. So the state after the operations will be different in those two cases. So those are the two kinds of conflicts, concretely, that we deal with. And so, anyway, the schedules are conflict equivalent if we can do those transformations that don't involve one of those two conflicts. All right, so, because we're doing swaps, we're trying to move the operations of one transaction to one side of the ones in the other transaction, and you can do that a step at a time; we did an example of this last time. You basically swap the ones that don't conflict, and we saw that we could, in the example last time, produce a serial schedule. We also looked at this one last time. Do you remember whether the answer was serializable or not? Actually, this is a slightly different question. Last time we asked if it was serializable and we said no.
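To make that pairwise conflict test concrete, here's a minimal sketch in Python. The `Op` tuple and `conflicts` helper are just illustrative names for this lecture's definition, not code from any particular system: two operations conflict exactly when they come from different transactions, touch the same item, and at least one is a write.

```python
from collections import namedtuple

# One operation in a schedule: issuing transaction, "R" or "W", and the data item.
Op = namedtuple("Op", ["txn", "kind", "item"])

def conflicts(a, b):
    """Two operations conflict if they come from different transactions,
    act on the same data item, and at least one of them is a write."""
    return a.txn != b.txn and a.item == b.item and "W" in (a.kind, b.kind)

# R1(A) vs W2(A) conflict; R1(A) vs R2(A) do not; W1(A) vs W2(B) do not.
print(conflicts(Op(1, "R", "A"), Op(2, "W", "A")))  # True
print(conflicts(Op(1, "R", "A"), Op(2, "R", "A")))  # False
print(conflicts(Op(1, "W", "A"), Op(2, "W", "B")))  # False
```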
Well, is it conflict serializable? No, it's definitely not, because if it were conflict serializable, it would also be serializable. So if we said it's not serializable, it also must not be conflict serializable. But the difference now is we have a test for conflict serializability. So we should be able to give a different type of answer about why we couldn't serialize that, yeah. Serializing means we wanna push all the operations from one transaction before all of the operations of the other. That's what we're trying to do in order to show that this ordering, which might be more efficient, is always gonna produce the same answer as the serial one. Well, when we serialize them, it's completely clear, when the transactions are operating one after the other, that they're executing exactly the operations they're supposed to execute without any conflicts with other operations. The author that wrote the transaction would have figured out exactly what would happen in each circumstance if only their operations are executing. But ACID requires that we also have isolation, which means that if somebody else's transactions are executing at the same time or overlapping, your answers should always be the same as if they're executing by themselves. So you guarantee that they're executing by themselves by just pushing all the steps of your transaction together, either before or after the other transactions, because in general, you don't know which order is intended. Isolation does allow people to still be interacting with the database, but it just says that the results of your transaction on some consistent state should be predictable, should be what the author expects. So here we're trying to show that this general schedule can be converted and still always produce the same output as one where there's just two separate transactions happening. And the idea of the conflicts, the argument around the conflicts, is that whenever you move around two operations that are not conflicting, the output must always be the same, because they have no way of interacting. If you're operating on different data items, it really doesn't matter what order that happens in. And if you're only doing reads on the same item, it doesn't matter what the order is. But from the previous slide, we know that if you have a read and a write on the same item, you can't swap those. And similarly, if you have two writes, you can't swap those. So now let's try to answer this question. Can we give an answer in terms of the conflicts for this example? So what are the conflicting pairs here, first of all? Ah, yeah, okay, good. All right, so there's three of them here actually. So R(A) and W(A), all right? They're in different transactions acting on the same data. So we can't swap those. These two, we can't swap either. And then finally, the writes, we can't swap. So in terms of the conflicts, we get these three. Oops, sorry. Well, I've actually just shown two of them there, because as soon as I have these two, you can see that it's impossible for me to push this write here before the other write. That would change the outcome. So that means it's impossible for me to complete T1 before T2. On the other hand, though, I also can't push this read after this write. That would also be a conflict. So I can't execute T1 entirely after T2 either. So there's actually no ordering that would work here. So it's not conflict serializable. So it's not serializable, which means, sorry, I said that the wrong way around: what we've shown is that it's not conflict serializable.
It happens that it's not serializable at all. All right. But anyway, now we have this lower-level test that we can apply. It's simple and mechanistic. All right, so on the other hand, if you have a more complex set of transactions, it can be pretty tedious to figure out all of these dependencies and then answer the question, or rather figure out all of these conflicts and then figure out if we can actually shuffle things. With long sequences, you have an exponential number of potential orderings of the two sequences, and you'd have to try them all out. So there's a more compact way of answering the question, which uses a dependency graph. And the idea of the dependency graph is that it's a simple layer built on top of the conflict arrows that captures the argument I just made, that one transaction has to be after the other one. And when you get a cycle in that ordering, it means that there's no correct ordering, no linear ordering of the transactions, which would be a serial ordering. So the dependency graph consists of edges from one transaction to another. And an edge is added whenever you have an operation from Ti that conflicts with an operation of Tj and Ti appears earlier than Tj in the schedule. Actually, strictly speaking, it's the operation of Ti that should be earlier in the schedule than the conflicting operation of Tj. Okay, and you can see it's basically the generalization of the argument I just made. If there is such a conflict, it means you can't push that specific operation to the other side. It means basically Tj always has to come after Ti in any feasible ordering. So therefore, if there's no cycle, then you can come up with an ordering of the transactions, and all of the conflicts will be basically in the same order, and therefore it'll be conflict serializable. It's an if and only if condition. So let's look at an example. So this schedule is going to be conflict serializable. Let's see, what are some conflicts again, quickly? Yes, that one? Yep, and also write A from T1 with read A from T2, and then the two writes on A again. All right, same thing among the B's. Okay, so if you recall from the previous slide, we said that whenever we have an operation here that conflicts with an operation later on from the second transaction, then we should draw an arrow from the first transaction to the second one. We also typically add an annotation saying which particular data item is the one that conflicts. So we've really got two sets of arrows here, one for the A's and one for the B's. But overall there's only one ordering constraint that's implied by all those conflicts, which is that T1 has to be before T2, all right? And if that's all there is, we're actually okay. We can actually execute T1 before T2, and we do that by pushing all of the operations of T1 to the left and the operations of T2 to the right, and they won't conflict, because there won't be any backwards arrows that we'd have to cross. Okay, is that clear? Okay, so there's the dependency graph, so, yeah, no cycle, so we're okay, and we can actually use any ordering that satisfies the dependency graph ordering to order the transactions. Here's one that's not conflict serializable. So we have a similar set of conflicts for the A's, and now what about the B's? Which way do the B arrows go? Yeah, they're going from T2 to T1 because the operations in T2, all right, well sorry, there's the dependency arrow from T1 to T2 because of the ordering of the A constraints.
Now, the operations on B from T2 have to come before the corresponding operations in T1, or we'd have an illegal swap to do, so we have the second constraint arrow that says that T2 has to come before T1 in order for those constraints not to be violated. So you can see there's no way to actually order them that way, and it translates to a cycle in this dependency graph, which means there's no linear ordering of that graph. All right, so yeah, so the cycle means that you have some sort of dependency, which really means the necessary ordering has to follow the arrows, but whenever there's a cycle, there isn't such an ordering. Okay, so, yes, and so conflict serializability isn't the whole story. There are actually legal schedules, serializable schedules, that won't be conflict serializable. They're more complex typically, and they require you to understand the side effects of all of the operations, and they'll involve more than two transactions usually, I think always actually. So conflict serializability is what database schedulers really use, for practical reasons. It's very simple and efficient to check, and the operation of constructing the dependency graph also gives you an ordering. In fact, it'll give you several potential orderings in general. So yeah, all right, so we're gonna talk next about two-phase locking, which is a way to actually implement this ordering constraint, kind of in a live way. So rather than planning ahead and thinking how do we interleave these transactions, we're gonna define a scheme called two-phase locking which will basically force the schedule to be serializable, in fact conflict serializable, incrementally. So as the transactions are being executed, it'll allow them to interleave to a certain extent as long as there's a corresponding serial execution to what happens. So that's the idea of two-phase locking. Okay, and we'll do that in a second. Before we do that, here's an example of something that is serializable but not conflict serializable. So it has a bunch of conflicts which prevent a simple serial ordering. So there they are, and here's the dependency graph. So you can see that there's a dependency saying there's an ordering, T1 should be before T2, so as not to violate that constraint. But there's one back the other way as well. There's two writes which are trying to force T2 before T1. So we get these two dependency arrows based on A there. But then in addition, because of the write at the bottom from T3, both T1 and T2, well, they do write to A as well, and both of those writes lead to these dependencies here, from T1 to T3 and from T2 to T3. So yeah, so the cycle part is here, between T1 and T2. Now strangely, if we only had that cycle, we wouldn't be able to fix it. But because of this additional write, there's actually a different schedule that's gonna always produce the same outcome. Does anyone see what it is? So is there a way to serialize? Well, there is. There's a way to serialize those transactions so that it's equivalent to a serial schedule, excuse me, a serial schedule that's equivalent to this one. Does anybody see what it is? Yeah, yeah, good. So you do the simplest possible thing, arguably, which is to just ignore the constraint here between the writes of T1 and T2, because T3 is gonna overwrite A anyway. And T3 doesn't read anything, so its written value can't depend on what happened before, and therefore not on what T1 and T2 did. Yeah, yes.
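Here's a rough sketch of how the dependency-graph test can be mechanized, again in illustrative Python only (it redefines the `Op` and `conflicts` helpers from the earlier sketch, and `serial_order` is just a made-up name, not any real scheduler's code): add an edge Ti → Tj for every conflicting pair where Ti's operation comes first, then topologically sort. If the sort succeeds, its output is an equivalent serial order; if it fails, there's a cycle and the schedule is not conflict serializable.

```python
from collections import namedtuple, deque

Op = namedtuple("Op", ["txn", "kind", "item"])   # as in the earlier sketch

def conflicts(a, b):
    return a.txn != b.txn and a.item == b.item and "W" in (a.kind, b.kind)

def serial_order(schedule):
    """Build the dependency graph and topologically sort it (Kahn's algorithm).
    Returns an equivalent serial order of the transactions, or None if there is
    a cycle, i.e. the schedule is not conflict serializable."""
    txns = {op.txn for op in schedule}
    succ = {t: set() for t in txns}
    for i, a in enumerate(schedule):
        for b in schedule[i + 1:]:
            if conflicts(a, b):
                succ[a.txn].add(b.txn)      # edge a.txn -> b.txn
    indeg = {t: 0 for t in txns}
    for t in txns:
        for u in succ[t]:
            indeg[u] += 1
    queue = deque(t for t in txns if indeg[t] == 0)
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for u in succ[t]:
            indeg[u] -= 1
            if indeg[u] == 0:
                queue.append(u)
    return order if len(order) == len(txns) else None

# All conflicts point T1 -> T2: conflict serializable, serial order [1, 2].
ok = [Op(1, "R", "A"), Op(1, "W", "A"), Op(2, "R", "A"), Op(2, "W", "A"),
      Op(1, "R", "B"), Op(1, "W", "B"), Op(2, "R", "B"), Op(2, "W", "B")]
print(serial_order(ok))    # [1, 2]

# T2 touches B first, so the B edges point T2 -> T1: a cycle, not conflict serializable.
bad = [Op(1, "R", "A"), Op(1, "W", "A"), Op(2, "R", "A"), Op(2, "W", "A"),
       Op(2, "R", "B"), Op(2, "W", "B"), Op(1, "R", "B"), Op(1, "W", "B")]
print(serial_order(bad))   # None
```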
Well, that would not, I mean, yeah. That would mess up this argument here, right? We're relying on the fact that this write really doesn't depend on what happened before, because as soon as we swap those, we've changed the state before, yeah. Yeah, so it only works because it's only a write. All right, so yeah, and in general this can get arbitrarily hard, because you can have interactions of many different transactions masking each other, creating constraints that propagate, and it ends up being NP-complete to decide in general if a schedule is serializable. So we're not going in that direction. We're going in another direction, which is basically trying to do this live. So last time we did talk a little bit about locks. So locks in databases are typically reader-writer locks, where you have a shared lock, usually annotated as S, an S lock, which allows shared access for reading purposes to specific data items, and then an exclusive lock, an X lock, normally for writing. And here's just a simple, fairly obvious matrix showing that different transactions can hold multiple instances of a shared lock, but you can't have an exclusive lock if somebody else has a shared lock, or vice versa, and two different transactions can't both have exclusive locks. So exclusive means exclusive; it means there's only one actor that has that. All right, so two-phase locking. Two-phase locking is the live transaction scheduling method that allows some flexibility and overlap in the schedules of transactions but automatically avoids non-serializability of the transactions. So the constraint for two-phase locking is on how the transactions acquire the locks that they need. They can acquire them one after the other, and obviously they have to acquire a lock before doing the corresponding operation, at least a shared lock or an exclusive lock before reading and an exclusive lock before writing, but basically they have to finish requesting locks before they start giving them up. So the transaction can't request any additional locks once it starts releasing them. So basically the sequence of lock acquisitions forms a hill, with a peak where all of the locks are held at some point. So it's called two-phase locking because there are two phases. There's a phase where you're acquiring locks, only acquiring them, not releasing anything, and then another phase when you're only releasing them. So the picture looks like this. This is in terms of the number of locks held. It has to be going up for a time, and as soon as it goes down, it has to keep going down. You can't acquire any locks once you've started releasing them. So yeah, the acquisition phase, the growing phase, is to the left, and then there's a shrinking phase, and normally this point here is called the lock point. It's the point where all of the locks are held, right before their release. Well, we'll get to this a bit later, but it's also desirable to try to lexicographically order the locks that are acquired so that you can avoid deadlock. Basically, you wanna try to always acquire sort of A before B, B before C, et cetera, so that you don't end up with some D trying to be acquired before A and creating a cycle.
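Here's a minimal sketch of just that two-phase rule, once more in illustrative Python (the `TwoPhaseTxn` wrapper is hypothetical bookkeeping, not a real lock manager, and it ignores lock modes and conflicts entirely): once the transaction has released any lock, further acquisitions are rejected.

```python
class TwoPhaseViolation(Exception):
    pass

class TwoPhaseTxn:
    """Tracks only the 2PL rule: every acquisition must come before every release."""
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # set at the first release; the lock point has passed

    def acquire(self, item):
        if self.shrinking:
            raise TwoPhaseViolation(
                f"{self.name} tried to lock {item} after releasing a lock")
        self.held.add(item)

    def release(self, item):
        self.shrinking = True    # from here on, no more acquisitions allowed
        self.held.discard(item)

t = TwoPhaseTxn("T1")
t.acquire("A")
t.acquire("B")       # growing phase: fine
t.release("A")       # first release: the lock point has passed
try:
    t.acquire("C")   # not allowed: we're already in the shrinking phase
except TwoPhaseViolation as e:
    print(e)
```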
Okay, so that simple constraint from the previous slide actually guarantees that the dependency graph is acyclic, and therefore that the schedule, the actual execution of the operations that form part of the transactions, the 2PL transactions, is actually gonna be conflict serializable. Yeah, well, exclusive means only one transaction can hold a lock, that particular lock, one client. Yeah, I mean, technically you can have exclusive locks also for reading, but it's overkill. You might have one, for instance, if you've done a write, you may just hold that lock and read rather than re-acquiring a lock, but normally for reading, the lock is a shared lock, and other actors can hold shared locks as well, but they can't hold exclusive locks on that data. Yeah, probably, I mean, I think it's an implementation issue, but, well, there's two possibilities. One of them is you acquire the exclusive lock right away. The other possibility is you acquire the shared lock and you can upgrade it later. That's legal. So I think it's context dependent what you actually do, yeah. Well, so, you know, a transaction is doing a series of reads and writes, so once it's done reading or writing a particular item, it's able to release the lock on that item, even though it's still gonna have more operations to do. But the very interesting thing is that there are some significant advantages to holding all the locks until the end, until you don't need any of them. That's a strict two-phase locking strategy. It's not as friendly to the other transactions, right? Because you're holding resources that you don't need. But it turns out to have much better behavior if there's an abort. Basically, it provides an easier way to unravel the transactions safely. Because you're in a bit of a mess if these transactions are partly unraveled and one of them fails. You've got to sort of clean up and fix the other ones. So we're actually gonna see that later on. There's a very good reason why it does pay, especially in mission-critical settings, to have a stronger version of two-phase locking. But anyway, this is the friendly version, which just holds the locks as long as you need them. So the only constraint is you always acquire and then release; for instance, two-phase locking forces you, and you'll see this in a slide or two, that if you're ready to release something but you haven't acquired all the locks you need, you have to delay releasing the one that you're ready to release, otherwise you'll violate the two-phase locking constraint. So two-phase locking does force you to hold on to locks until you've acquired all the ones that you need. Does that make sense? Otherwise you'd be sort of going back down the hill and then trying to go back up again, but that's not allowed. Okay. All right, and so it turns out that if you look at the ordering of those lock points, those will give you the serial ordering. So if you just sort of push the transactions apart using the lock points, it will give you the safe serial ordering that corresponds to the overlapped schedule. So two-phase locking will in general allow you to overlap the transactions and have some concurrency, but it will nevertheless be equivalent to a serial order. So one thing is, you know, there are some heuristics for avoiding deadlock, such as lexicographic ordering, but deadlock can still happen in different situations. So we're going to talk about that later.
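On the shared versus exclusive question that just came up, here's the compatibility matrix from a moment ago rendered as a tiny sketch, plus the upgrade case (the `can_grant` and `can_upgrade` names are just for illustration; this is an assumption-level sketch, not the lock manager's actual code).

```python
# Lock modes: "S" (shared) and "X" (exclusive). A request is granted only if it
# is compatible with every lock other transactions already hold on the same item.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held_by_others):
    return all(COMPATIBLE[(requested, mode)] for mode in held_by_others)

def can_upgrade(held_by_others):
    """An S -> X upgrade by one transaction succeeds only once no *other*
    transaction still holds a shared lock on the item."""
    return len(held_by_others) == 0

print(can_grant("S", ["S", "S"]))   # True: many readers can share
print(can_grant("X", ["S"]))        # False: a writer must wait for the reader
print(can_grant("X", []))           # True: nothing else held
print(can_upgrade(["S"]))           # False: another reader is still there
print(can_upgrade([]))              # True
```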
And yes, as we were just discussing, there's an important variation on two-phase locking, which is strict two-phase locking, where you hang on to the locks right till the end, and it'll prove to be useful later on when you're trying to recover from a problem. Lock management: we mentioned the lock manager last time. That's a single entity, because the locking is so complex, as you can see. The central lock manager manages the lock requests, checks that two-phase locking is satisfied, and looks for conflicts. Let's see, so it's basically housing a database of the locks that are held and providing atomic updates to that database. And so when somebody requests a lock, if there's not a conflict there, they're requesting a lock that nobody else holds, or it's a shared lock that somebody else also holds as shared, it grants the request and creates an entry in the database recording that that transaction holds the lock. Otherwise it puts the requester on a wait queue that will get notified when the thing they're waiting for is available. And it's an atomic database, so the locking and unlocking operations are atomic. And we have this upgrade operation as well, which is for when you're holding a shared lock first, because you're reading first, and then when you want to write, you upgrade rather than getting a new lock. Okay, so all right, so here's a transfer transaction again. So we're transferring $50 from an account A to an account B. Here's the detail of the reads and writes and the arithmetic. And then there's a second transaction that's just going to print the totals. So ideally, you know, from the programmer's point of view, the critical semantics of this transaction is that the total of the two accounts is always the same. We're just moving money around. We're not deleting or removing or adding anything. But if we don't serialize these transactions, what are the possible orderings? Or excuse me, what are the possible outputs of T2? So what are the possible totals? If we start with $1,000 in A and $2,000 in B, what possible totals could T2 print for different orderings, basically different orderings of the read-write operations? Well, which one do you want? You want $3,000, right? And it will print $3,000, in particular, if T2 comes before or after T1. But what are the other possibilities? How many other possibilities are there and what are they? What's that? Can you turn that into a total? All right, so that's $2,950. So, let's see, so where is T2 happening? It's happening, is it right in here? I think that's right, yeah; if you put it in here, you've already decremented A, so you've lost 50 from A, but you haven't added 50 to B. So if T2 happened right in there, then yeah, it would be $2,950. Does that make sense? So you could potentially execute the second transaction after debiting A, but before adding to B. But there's another total, which is more surprising, yeah. Yeah, how does that happen? Yeah, that one's a bit more of a brain bender, but yeah, if you read A over here, before it's decremented, it's got its full value, and B ends up with a larger value than before. So if you do the second read, the read of B, over here, you'll actually get $3,050. So yeah, it's a surprisingly simple transaction, but you can mysteriously add or subtract the amount of the transfer in the printout. So obviously, conflict serializability is pretty important, and how does 2PL help us with that?
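Just to pin down those three totals, here's a tiny worked sketch of the interleavings in plain Python, using the $1,000 and $2,000 starting balances from the example (the variable names are only for illustration).

```python
# Transfer of $50 from A to B, interleaved with a "print the total" transaction T2.
A, B = 1000, 2000

def total():
    return A + B

# Case 1: T2 runs entirely before or after T1 -> consistent $3,000.
print(total())            # 3000

# Case 2: T2 reads both accounts after A is debited but before B is credited.
A -= 50                   # T1: debit A
snapshot = total()        # T2 runs here
B += 50                   # T1: credit B
print(snapshot)           # 2950 -- the $50 seems to have vanished

# Case 3: T2 reads A before the debit but B after the credit.
A, B = 1000, 2000         # reset the balances
a_seen = A                # T2 reads A first
A -= 50; B += 50          # T1 runs in between
print(a_seen + B)         # 3050 -- the $50 appears twice
```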
Well, so now we've expanded the schedules to include not just the read and write operations but also the lock acquisitions and releases. So X again is an exclusive lock, and S is a shared lock. So here, this transaction's asked for the exclusive lock. It's only reading first, but then it's gonna do a write, so it just acquires the one exclusive lock. The other transaction's requesting a shared lock. Transaction one will already have the lock on A first, so transaction two can't acquire that lock until it's been released, but it acquires it down here. It reads A, and, let's see, I guess, oh yeah, it's only got a read, right? So it unlocks A as soon as it's done its read. Then it requests the lock for B, actually reads B, and unlocks the shared lock that it got. So this is a shared lock, but transaction one is gonna request an exclusive lock on B, because it's going to write. The exclusive lock can't be granted while there's a shared lock, so transaction one is gonna wait until transaction two gives up the lock on B. So in effect, we have executed transaction two in the middle of transaction one, which is bad, right? That's not a serializable ordering of the operations, and it will give us the wrong answer. Okay, and you can see it's, well, why is it not a two-phase locking schedule? Yeah, yeah, right, yeah, we do it twice in fact. Yeah, so we're not allowed to release locks until we've acquired all the locks that we're gonna acquire. So we shouldn't have unlocked A here. We could have held it until here, legally. We did the same thing over here: transaction two actually released its lock on A as soon as it was done with A, and then requested B. Yeah, so, it's not two-phase locking, and it's also not serializable, and it also gives the wrong answer. So now let's tweak things so that we do have a 2PL schedule, still with some overlap of the transactions. So this time we acquire the lock on A in transaction one, we operate on A, but then when we're done with A, instead of unlocking it, we acquire the other lock that we need. So we grab B first and then unlock A, then act on B, and finally release B and we're clear. So now that forces transaction two to wait a bit longer before it does its read of A. Now when it's reading A, transaction one is completely finished with A, but transaction two is unable to get the sort of inconsistent middle version of B, because transaction one is being forced by the constraint to get the lock on B before it gives up the lock on A. So that prevents transaction two from reading something inconsistent. So now by the time transaction two is trying to acquire the lock on B, it has to wait until transaction one's completely finished with B, by which time it'll be in a consistent state again. So finally, yeah, so we read the post-transaction-one value of A here, but we also read the post-transaction-one value of B as well. So this is a 2PL schedule, and it's also serializable, and what's the corresponding serial order? Yeah, so it's T1 and then T2, because we basically forced the second transaction to wait, item by item, until after the first transaction updated those values. So you get the right answer here, and it's conflict serializable and correct. Any questions? So now let's look at the issue that we would wanna solve with the stricter version of two-phase locking, where you hold the locks right till the end. So let's say we have a 2PL schedule, which this, if you check, this is a two-phase locking schedule. Or is it? Wait a minute.
Yes, it's okay, sorry, yes, this is fine. Yeah, so it releases the lock on A here, but it acquires the lock on B first. Yeah, that's right, okay, sorry. So this is a 2PL schedule. Now suppose T1 aborts over here. Then, well, what happens? Because you've aborted T1, you have to undo the state changes of T1, but you also have this transaction T2, which started executing but hasn't finished yet. And so you have to also roll back T2, because T2 read some of the state that was updated by T1. Now, with strict two-phase locking, all the locks held by the transaction are only released when the transaction completes. So in that case, T1 would hold those locks all the way up until it finishes or until it aborts, and T2 wouldn't be able to begin and update the state. All right, so all locks held by a transaction are released only when the transaction completes. The shrinking phase then is sort of a cliff that happens right at the end of the transaction, either a normal completion with a commit, or an abort, in which case you actually roll back the state to before the transaction. All right, so let's see. So now let's look at this schedule and check whether this is a strict two-phase locking schedule. All right, so transaction one is acquiring an exclusive lock on A again, and then it grabs the B lock before it releases its A lock. And that's the answer there, actually. So anyway, it releases the final lock down here. Is that strict two-phase locking? No, okay. All right, so because we unlocked A early, this is not strict two-phase locking. What about over here? Let's see. Yeah, and you can see there's unlocking happening again before all the operations are complete. So this is not strict on the right-hand side either. And a cascading abort is possible here, on either side actually. If either of these aborts, there's interaction of the state. Certainly if T1 aborts, because at this step here, T2 has read state of A that was set by T1, so there'll be a problem. So anyway, now by contrast, suppose that we use strict two-phase locking. So now we didn't release A early. We held on to it right to the end. And notice that that forced the beginning of T2 to be a lot later. In fact, it basically serialized the transactions. So now if there's an abort somewhere in here, we don't have to roll back T2, because it actually hasn't started yet. All right, so quick reminder, there are some deadlines coming up, but nothing this week, so you can relax a little bit. But for project three, the two main deadlines are coming up next week. And hopefully most people got their design docs in yesterday. All right, so we'll take a five-minute break and then do a quiz and wrap up. Okay, so let's review some of this material before we keep going. So first of all, is it possible for two read operations to conflict? No, not by themselves. A strict two-phase locking schedule does not avoid cascading aborts? No, that's false, it does avoid them. All right, double negative, but right. Okay, two-phase locking leads to deadlock if the schedule is not conflict serializable. Whoops, I'm gonna review that one. I don't think that's right. Okay, let me get back to you on that one. I don't think that's right, but let me check. All right, because it could be serializable. All right, let me check that, sorry about that. A conflict serializable schedule is always serializable. Yeah, that's true, okay. All right, what about the following schedule? Is it serializable?
So what kind of conflicts do we have here? We have this one and this one and this one. So is it serializable? Yes, yeah, right, there are three conflicts. We've gotta move these past each other, but they're okay. All right, so deadlock if a schedule is not conflict serializable, well, all right. So two-phase locking, we actually haven't gone into this, so I'm gonna move on. Let's see, we're gonna look instead at a couple of ways of directly detecting deadlock and dealing with it. Okay, so there are a few ways of dealing with deadlock. One way is to rely on timeouts, associating a timeout with each lock. If the timeout expires, it's evidence that there's deadlock in the system. So what's the problem with that solution? What's that? Yes, well, but that's a penalty that you pay if there's some inconsistency in the state. Okay, any other problems? Yeah, well, you have to also be careful about which transactions still hold the lock and whether they can continue. Anyway, so the goal of deadlock prevention is to prevent circular wait, so we'll look directly at that problem. So one approach is to assign priorities based on timestamps. So if we assume that a transaction Ti wants a lock and another transaction Tj holds it, then there are two policies that we can implement. One of them is called wait-die, which means if transaction Ti is older, then Ti waits for Tj, so the older transaction waits for the newer one. If the ordering of the commencement times of the transactions is the other way around, you have to abort Ti. The idea is that you allow a wait dependency, in this case, going forward in time, so the depends-on arrows are going forward. When the wait would go the other way, so if Tj is actually older than Ti, that would imply an arrow going back the other way, you just don't allow that, and you abort Ti. And the other approach, usually called wound-wait, is just the opposite in time. So if Ti is the older of the two transactions, you abort Tj, which allows Ti to get the lock. Otherwise, Ti waits. Now Ti is the younger of the two transactions, it's waiting on a Tj that's older, and so the dependency arrows are going back in time this time. But in both cases, it's a linear ordering by time, in one case going forwards, in the other case going backwards. So it's acyclic, so you're not gonna have a deadlock problem, yeah. Well, in general, you could. I mean, that's sort of the even more extreme, I guess, two-phase locking strategy, but the difficulty is that that would allow virtually no overlap of transactions that share resources. So that's the minimum concurrency solution. It's basically forcing serialization of anything that has any shared data. Yeah. Well, the flip side is that aborts are fairly rare, actually. So the idea is that mostly you wanna get good throughput, so mostly you wanna emphasize concurrency as long as it's safe. So the preferred strategies are either using a kind of weaker two-phase locking, where you're locking and releasing as much as you can safely, which means still going up the hill and down the hill, but releasing as soon as it's feasible with that constraint, or the stronger constraint, which is really overkill if you just want consistency, but it helps you with the aborts, all right. And part of making this work is to make sure that if a transaction does restart, you restart it with its original timestamp, so its ordering in the list of transactions stays the same. So let's move to deadlock detection.
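Before moving on, here's a compact sketch of those two timestamp policies, again as illustrative Python only (the function names are made up; the `ts` values stand for the transactions' original start timestamps, with smaller meaning older).

```python
def wait_die(requester_ts, holder_ts):
    """Wait-Die: an older requester waits; a younger requester is aborted ("dies")."""
    return "wait" if requester_ts < holder_ts else "abort requester"

def wound_wait(requester_ts, holder_ts):
    """Wound-Wait: an older requester aborts ("wounds") the holder; a younger requester waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"

# Ti (started at time 5) requests a lock held by Tj (started at time 9): Ti is older.
print(wait_die(5, 9))    # wait
print(wound_wait(5, 9))  # abort holder
# Ti (started at 9) requests a lock held by Tj (started at 5): Ti is younger.
print(wait_die(9, 5))    # abort requester
print(wound_wait(9, 5))  # wait
```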
So that's a less conservative strategy. So we allow potentially cyclic dependencies to occur, because they won't always cause deadlock. I said potentially cyclic because transactions may have orderings that are going back and forth in time; as long as they don't actually close the loop, you're still okay. So those timing strategies, perhaps, you can argue, are too conservative. So the other approach is to wait until you get some evidence of deadlocks, such as timeouts, and then try to fix them. So a more general version of checking for deadlock is to use a waits-for graph. So here the nodes are transactions, and you add edges explicitly from one transaction to another if the first transaction is waiting for the other one. So this is more or less tautologically discovering deadlock. And if you have a cycle, then you break the cycle by removing one of the transactions, basically aborting it. That allows the others to continue, but it breaks the cycle. So here's an example. We have four transactions. All right, so let's see. So the first transaction's just acquiring a series of shared locks. Well, first of all, what's the first dependency? So A and D are different objects, but here we have an exclusive lock being acquired, or being requested, by T2, and then a shared lock requested by T1. So who depends on who in this ordering? Well, who's waiting for who, I should say. Yes, T1 is waiting for T2. So we should draw an arrow from T1 to T2. All right, so there's that arrow. Now what about, what's another object that's being shared? So what about C? Yeah, so T3 should get it first, so T2 should wait for T3. Oops, sorry, I went too fast. So T4 is trying to acquire B, but T2 already has it. So where does the dependency go? Yeah, T4 to T2, yep. And there's one more that's important, T3 to T1, all right, and that's problematic, right, because there's the cycle now. And so we can get deadlock in that situation. So the idea is to kill off T3, abort that transaction, and just leave the state defined by the other transactions, and then often reapply T3 later. All right, okay, so any questions on that? Yeah, well, it's a transaction, so it's supposed to have an isolation property. In general, transactions don't know what other transactions are happening. A transaction is normally an operation that's pushed by a particular client. If there are interrelated actions that you wanna happen from one client's perspective, then you should really be bundling those into a transaction. So normally different clients are executing these transactions, which means that it's usually safe to reapply T3, because the owner of T3 in general doesn't know what other transactions were happening, and so if it happens a bit later, it shouldn't matter. Okay, so all right, so we've talked a lot about getting consistency and using serializability to help with that. So we're gonna finish up with a discussion of durability and atomicity. So we wanna make sure that our transactions stay atomic even if there are some failures in the system, and also that when there are failures, we can bring the database back up into ideally the same state it was in when it went down, or at least roll it back into some consistent state. So let's look at some failure modes and how we can still guarantee atomicity. So first of all, we wanna make sure that the changes that we make are made in an all-or-nothing form.
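Here's a sketch of the waits-for graph bookkeeping for that example, once again illustrative Python rather than real lock-manager code (the `WaitsForGraph` class and its method names are made up): add an edge whenever a lock request blocks, check for a cycle, and break the cycle by removing a victim.

```python
class WaitsForGraph:
    """Edges Ti -> Tj mean 'Ti is waiting for a lock held by Tj'."""
    def __init__(self):
        self.edges = {}

    def add_wait(self, waiter, holder):
        self.edges.setdefault(waiter, set()).add(holder)
        self.edges.setdefault(holder, set())

    def remove_txn(self, txn):
        # Called when a transaction finishes, or is aborted as the deadlock victim.
        self.edges.pop(txn, None)
        for targets in self.edges.values():
            targets.discard(txn)

    def deadlocked(self):
        """DFS cycle check: True if some set of transactions is waiting in a circle."""
        state = {n: 0 for n in self.edges}   # 0 = new, 1 = on the DFS stack, 2 = done
        def dfs(n):
            state[n] = 1
            for m in self.edges[n]:
                if state[m] == 1 or (state[m] == 0 and dfs(m)):
                    return True
            state[n] = 2
            return False
        return any(state[n] == 0 and dfs(n) for n in self.edges)

# The example from the slide: T1 waits for T2, T2 for T3, T4 for T2, T3 for T1.
g = WaitsForGraph()
g.add_wait("T1", "T2"); g.add_wait("T2", "T3"); g.add_wait("T4", "T2"); g.add_wait("T3", "T1")
print(g.deadlocked())    # True: T1 -> T2 -> T3 -> T1
g.remove_txn("T3")       # abort T3 to break the cycle
print(g.deadlocked())    # False
```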
And the protocol that's used for that is two-phase commit, which basically includes a commit operation that's propagated to all of the nodes and creates a shared understanding throughout the database, among both the coordinator and the worker nodes, that this transaction's been committed. If that's not possible for any reason, if there's any failure or inconsistency, then the transaction's aborted, and that information propagates to everybody and leaves you in the previous state. So two-phase commit is the distributed protocol that guarantees that outcome. And it was developed by Jim Gray, who was the first Berkeley computer science PhD, in 1969, and sadly was lost at sea in 2007. He wrote many seminal papers on databases, not just two-phase commit. All right, so in two-phase commit, there's one coordinator and then workers, which have replicas of the data. And at a high level, the coordinator's going to try to execute the commit on all the data nodes. The workers reply that they agree to the commit by voting for commit. And if everybody votes for commit, then the coordinator says global commit, which is the definitive announcement of the commit. But it has to be a unanimous vote, and it has to be heard from all of the workers. If the coordinator doesn't get a unanimous vote, it'll broadcast an abort, and the workers obey those messages. Okay, so as a protocol, it looks like this: the coordinator sends a vote request to the workers. Workers wait for that vote request, and they send either their commit agreement, or they send an abort vote if for some reason they couldn't do it or they're not ready, and then, having sent the abort vote, they abort. And then the coordinator receives those commit votes. If it gets a unanimous vote, it sends the global commit confirmation to everybody. If it doesn't receive a unanimous vote, then it sends an abort to everybody. So the coordinator is the authority that's announcing the result of the transaction to everybody, and everybody's supposed to adapt their state to that decision. Yeah, yeah, yeah, so why should all the workers participate if this is not a replication situation? Yeah, there wouldn't be a need for nodes that are not updating to do this. So we are looking here at the fully replicated case. We're essentially trying to guarantee consistency across the database. But if it is a distributed and heterogeneous dataset, then there are a number of nodes that might be affected, and you have to get consensus from all of them to complete the commit. All right, so if everything goes smoothly, the timeline looks like this. The coordinator sends its request for votes. All of the workers respond after a time. If everything went well, they respond with a commit vote. And once it's received all of those, then the coordinator sends the commit confirmation. So you can implement the coordinator with a state machine that looks like this, starting from its initial state. So this is the state machine for doing the commit protocol. So you initialize it, and, well, here it receives a message to initiate the commit. It sends out the vote request to all the nodes and then waits for the responses. So it sits here, and it will receive, in general, either a vote for an abort or a vote for a commit from each of the nodes. Once it's received votes from all nodes, or until it times out, it'll count up the votes. If all of them are votes for commit, it does the commit. And in the other cases, it'll do the abort.
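Here's a toy sketch of those two decision points, with all the messaging, timeouts, and logging stripped away (the `worker_on_vote_request` and `coordinator_decide` names are hypothetical, purely to illustrate the rule; this is not a real 2PC implementation).

```python
def worker_on_vote_request(can_commit):
    """Phase 1 on a worker: vote to commit and move to READY, or vote to abort
    and abort locally right away (it then drops out of the protocol)."""
    return ("VOTE-COMMIT", "READY") if can_commit else ("VOTE-ABORT", "ABORTED")

def coordinator_decide(votes, num_workers):
    """Phase 2 on the coordinator: global commit only on a unanimous commit vote
    from every worker. `votes` maps worker id -> "commit" or "abort"; a worker
    missing from the dict means its vote never arrived before the timeout."""
    unanimous = len(votes) == num_workers and all(v == "commit" for v in votes.values())
    return "GLOBAL-COMMIT" if unanimous else "GLOBAL-ABORT"

print(coordinator_decide({"w1": "commit", "w2": "commit", "w3": "commit"}, 3))  # GLOBAL-COMMIT
print(coordinator_decide({"w1": "commit", "w2": "abort",  "w3": "commit"}, 3))  # GLOBAL-ABORT
print(coordinator_decide({"w1": "commit", "w2": "commit"}, 3))                  # GLOBAL-ABORT (timeout)
```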
For the workers, you initiate the commit on a worker; you receive the request from the coordinator, the master node, to commit the transaction, and either you send a vote to commit or you send a vote to abort. So you decide at this point: if you vote to abort, the worker node just aborts at that stage and doesn't continue in the protocol. If it's voting to commit, then it goes into a ready state, which is another kind of wait state. So here it's waiting for the master node to either send the global abort message or the global commit message. So let's look at some of the failure modes. So the coordinator's waiting for votes in the wait state. If it doesn't receive all N votes to commit, which includes both abort votes and timeouts, it simply times out and sends out the global abort. So that's the simplest case. And here's the situation on the timeline. So the coordinator broadcasts vote requests, some of the votes come back, one node dies or its message is lost. So after a timeout, the coordinator decides, okay, this didn't work out, and it'll send its abort message. So coordinator failures are a little bit more complicated. There are really two stages where failures can happen, with different outcomes. So if the commit sequence is initiated, which means the state machines in all of the workers start running, but the vote request isn't received, then the worker times out and aborts. And the aborts were already handled as votes back to the coordinator. Otherwise, the worker waits for the global abort or global commit messages, and it's waiting down here. And here it's more complicated, because if something goes wrong at this stage, the worker is in an uncertain state: it doesn't know what state the coordinator's in, and the coordinator is its only communication link to the other nodes. So the only safe thing to do is to block, waiting for the coordinator to come back online, which it should do quickly. If it doesn't do that, nothing much is gonna happen, because the database is effectively gonna be out of commission. So in this situation, where there's apparently a coordinator failure, the worker nodes just sit and block and wait for a resolution. Okay, so here's the timeline for that. So in the first case, the vote request didn't get out. All of the workers will time out and send their abort message. In the second type of failure, the vote request did come out from the coordinator, which means vote-commit messages came back to the coordinator, but it failed to continue from that point. Something went wrong. The workers are waiting for something to happen, but they can't proceed. They can't change state, because it's not clear whether this transaction is eventually gonna come back online and complete or whether it has to be aborted. Most of the time the coordinator should send the abort message, but the nodes actually wait for that to happen so that everybody stays in the same state. It's possible, for instance, that some of these nodes might have received a commit message whereas others didn't. So the safest thing is to wait for the result from the coordinator once it comes back online. All right, so the last topic is durability. So this process may involve things going offline or being hampered by network failures or whatever, and we wanna make sure everything's consistently restarted in those situations.
So the idea is that these state machines for workers and coordinators are mapped onto stable storage. The states are held in stable storage, which means non-volatile storage that's atomic. So it's implemented using disk storage or maybe SSD storage, but it's also atomic. So it's not just simple disk storage, because simple writes could be inconsistent; you have to make sure that the updates to disk are themselves transactional somehow. That way you guarantee that the state changes for all of the state machines are themselves durable. So if something goes wrong and you come back, the state machines will actually come back to the same states. All right, so we said that the workers should be waiting for the coordinator, and normally they would block. On the other hand, they can try to proceed by querying each other. So one worker can query another worker, and if it discovers that the other is in an abort or a commit state, then the coordinator must have sent that particular message, and therefore the worker can follow that message. In other words, you can get the message indirectly by querying other peers. So some number of workers may actually successfully either commit or abort in that situation. And that gives you a hint why it's important for the other ones that don't get any word to keep waiting, because this transaction may actually still be headed for success once the coordinator comes back online. All right, on the other hand, if another worker is still in the init state, then it hasn't even voted yet, so the commit can't have gone through, and both workers should decide to abort; the safest thing is to abort. All right, but if all workers are in ready, that means that somehow the next message from the coordinator, the global message to commit or abort, wasn't received by anyone yet. So once again, the fallback is to stop in a blocked state to see what happens. Well, but we did take care, okay, so you're asking if these states were stored in memory? Is that it? Yeah, so you're asking if the coordinator did come back and if it had lost its state, then potentially you could have a lot of workers in a blocked state which the coordinator didn't know about. Yeah, I mean, that's theoretically possible, but we did take some care in implementing the state machines to make sure that all of the states were in stable storage. So in theory, the coordinator should always remember what state it was in. It's kind of transactional, so it always has to sort of commit to its current state before it can continue. So if it does go offline and come back on, it should always be in the same state. Yeah, it's a very important part of this, making sure these state machines are persistent. So, okay, any other questions? All right, so let's review some of these ideas. So first of all, do strict two-phase locking schedules prevent deadlock? Not in general, no. In fact, without additional constraints such as lexicographic ordering, they don't. Let's see, two-phase commit in a distributed system ensures which types of ACID property? All right, who votes for atomicity? Okay, what about consistency? Isolation? Durability? All right, good, yeah, well, the primary one is atomicity, because of the propagation of consistent state at the end of the transaction. That's the primary role. So consistency is a sort of indirect effect of atomicity.
Normally consistency is about more directly checking consistency constraints. So 2PC and atomicity are sort of prerequisites for consistency. So that's a partially correct answer, but strictly it's not a property that is guaranteed by 2PC. You really need other infrastructure to check consistency. Isolation, no, and durability, that's right. That's an important part of the design, making sure that the state machines are persistent and will come back in the right state. All right, does two-phase commit prevent workers from blocking during a commit? Right, in fact, blocking is effectively one of the states in the elaborated state machines if you include faults. So no. And let's see, the coordinator maintains its state after a power failure, true or false? Yeah, it should be true. I mean, we worked hard in the design to make sure that it can come back in the same state, so that whatever state the workers are in, which will often be somewhat ambiguous, they can safely continue after the coordinator sends one of its global messages. All right, so to summarize, we revisited serializability and gave a high-level test, the dependency graph test, to check for conflict serializability. We looked at two-phase locking and strict two-phase locking and argued that two-phase locking actually guarantees conflict serializability, and it does it in a simple way based on the lock point. And then we talked about ways of detecting and preventing deadlocks. And finally, we talked about how to keep a distributed database consistent using two-phase commit and persistent storage for the state machines.