He's dealing with life issues. Apparently he owes thousands of dollars in child support. Yeah, that's insane. All right, so he's dealing with that. All right, so remember last class we were talking about two-phase locking, and when we talked about running the two-phase locking protocol, we would acquire these shared locks and exclusive locks on database objects. We didn't necessarily specify what those objects were. If we just assume they're tuples, then there's going to be a bunch of problems, because there's a high performance overhead: if we need to update a billion tuples, we have to go acquire one billion locks, and it would be really expensive to keep going back into the lock manager over and over again acquiring those locks. So what we can do instead is have a notion of a hierarchy in our database, where higher-level objects have a larger scope or purview over the things below them. Now we can acquire locks on the higher-level objects, and that implicitly acquires locks on the things below them. The goal is that we want the database system to acquire, ideally, the fewest number of locks needed for a transaction to run correctly, and to do it in a way that has the lowest overhead of going into the lock manager over and over again. So I showed this diagram before. We're going to organize the database at a logical level into this hierarchy: you have a database, a database has tables, tables have pages, pages have tuples, and tuples can have attributes inside of them. So now when my transaction comes along, T1 here, I can get a lock on the table — not saying whether it's shared or exclusive at this point — and that implicitly acquires all the locks below it. Now every other transaction has to come in and go through the same hierarchy, right? You can't have a transaction show up and just grab tuple locks. The way the protocol works, everyone starts at the top and goes down. Yes? So the question is: in my example here, if I acquire the lock on the table, how do I know whether somebody already holds a lock on a tuple down below? Yeah, so T1 has to start at the database and go down. Not every system supports locks at the database level; at the very least, you start at the table and go down. Everyone has to follow the same protocol — no one can come down here and do stuff without going through the top. Your question is, do I assume the previous transaction already holds a lock on the table? OK, I understand what you're saying. So yeah, maybe this example is overly simplified. The statement is: I showed T1 here acquiring the lock on the table, and implicitly that covers everything below it. But if another transaction shows up, can they skip locking anything on the way down and just get the locks at the bottom? No. This is what the intention locks will handle for us. Everyone has to acquire a lock at each level as they go down. And I showed this last time briefly, I'll just say it again: not every system is going to support this full hierarchy. If they support two-phase locking, at the very least they're going to support tuple-level locks. They're probably also going to support table locks. Database locks, as I said, are rare.
Page locks are common, but not ubiquitous. Postgres doesn't support page locks; MySQL does; other systems do. And then attribute locks — those are super rare as well, because that's really fine-grained. That's like saying: within one tuple, I want to acquire a lock on a single attribute. The only system I know of that does this is YugabyteDB. There might be others out there, I just don't know. Again, the idea is everyone has to follow the same protocol: you go from the top and acquire locks on the way down. So maybe the mistake in my diagram is that I should show that, as a client, you first have to acquire a lock on the database. So to answer his question: if everyone has to acquire locks on the way down, but maybe I don't want a shared lock on the entire table — I just want a shared lock on a single tuple — how do I actually do this? This is what intention locks are going to help us do. Intention locks are basically hints at the higher levels of the hierarchy that tell other transactions what's happening down below — that our intention is to acquire an exclusive lock or a shared lock at some lower point in the tree. Then if another transaction shows up and tries to acquire, say, an exclusive lock on the entire table, we would know someone down below has a shared lock on something, and therefore it wouldn't be allowed to do that. And we can figure this out without having to go to the bottom of the hierarchy and look at everything. So there are three types of intention locks: intention-shared, intention-exclusive, and shared-intention-exclusive. Intention-shared (IS) says that below me in the hierarchy, there's a transaction that holds an explicit lock on some object in shared mode. So when your transaction shows up, you're telling everyone else: I intend to acquire a shared lock down below, so don't do something that would be incompatible with me. Intention-exclusive (IX) is the same thing, except I'm saying that at some lower point in the hierarchy I'm taking an exclusive lock on something. Again, that warns other people away from doing something that would conflict. And then shared-intention-exclusive (SIX) is basically doing a shared lock and an intention-exclusive at the same time: I'm taking a shared lock at that node and everything below it, but then somewhere down below I'm also taking an exclusive lock on something. So we can expand our compatibility matrix now; it looks like this. Yes, question. The question is: if I lock something in shared mode, is that equivalent to locking everything below that node in the tree? Yes. With SIX, I get shared mode at that node and everything below it, and then somewhere below, exclusive mode on something. The statement is: doesn't that mean I'm acquiring, on a single node, something in shared mode and exclusive mode, and those aren't compatible? If you're the transaction that's doing the locking, then it's compatible with you — it's only compatible within a single transaction, yes. It's basically a lock upgrade, right? We talked about this before. Ignoring the hierarchy, in regular 2PL: if I hold a shared lock on something, and after I do a read I want to go back and do a write, I can go back and say, upgrade my lock to a write lock.
I can't downgrade, because downgrading would trigger the shrinking phase. Yes. The statement is: why would you need shared-intention-exclusive — what's the difference between IX and SIX? Well, if you go back here, there are cases where IX is compatible with IS and IX, whereas shared-intention-exclusive is only compatible with intention-shared. Intention-exclusive says: somewhere down below me in the tree, I'm going to take an exclusive lock. I don't know where yet, and you don't know where yet. So someone else can come along and say, OK, I'm going to take intention-shared, I'm going to read something — and it may not be the thing that you're writing. At that level in the tree, that's OK. Now, down below in the hierarchy, if you try to take a shared lock on the thing I have an exclusive lock on, then the two-phase locking rules kick in just like before, and you have to wait. Yes. The statement is — and he's correct — that the intention lock is a hint saying I'm going to acquire a lock in this mode in the future on something down below, but it may not actually have happened yet. Correct, yes. Again, it's two-phase locking. As I'm going down the hierarchy, it may get to the point where, OK, I'm trying to acquire a shared lock on a tuple; I can take intention-shared above, that's fine; then I get to the tuple I actually want, and — huh, someone holds an exclusive lock on it — and I have to wait. Yes. The question is: why can't T1 and T2, in my matrix here, both hold shared-intention-exclusive? Because at the point you take these intention locks, you don't know yet: are we writing the same thing? We don't know. All right, so let's go through some examples, and we'll see how this makes more sense. Again, the idea is that you want to acquire the minimum lock you need at the highest point in the tree, and then as you get closer to the bottom of whatever the hierarchy is, you get the explicit lock. So you take intention locks as far down as you can, and then at the leaf nodes of the hierarchy you acquire the explicit shared or explicit exclusive locks. As you go down, this basically means you need to hold the appropriate lock on the parent. If you want to get S or IS on something, you need to at least hold IS on the parent. You could also hold S, because shared is stronger than intention-shared, but that's the basic idea. And the same thing for the other modes: if you're going to get X, IX, or SIX on a node, you need to hold IX or SIX on the parent node — there's a sketch of this rule below. All right, this is all just text and a table, so let's look at actual examples and see how it works. We have two transactions, T1 and T2. T1 is going to get the balance of my offshore account in the database, and at the same time T2 is going to increase Mushu's account balance by 1%. We're going to want the explicit exclusive and shared locks at the bottom of the tree, because the lowest part of the hierarchy represents the actual objects we're reading and modifying, and we'll use the intention locks for everything above. For simplicity, we're going to assume a two-level hierarchy.
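To make the matrix and the parent rule concrete, here's a minimal sketch in Python. The mode names, the dictionary layout, and the `parent_allows` helper are my own scaffolding for illustration, not any real system's API:

```python
# Lock modes: intention-shared, intention-exclusive, shared,
# shared-intention-exclusive, exclusive.
IS, IX, S, SIX, X = "IS", "IX", "S", "SIX", "X"

# COMPATIBLE[held][requested] is True if a transaction may acquire
# `requested` on a node while a different transaction holds `held` on it.
COMPATIBLE = {
    IS:  {IS: True,  IX: True,  S: True,  SIX: True,  X: False},
    IX:  {IS: True,  IX: True,  S: False, SIX: False, X: False},
    S:   {IS: True,  IX: False, S: True,  SIX: False, X: False},
    SIX: {IS: True,  IX: False, S: False, SIX: False, X: False},
    X:   {IS: False, IX: False, S: False, SIX: False, X: False},
}

def parent_allows(parent_mode, requested):
    """The parent rule: S/IS on a node needs IS (or something stronger)
    on its parent; X/IX/SIX needs IX or SIX (or X) on the parent."""
    if requested in (S, IS):
        return parent_mode in (IS, IX, S, SIX, X)
    return parent_mode in (IX, SIX, X)
```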
As I said before, some systems will also have page-level locks in the hierarchy. This two-level version is essentially what Postgres does: tables and tuples. All right, so T1 shows up, and again, this one wants to read my account in R. I want to read this tuple, so I have to enter the hierarchy starting at the root — the table, table R. I get an intention-shared lock on the table, because that says: somewhere down below, I'm going to take a shared lock. Once I'm allowed to acquire that, I traverse down the hierarchy and acquire a shared lock on the tuple. That works, that's fine. Now T2 wants to update Mushu's record — a write on this tuple here. So it gets an intention-exclusive lock on the table — that's compatible with intention-shared — and then it gets the exclusive lock on the tuple, does its write, and it's done. Now let's look at a more complicated example. We have three transactions running at the same time. T1 is going to scan all the tuples in R and update just one of them. T2 is going to read a single tuple in R. T3 is going to scan all the tuples in R. Assume they arrive in the system in order: T1, followed by T2, followed by T3. We'll walk through them as they go through the hierarchy, assuming they don't know what's coming in the future, but any transaction that preceded them has already acquired its locks. All right, so T1 starts. It wants to scan all the tuples in R and update one of them — say, this last one over here. Here is where we can use shared-intention-exclusive. I want to read everything below me, so instead of grabbing a shared lock on every single one of these tuples and then the exclusive lock on this one guy over here, if I take SIX on the table, then I implicitly get the shared lock on everybody, and now I only need the exclusive lock for that one tuple. T2 starts and wants to read a single tuple. So it takes intention-shared at the top — that's compatible with shared-intention-exclusive, because there's no write conflict at the table level — and then when it gets down here, it takes the shared lock on that tuple. The last guy shows up; he wants to scan all tuples in R. We could do the same thing: take intention-shared at the top and then take shared locks on all the tuples at the bottom. But the database system can recognize — because it knows what the query is trying to do, since SQL is declarative — roughly how many things it's actually going to have to lock. So in this case it can decide: OK, that's going to be a lot of locks, so let me take a shared lock on the table entirely, and that way I don't have to do the individual locking. But a table-level shared lock is not compatible with shared-intention-exclusive, because someone could be writing to something down below. So T3 is going to have to stall and wait; the lock manager will enforce this. Then T2 finishes, commits, and goes away, so its intention-shared lock goes away. And once T1 commits, we get rid of the shared-intention-exclusive lock and the exclusive lock. Now T3 can acquire the shared lock on the table and complete its scan.
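Here's that same three-transaction sequence as a hedged sketch, reusing the `COMPATIBLE` matrix from above. The `request` helper and the table/tuple names are made up for illustration; a real lock manager would enqueue the waiter instead of just returning `False`:

```python
held = []  # (txn, object, mode) triples currently granted

def request(txn, obj, mode):
    """Grant the lock if it's compatible with every lock held on this
    object by other transactions; otherwise the requester must wait."""
    for (t, o, m) in held:
        if t != txn and o == obj and not COMPATIBLE[m][mode]:
            return False
    held.append((txn, obj, mode))
    return True

request("T1", "table R", SIX)        # scan everything, update one tuple
request("T1", "tuple 99", X)         # the one tuple T1 actually updates
request("T2", "table R", IS)         # IS is compatible with T1's SIX
request("T2", "tuple 3", S)          # a different tuple, so no conflict
print(request("T3", "table R", S))   # False: S conflicts with SIX, T3 waits
```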
Yes, in the back. So his question is: who makes that decision? Yeah, I said the database system is going to try to figure out the right granularity of locks to acquire. In my example here, it knew that, OK, I'm going to scan all these tuples, so let me get a shared lock on the table, because I don't want to get individual locks on all the tuples. Some databases will try to figure this out ahead of time. Others can have a trigger that says: OK, I thought you were going to acquire a small number of locks, but it looks like you're acquiring a lot more, so rather than keep acquiring individual locks, let me go back and upgrade your parent-node lock — and that way you get the shared lock, or the exclusive lock, on everything below you. This is called lock escalation. In the case of Postgres, because it only has two levels, it's pretty simplistic; I don't think it does lock escalation. DB2 can do these kinds of things. Yes. So there's an inherent trade-off between — yeah, the statement is, and he's correct: there's a trade-off between the amount of parallelism you can have with fine-grained locks versus having to go back to the lock manager over and over again. Absolutely, yes. Next: why is shared-intention-exclusive, SIX, compatible with — sorry — his other statement first: I have these intention locks up above, but what if down below in the hierarchy there's a conflict, because two transactions try to acquire a lock on the same thing? Then it's just two-phase locking: I can't get a shared lock on the thing you have an exclusive lock on, or vice versa. So going back to the matrix — your question is, why is SIX not compatible with S? Because if I have SIX on some node and you try to come in and take an S lock on it, I'm writing to something down below, and you don't know where it is. And actually, almost nothing is compatible with SIX. Your statement is: when you acquire S at a higher level, you don't acquire individual shared locks below, and therefore the incompatibility has to be caught at the higher level. Yes, exactly. Yes. So the statement is — he's correct — the purpose of the intention locks is to prevent a transaction from acquiring a bunch of locks only to realize that the thing it actually needs to do, it can't acquire the lock for. You try to figure this out sooner rather than later. Yes. And there's no — sorry, go ahead. There's no way you can upgrade locks, right? No, you can upgrade locks, that's fine, yeah. But you can get denied when you try to upgrade, right? Right — you'd have to check. Say I hold this lock and want to upgrade to that one. If I hold this and somebody else holds something compatible with it, that's okay; but if my upgraded mode would conflict with theirs, I'd have to block. When you upgrade, you have to check: does this conflict with anybody? — against all the other locks being held at the same time. Yes. Does the lock manager have this knowledge? Yes, because the lock manager has a global view — it sees all the locks.
Again, this is different than the latching stuff. The latching was always embedded in the data structure; there is no global authority that knows who holds a latch, because all you're doing is compare-and-swap. Here, there's additional metadata stored in the lock manager saying: this transaction holds this lock on this object in this mode. Yes. So the statement is, in my example here with T3: I said it scans all the tuples, but what if it only wanted to scan half of them? What's the threshold? At what point does the system say, OK, don't take the table-level shared lock — take the intention lock and then individual shared locks? It depends on the system, depends on the hardware, depends on the query — depends on so many different things. And this is where the advanced systems — I say advanced, meaning the enterprise systems, the expensive ones — will do more sophisticated things than the open-source ones. Is it better in practice, or is simple sometimes good? Again, it depends. It's the cop-out answer in databases: people ask, what about this, could I do that? It depends on so many different factors that I can't give you a formula that says you always want to do this. Other than: always use SQL, and almost always use Postgres. I have no other strong edicts to impart on you guys. Okay. So, any questions about hierarchical locking? Okay, cool. So let's jump now into today's lecture on timestamp ordering. All right. So far we've talked about two-phase locking, which tries to generate a serializable ordering of conflicting operations while the transactions are running. I said this was a pessimistic approach, meaning we assume there are going to be conflicts between transactions; therefore, we require them to acquire locks before they can do whatever it is they want to do, and we use our two-phase locking protocol to decide whether a transaction is allowed to acquire a given lock or not. Timestamp ordering is going to be an optimistic protocol, where we assume transactions aren't going to conflict. So rather than going through all the overhead and maintenance of these locks, for the most part we'll allow the transactions to do whatever they want, and then when they go to commit, we'll check whether that was okay or not. This assumes that conflicts are rare: you're better off just checking at the end whether things are okay, because most of the time they will be. That's not always the case, but it's the general idea. And the way we're going to figure out the serializable ordering of transactions is through timestamps. We're going to introduce this notion of a timestamp that we assign to transactions, and it's going to be a unique value — typically an integer; whether it's 32-bit or 64-bit depends on the implementation. Transactions are assigned these timestamps, and that determines their logical ordering for execution and how they apply changes to the database. Now, there's a tricky concept here. There's a notion of a logical ordering of the transactions' operations, as defined through these timestamps, and then there's a physical ordering — a wall-clock ordering. And those two aren't always going to match up.
We'll see examples of what I mean by this as we go along. So we have two transactions, Ti and Tj, and each of them is assigned a timestamp — I'm not saying when we assign it yet; we'll cover that in a second. We're going to say that if the timestamp of Ti is less than the timestamp of Tj, then the database system guarantees, through its timestamp ordering protocol, that the execution schedule it produces for the changes they make to the database will be equivalent to a serial ordering of Ti followed by Tj. Yes? Ah — good point, yes, the slide should say that. Thank you — very precise, I like that. All right, so this timestamp is going to be a monotonically increasing value that we keep assigning to transactions. We'll see different protocol schemes: some give you the timestamp when you start the transaction, some give you the timestamp when you finish the transaction, and when we talk about multi-version concurrency control next week, there will actually be two timestamps — the time you start and the time you commit. We'll explain that next week. There are a bunch of different ways to implement this. One approach is to use the system clock, or what I'll call the wall clock — think of it as physical time. You'd typically use the UTC time zone, because you want to avoid daylight saving and all that BS — which comes up this weekend, right? You can't have timestamps go back in time. You don't want daylight saving to hit and now the timestamps I'm handing to my transactions are an hour in the past, because that would break my serial ordering. For a logical counter, just think of a single counter where you add one every time you hand out a timestamp. And then there's a hybrid approach, which is more common in distributed systems — we'll cover it after Thanksgiving — where you use a combination of the wall-clock time and a logical counter, and sometimes you mix in things like host IDs or IP addresses so it's easier to break ties. Again, the key thing is that timestamps can't go backward. We'll see when we cover multi-versioning this week that Postgres actually has a problem here, because it has wraparound: it only uses 32-bit integers for transaction IDs. At some point you hit the limit, it wraps around to zero, and now you have transactions that are in the future physically but look like they're in the past logically, and Postgres has to deal with that. We'll talk about how next class. All right, so today we're going to talk about the two main timestamp ordering protocols: the basic timestamp ordering protocol, and optimistic concurrency control. And this is another place in databases, and in this course, where the naming kind of sucks. There's the basic timestamp ordering protocol, which is the name of a specific protocol but is also in the category of timestamp ordering protocols; and then there's the optimistic concurrency control protocol, which is a timestamp ordering protocol that is optimistic. I didn't make these names, but it is what it is. So: the basic timestamp ordering protocol is an optimistic protocol.
The optimistic concurrency control protocol is a timestamp ordering protocol. Right — buckle up, okay. And then, if we have time, which we probably won't, we'll carry over into next class, where we'll talk about isolation levels and how to handle other anomalies. All right. In the basic timestamp ordering protocol, we're going to allow transactions to read and write objects without locks, and every object in our database is now going to have two timestamps: the timestamp of the transaction that last wrote to it, and the timestamp of the transaction that last read it. That means any time we read an object, we may update its read timestamp, and every time we write to it, we update its write timestamp. And these timestamps on the objects always have to be moving forward in time. We just need to make sure we don't have weird time-travel effects, where a transaction in the past updates an object that was written in the "future" — I'll explain what that means. I'm putting the word future in quotes because of this distinction between logical time and physical time. Logically, I have to make sure things always happen in increasing timestamp order, but physically I may be allowed to read or write something before something else that logically happens first. Again, we'll see examples as we go along. All right, so to do a read — well, first, when my transaction starts, I get a timestamp. Whether it's wall-clock time or a counter doesn't matter. Now, when I go to read an object, if the object has a write timestamp that is greater than my transaction's timestamp, then I know that a transaction logically in the future wrote to this object before I was able to read it. That would violate the serializable ordering, and therefore I have to abort my transaction — I can't read things with a timestamp from the future. When my transaction gets restarted, it's assigned a new timestamp. That avoids starvation: if I get restarted, I come back with a newer timestamp, so at some point I'll have the higher priority. Yes. So what he's saying is that these read and write timestamps on the objects could be modified by different transactions running at the same time — correct. These objects are the global shared database, and multiple transactions are trying to update them at the same time, so the timestamps can come from different transactions. Correct, yes. Yes — and he's correct that I'm describing this protocol at a high level without locks, but when you actually implement it, you still need to acquire latches on the pages and indexes and anything you update. Absolutely, yes. Yes — his statement is that TS(Ti) is the timestamp transaction Ti acquires when it calls BEGIN, but the write timestamp on object X is whatever the timestamp was of the transaction that last modified it. Correct, yes. And no — you know what I said earlier about there being two timestamps? That's for MVCC; ignore that. In the basic timestamp ordering protocol, every transaction has one timestamp that it's given when it starts — one and only one.
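As a minimal sketch — the `Abort` exception and the object fields are my own scaffolding, not a real system's API — the read rule looks something like this:

```python
class Abort(Exception):
    """Raised when a transaction must abort and restart with a new timestamp."""

def read(txn_ts, obj):
    # A transaction logically in the future already wrote this object;
    # reading it now would break the serial order, so abort.
    if obj.write_ts > txn_ts:
        raise Abort
    # Record the youngest reader, never moving the timestamp backward.
    obj.read_ts = max(obj.read_ts, txn_ts)
    return obj.value
```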
All right, so if this conditional holds, then Ti is allowed to read X, and we update the read timestamp of the object to the max of its current read timestamp and our transaction's timestamp. Again, we want to keep track of the timestamp of the last transaction that read this tuple, but we don't want that to go back in time — it always has to go forward — so we take the max. Now, ensuring repeatable reads is not required by the basic T/O protocol, but it's something it could do, and it'll start to look like OCC, which we'll talk about next. If we want to ensure that reading the same object again gives us the same value, we can make a local copy of the value of X in a private workspace for Ti. That's a shortcut so that if we go back and read it again after somebody wrote to it, we don't end up aborting. Yes. Under what condition would the read timestamp be larger than Ti's timestamp? So: transaction one and transaction two, with those as their timestamps. T2 starts, reads A, and updates the read timestamp to two. Then T1 reads the same object; the read timestamp is two, but that's okay, because on a read I'm only checking whether someone wrote to it. I don't care if you read it in the future — reads don't conflict with reads. That's okay, yes. All right — the statement is, why do I have to do this local copy? You don't; I'm saying it's an optimization you can do, if you don't want to abort on a later read. He says: shouldn't you abort? We'll get to this in a second. All right, to do writes, the invariant check is a disjunction — an OR — of two conditions. Either my timestamp is less than the read timestamp of the object I'm trying to write to — meaning someone in the future has already read the object I'm trying to write, and in a serial ordering they would have seen my write, but they didn't. Or my timestamp is less than the write timestamp — meaning someone in the future already wrote to this object, and I'm trying to overwrite their change. If either of those conditions holds, we have to abort and restart; otherwise we're allowed to write to the object, and then we update the write timestamp. And again, we can make a local copy if we want to ensure repeatable reads. Yes. The statement is: doesn't this enforce a stricter serializable ordering — does it have fewer opportunities for parallelism than two-phase locking? Yes. But again, correctness is what we care about first. Yes. The question is: when I read or write an object, what timestamp do I give it? You give it your transaction's timestamp — not something else in the future, because you don't know what that would be. All right, let's walk through an example. T1 is going to read B, then read A, then read A again. T2 is going to read B, write B, read A, write A. So T1 starts — again, when a transaction starts, it's assigned a timestamp, and for simplicity we say T1 has timestamp one. Now it does a read on B, so we go look in the database at the read timestamp column.
I mean, really it would be stored in the header of the tuple, but you can think of it like a virtual column. The read timestamp is zero, so we're allowed to read it. We also check the write timestamp; the write timestamp is less than ours, so that's okay. So we read the object and update the read timestamp to one. Now T2 starts — there's a context switch — and it gets timestamp two. It does a read on B; again, we check that the write timestamp is less than our timestamp. Zero is less than two, so we're allowed to read it, and we update the read timestamp to two. Then T2 does the write on B. Here we check both the write timestamp and the read timestamp; they're both less than or equal to our timestamp of two, so we're allowed to write the object, and we update the write timestamp to two. Then we go back to T1. It does a read on A; the write timestamp of A is less than our timestamp, so we can read it and update the read timestamp to one — that's fine. Now T2 does a read on A, same thing: the write timestamp is less than its timestamp, so it updates the read timestamp on A to two. Then we come back to T1 and do the second read on A. We're allowed to read the object, because the write timestamp is less than our timestamp; but the read timestamp on A is now greater than our timestamp, so we don't update it — we leave it alone. Now T2 does the write on A. Check the write timestamp and the read timestamp: they're both less than or equal to two, so that's fine, we go ahead and do the write. And now we're safe to commit, because there were no violations. Yes? Yeah, so the statement is: if you switched this read of A and write of A — if T2's write of A happened before T1's read — then T2's write would update the write timestamp on the object to two. T1 would then try to read A, see that the write timestamp is greater than its own timestamp of one, so it can't read into the future, and it would have to abort. Next, his statement is: isn't this equivalent to just taking a global lock on the entire database and doing serial execution? Well, no — because if T1 only reads A and T2 only reads and writes B, then there are no conflicts and I can interleave them. If you touch the same object, sure — but with two-phase locking, we have that too. If I have one tuple in my database and a million threads trying to update that one thing, then yeah, a global lock is the same thing as a tuple lock. Yes. His statement is: how is this optimistic? Because I don't acquire locks. And he's sort of right — if you squint, going and checking the read/write timestamps is equivalent to trying to acquire the lock and then killing myself if I can't acquire it. So yeah, sort of. It's optimistic in the sense that you're not acquiring locks. Yes. The statement is: you don't need a lock manager. That is correct, because the global state of who read and wrote what is in the database itself. Yes. The statement is: in this case, if I have to abort, do I have to roll back everything I wrote? Yes. Locks or latches? It would be latches, right? Because you'd protect the table, the pages, or whatever it is that you're rolling back. Yes.
Yeah, so his statement is — and he's correct — this protocol favors the newer transactions, because it's the older transactions that violate the invariants we check: if someone in the future has overwritten them, they're the ones that have to kill themselves. Correct, yes. With OCC, we'll see we can actually go in both directions. Yes. Yeah — the statement is: if I abort, do I need to roll back the writes and roll back the timestamps? Yes. This is a really simplistic view of the protocol; it's the timestamp ordering I want you to focus on. In practice, you're absolutely right: you'd have to maintain the read/write sets, know who read what, keep track of all these dependencies, and then roll back appropriately. Yes. All right, so let's look at a case where we do abort. Here, T1 is going to do a read on A, a write on A, and then another read on A. T2 is just going to write A. This is what's called a blind write in databases, meaning I write to an object without reading it first. It does occur — it's not super common, but there are situations where people do this. All right: T1 starts with timestamp one. It does a read on A — it's allowed to — and updates the read timestamp. Now T2 does a write on A. It's allowed to, because the read timestamp is less than two and the write timestamp is also less than two, so it updates A and sets the write timestamp to two. Now T1 wakes up again and does its write on A — but now that would violate the write-timestamp check, because T1's timestamp is one, and that's less than two. Someone in the future has written to this object before we could, so we have to abort and roll back. But one thing to point out: does it actually matter? The serial order should be T1 followed by T2, and T2 did the last write on A. Does the world really need to know what T1's write was? No — we could just ignore it, and that would still be correct. This is an optimization called the Thomas Write Rule. The basic idea: if I try to write to something and someone read it in the future, I can't allow that. But if I try to write something and somebody else already wrote to it in the future, then I can just ignore my write, because nobody would ever have seen it anyway — and I let the transaction that tried to do that conflicting write keep running without aborting. Now, this violates the timestamp ordering protocol as we specified it, but it's okay, because no one ever saw T1's write; they saw T2's. So this write rule is in the literature — I think it came out in the late 1970s. And I was looking into the guy who invented it: a guy named Robert Thomas, Robert H. Thomas. He was at this think tank in Cambridge near MIT called BBN, which did a lot of early work on databases in the '70s and '80s. I was trying to figure out whether he's dead — I can't figure that out — but it turns out this same Robert Thomas who invented the write rule also invented the first malware, around 1970: the first worm, called Creeper. And what's interesting is that all the security literature about malware refers to him as Bob Thomas, while in all his database papers he's Robert Thomas. As far as I can tell, it's the same dude — Bob H. Thomas is very, very likely to be the Robert H. Thomas who was at BBN in the 1970s and 1980s. So the guy who invented this rule also invented one of the first computer viruses. All right, so let's see how the Thomas Write Rule plays out. T1 does the read on A — that's allowed. T2 does the write on A — that's allowed. Now T1 does its write on A. Again, T1's timestamp is less than the write timestamp of the object, so normally we'd have to abort. But we just say: let's ignore that. T2 has already committed — we've already told the outside world that T2 committed — so we ignore T1's write, let T1 read its own write locally, and we don't update the write timestamp. That's still okay. Yes — her statement is: would that lead to a cascading abort in this example? Her point is that T1 is seeing a value it should not see: at this point up here, down below, it's going to read its own write. How do you make that happen? We'll see this in OCC, but multi-versioning can do it too: you know what your timestamp is, so you can see the version of the object as of your timestamp — or you have a private workspace, as in OCC. Yes — so the statement is, we create the illusion that T1 actually wrote to A, even though it didn't happen in the global database. Yes. And you're saying logically this is equivalent to letting T1 do its thing, followed by T2? Yes — that's the serializable ordering, yes. You got a question? Sorry, yes. Yes — again, this is PowerPoint, and I'm showing you simple diagrams. In an actual implementation, you still have to protect your data structures with latches and all the things we've talked about up until now, absolutely. This is the higher-level logical protocol, assuming you have the lower-level physical latching in place. Yes — the statement is: with the Thomas Write Rule, would T1 still abort when it does the later read here? Again, if I have a private workspace, I put my write to A into something only T1 can see; then when T1 does the read, it says, oh, okay, it's in my private workspace — it doesn't go to the global database. This is what OCC does; we'll see it in a second. Yes. Okay. So again: the basic timestamp ordering protocol guarantees it generates schedules that are conflict serializable if you do not use the Thomas Write Rule; if you use the Thomas Write Rule, you may end up in view-serializable land. In general there are no deadlocks, because no transaction ever has to wait. But we can have the problem someone brought up before: if I have a really long transaction that does a lot of updates, and a bunch of short transactions keep updating the things I'm trying to read or write, then my timestamp keeps getting older and older relative to everyone else's, and the youth keep coming to my house and killing my transaction. That starvation is unavoidable in this protocol. I'll also say that this is an older protocol — late 1970s, early 1980s. I'm not aware of any system out there that implements exactly what I'm describing here today, and I don't know whether anybody actually uses the Thomas Write Rule, but it's in the literature — it shows up a lot.
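Continuing the earlier sketch, here's the write rule with the Thomas Write Rule as the optional branch — again my own scaffolding, with the checks as described above:

```python
def write(txn_ts, obj, value, thomas_write_rule=True):
    # Someone logically in the future already read this object; in a
    # serial order they would have seen our write, but they can't. Abort.
    if txn_ts < obj.read_ts:
        raise Abort
    if txn_ts < obj.write_ts:
        # Someone in the future already overwrote this object.
        if thomas_write_rule:
            return          # silently skip the write: nobody would ever see it
        raise Abort         # strict basic T/O: abort instead
    obj.value = value
    obj.write_ts = txn_ts
```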
But the reason I'm showing you this is that it gives us the building blocks for OCC and multi-versioning. If you understand this, you'll understand how we actually order transactions' reads and writes in OCC and in multi-versioning. Part of the reason nobody implements this is that it's really inefficient: now every single time I read a tuple, I potentially have to write to that tuple, because I have to update its read timestamp. And I need to persist that timestamp to disk, because if I crash and come back, I want to know what the read timestamps were. Well — is that actually true? If you crash and come back, you could potentially reset everything to zero, because there are no active transactions; you'd roll everything back. But it may be the case that a transaction updates some pages, then you run out of memory and have to flush those pages to disk to make room in your buffer pool, and you may have to bring them back in later. In that case, you would have to write the timestamps out. Either way, this basically means that every read becomes a write, and that's terrible for performance. Yes? The question is: doesn't two-phase locking suffer the same problem? I don't write the lock manager's contents out to disk. Now, if I update a page — we'll see this when we talk about logging and recovery — and my transaction hasn't committed yet and I need to swap the page out, I've got to write the dirty page to disk. But in two-phase locking, I could read a billion tuples, and yes, I bring them into memory, but I didn't update any timestamps the way you do in basic timestamp ordering. So when I run out of space and need to evict them, I just drop the frames. In that regard, it's more efficient. And as I said, the other problem with this basic protocol is that longer-running transactions get starved out, because the likelihood that some other transaction reads or writes something "in the future" before the long-running transaction gets to it increases over time. Your question is, you can or cannot — your statement is that you cannot evict dirty pages before the transaction commits? You can — we'll get to that. Give me two weeks. Not all systems do that; most of them do. We'll talk about stealing in the buffer pool in two weeks. And this is why I wanted to teach you the basics of the buffer pool first: when we come back to transactions and talk about recovery, you'll understand how the buffer pool works, and you'll see how we extend the protocols to allow eviction of dirty pages from transactions that are still active. Okay. So if we assume that conflicts between transactions are rare, and that most transactions are short-lived, then using locks, or updating these timestamps over and over again, adds unnecessary overhead. A better approach is to build a concurrency control protocol optimized for the use case where most transactions don't conflict and most of them are quite small. And in practice, most transactions are short in most systems and most applications. Yes, you have bulk updates, where people load a bunch of stuff once a week or once a day; but think about when you load a webpage — it's not doing writes on a bunch of different objects, it's doing writes on just your stuff.
So conflicts are potentially rare in most cases. This is what OCC is going to do for us. It's an extension of the basic timestamp ordering protocol we just talked about, but instead of the optional private workspace — where I said we could put our writes so that when we read them again, we go to the private workspace — in OCC you're required to have it. Any time I read an object or write an object, I don't touch the global database at all; I have a private workspace specific to my transaction, and I put my copy of everything I read there, and all of my writes there. Then when the transaction goes to commit, you look at your private workspace, you look at the other transactions running at the same time — or in the past, or in the future; we'll explain both directions — and you check whether the changes you're trying to apply from your private workspace into the global database conflict with anyone. If yes, you have to abort. If no, you're allowed to apply your changes to the global database, and now everyone can see them. So: 2PL was invented, I think, in 1976 at IBM. OCC was invented in 1981 — actually here at CMU, by H.T. Kung. And he was not a database professor; he was a networking person. Of course, what's the point of having a network if you can't connect to a database, right? He was an old-school systems guy. He's not here anymore; I think he's still alive — that's not a joke, I bet he's a really nice guy. He got hired by Harvard in the '90s, because the Harvard CS department was a train wreck and they hired him to fix it. No, no — I interviewed there, they're very nice people, but CMU is better; CMU is CMU. My wife wanted to stay in New England; I got the offer from CMU, and in exchange for not staying in New England, she moved to Pittsburgh with me — she was my girlfriend at the time — and she got a puppy. That was the deal we worked out. So, what was the point I was going to make about this? All right — in my opinion: there's this guy at MSR named Phil Bernstein, and he's fantastic. He's probably the world's premier expert on concurrency control and transactions. A lot of the stuff I'm teaching you here is from seminal papers he wrote in the early 1980s; he built one of the first distributed databases in the 1970s. This guy is pure genius when it comes to databases, and he got denied tenure at Harvard, because it was early CS, and Harvard applied all the rules for getting tenure that they used for biology and the traditional sciences. But in computer science we don't write journal papers; we write conference papers, we crank things out quickly. So they were like: what is this database stuff? What is this transaction stuff? To me it's groundbreaking, seminal work, and the Harvard CS department said, I don't know what this is, and denied him tenure. Then they hired H.T. Kung — they stole him from us — to go fix their department. We'll cut that out of the video. Okay — I met him when I interviewed there; he was super, super nice. All right: OCC is going to have three phases. And again, the naming is going to suck, okay? Bear with me — this is what Kung's original paper calls them.
First, there's the read phase, where the transaction does its reads and its writes — but it copies everything it reads and everything it writes into its private workspace. Think of this as some chunk of memory; it can be backed by the buffer pool or not, but it's specific to this transaction, and no other transaction can read that private workspace while the transaction is running. Then there's the validation phase, which is where the transaction says: I want to commit. As soon as you call commit, you automatically enter the validation phase. You now get a timestamp, and you check whether your transaction made changes that conflict with other transactions — either in the future or in the past; you only go in one direction, and we'll talk about both. And then the write phase: if your validation succeeds, you're allowed to apply your private changes to the global database; otherwise, your transaction aborts and has to restart. All right, let's look at an example. First thing to point out: we're getting rid of the read timestamp in the database. Now we only have the write timestamp, because all we care about is the timestamp of the last transaction that wrote to each object. I'm also showing, in the schedule, regions for the three phases: read, validate, and write. This is not something you explicitly invoke in SQL; underneath the covers, the system does its own bookkeeping about what phase you're in. And I'm showing the commit down here outside of the phases, but really, the transaction says it wants to commit, automatically enters the validation phase, and logically the changes aren't applied — and you don't tell the outside world you've committed — until you complete the last two phases. All right: T1 starts, and begins its read phase as soon as it calls BEGIN. We set up its private workspace, which has the same layout as the global database: the object — here I'm showing the values too, so it's a key-value pair — and the write timestamp. The transaction wants to do a read on A, so I copy the current state of object A into my private workspace, along with whatever its write timestamp was. Now we switch over here: T2 starts, same thing. It's in its read phase, creates its private workspace, does a read on A, and also gets a copy of the same object. So now we have the global database, and each transaction has a copy of A in its private workspace. Then T2 goes to commit, so it enters the validation phase. This is when we actually get a timestamp in OCC. In the basic timestamp ordering protocol, you got it when you called BEGIN; in this world, you don't have a timestamp until you call commit and try to validate. So for T2, we check: did we write anything in our workspace that we need to apply to the database? No — it was a read-only transaction, so we're fine there. And the write timestamp of object A in our private workspace matches the one in the global database. So we automatically enter the write phase; there's nothing to write back, our transaction is allowed to commit, and we're done — we blow away the private workspace. Meanwhile, back to T1: it does a write on A.
And at this point, T1 doesn't have a timestamp yet. So when we update the object in our private workspace, we set its write timestamp to infinity — it's just something in the future; we don't know what it's going to be yet. Then when I go to validate, I have to check: is there any other transaction that read or wrote the thing I wrote, in a way that conflicts with my timestamp? In this case, T1 validates and gets timestamp two. When we apply our change, we update the write timestamp in the global database, and at that point we're done and we've committed. All right, so let's go through the three phases; the validation phase is the trickier one. The read phase is as I already said: we track the read/write sets of the transaction in a private workspace, and every time I read the same object again, I look in my private workspace first to see whether it's there. For this lecture, I'm ignoring how we handle indexes — I'm really ignoring how we'd actually implement this, because if you think about it, to look up an object I'd typically go through an index to find it. How would I do that with a private workspace? I don't want to update the index to point into my private workspace, because I don't want other people to see it. So for now, ignore how we actually do this. Yes — the question is: in this case, when I do the write on A, I don't have a timestamp yet, so in my private workspace I set the write timestamp to infinity; then when I go to validate, I get a timestamp. Do I write it in and then check? It doesn't matter — when I check, I'm not comparing against infinity; I'm checking whether two is less than the write timestamp in the global state. It does get more complicated because you may have to go check the private workspaces of other transactions; we'll get to that in a second, yes. Next question: if T2 got timestamp two before T1, down here, would T1 crash? If you're doing forward validation — meaning you look at transactions in the future to see whether you conflict with them — the first guy, T1, would abort. If you're doing backward validation, this guy would abort, because somebody wrote something in the past that you missed. We'll get to that now. Okay, so again: when T1 invokes commit, we enter the validation phase and have to check whether we have conflicts with other transactions. We want to guarantee that the schedule we produce is equivalent to some serial ordering, so we look for read-write and write-write conflicts with other transactions. And when we abort transactions, we always go in one direction. This is a recurring theme throughout databases: to avoid deadlocks and correctness problems, everybody always checks in the same direction, and then we can guarantee we end up with a serial ordering. We saw the same thing with wound-wait versus wait-die. So the two approaches are backward validation and forward validation.
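Before the two validation directions, here's a minimal sketch of the read-phase bookkeeping just described. The class layout and the `INF` placeholder for the not-yet-known commit timestamp are my own assumptions:

```python
INF = float("inf")  # stand-in write timestamp for uncommitted local writes

class Transaction:
    def __init__(self):
        self.workspace = {}    # key -> (value, write_ts) copied or written locally
        self.read_set = set()
        self.write_set = set()

    def read(self, db, key):
        self.read_set.add(key)
        if key not in self.workspace:          # always check the workspace first
            self.workspace[key] = db[key]      # copy (value, write_ts) from global DB
        return self.workspace[key][0]

    def write(self, key, value):
        self.write_set.add(key)
        self.workspace[key] = (value, INF)     # timestamp unknown until validation
```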
With backward validation, the idea is that when we go to commit, we check whether some transaction that already committed in the past read or wrote something that we missed when we did our reads and writes. Say we want to commit transaction two at this point here. We don't care about transaction three, because it's still running and hasn't applied any changes yet — we don't care if we miss it. We care about going backward in time. So we look at the objects we read and wrote, and check whether, for anything we read, the write timestamp has changed — meaning a transaction in the past wrote to an object we should have seen if we were running in serial order. If we missed that write, we kill ourselves. If it's a write-write conflict, that's okay, because we would just overwrite what they've done. We call this the validation scope: checking whether there was a logical write in the past that we missed when we read an object. Forward validation goes the other direction. Here, T2 is trying to commit. I don't care about anything in the past, because if an earlier transaction wrote to something I didn't read and it conflicted, it would have killed itself — everyone always goes in one direction. So this transaction only cares about what else is running at the same time it is: did I write to something that they haven't read yet? They may read that object in the future, after I've committed, but that's okay, because by then my changes will have been applied to the global database, and they'll see them. So let's focus on forward validation. We could go through the details of both, but they essentially work the same way — if you understand one, you understand the other. Again, every transaction gets a timestamp at the beginning of its validation phase, and then, for the committing transaction, we check against all the other transactions running at the same time: you basically peek into their private workspaces and ask, what did they read, and did they miss something that I wrote? For us to allow a transaction to commit, one of three conditions must hold for each other transaction. The first one is sort of obvious: it says that if Ti completes all three phases before Tj even begins execution, then there can't possibly be a conflict. I ran, I committed, and then you show up a day later — of course you can see my changes; we weren't running at the same time. I'm only showing two transactions here, but you can imagine a real system with hundreds or thousands of transactions, and this check is cheaper than comparing read/write sets, because from the timestamps alone you know the other transaction hadn't even started. The second condition is that Ti completes before Tj starts its write phase, and Ti did not write to any object read by Tj. This is basically saying that the intersection of my write set with the other transaction's read set is the empty set: I did not write anything that they read. So visually, it looks like this.
So here we do the validation phase, and T1 has already read A and written to A. And in this case here, because T2 read A, it has A with a write timestamp of zero in its private workspace. T1 has to peek into that private workspace and say: okay, you read this, but I wrote to it as well; my timestamp is infinity, I'm way in the future, so you should have read my write and you didn't. Therefore I have to kill myself, because if you then went and committed, you would have committed without seeing my write. Yes? The statement is: for this one, you can't just check the database; you have to check the other workspaces. Yes. But then the system needs to know what other private workspaces are out there. Yeah, and what's wrong with that? If there are multiple threads, do we check with all of them? Yeah, but it's our database; we can do whatever we want, right? Is it the most efficient way to do this? Maybe, maybe not. It depends on where the transactions are running. If they're spread out across machines, yeah, that's a problem. If they're all in the same box, it might be okay. Yes? The question is: validation runs sequentially, but we can't pause the other transactions. After you've checked T2, T2 is still executing and its workspace keeps expanding, so by the time you finish checking everyone else, T2 may have read something new. Yeah, so the point he brings up is that these validation checks are not atomic, meaning I'm not pausing the other threads while I do the checks. Therefore, if it takes a long time to go through a hundred private workspaces, could that be a problem? Because by the time I then go commit, someone else may have read something before I applied my change to the global database. Yes. So again, for simplicity, assume serial validation, which is not the most efficient, but for simplicity, that's the way to think about it. The way to handle the race is that you could hold a latch on the page you're going to update in the global database while you go do the checks; then nobody can come read it. I'm trying to get through this logically, but physically that's the way you would do it. Another example here: T1 reads A and writes A; T2 writes A, then reads A. T2 does its validation step here. It's safe to commit T2, because T2 will commit logically before T1. T2 called validate and got a timestamp, and there wasn't anything in the global database that it missed, and there wasn't anything in a private workspace that it had failed to read. So T2 gets its timestamp; then there's a context switch and T1 starts running. T1 gets a timestamp that's in the future of T2's, so T2 did not need to see T1's writes, and therefore both are allowed to commit. All right, so the last condition. We just checked that my write set doesn't intersect with the other transaction's read set; the third condition additionally requires that my write set doesn't intersect with their write set. All right, so going to the example here: we validate and T1 gets its timestamp. It's safe to commit, because T2 will see the database after T1 has committed. It'll see the write that T1 makes to A: we apply the change and update the write timestamp, and then when T2 does its read, it sees that change.
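Running the first example through the toy code above, with made-up values: T2 reads A while T1 has a staged write to A, so T1 fails validation and must abort.

```python
db = {"A": (10, 0)}        # object A: value 10, write timestamp 0

t1 = Workspace(db)
t1.read("A")
t1.write("A", 99)          # staged locally with write_ts = infinity

t2 = Workspace(db)
t2.read("A")               # T2 observes the old version (write_ts 0)

# T1 tries to commit: it wrote A, but T2 already read A and missed
# that write, so under this policy T1 aborts.
assert forward_validate(t1, running=[t2]) is False
```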
All right, so the last phase is the write phase. Again, this is propagating the changes we made in the private workspace into the global database, still ignoring how we handle indexes for now. So for the question he brought up, how do you prevent this race condition: for simplicity, you can just do serial commits, where only one transaction can be in the validation and write phase at a given time. If you want to do parallel commits, you can use more fine-grained latches to protect the data structures. And the way you avoid deadlocks is that you acquire the latches for the things you're going to update in the global database in primary key order. That guarantees that all threads or clients acquire latches in the same direction, and you don't get the deadlock where I hold the latch on key two, you hold the latch on key one, and we each try to grab the other's latch. If everyone swims in the same direction, you don't have this problem. Yes? The question is: even if you hold a single global latch for the validation and write phase, wouldn't you still have the problem he mentioned, since the other transactions are in their read phase and can still read? Yes, so that's why you acquire latches on the pages where the tuples are. Okay, so OCC is great when the number of conflicts is low. It's even better when most of the transactions are read-only. If the database is large and the workload is not skewed, so there's a low probability of conflicts, this is going to be preferable to locking, because, yes, I may have to copy big things into my private workspace (we can talk in F-COS about how to do that more efficiently), but I'm not going into a global lock manager over and over again to acquire locks. I can read along assuming everything's okay, and the validation step will be fairly straightforward. The overhead, though, is copying data locally, and we'll see that multi-versioning has the same problem; that's sort of unavoidable. The validation step and the write phase are the bottlenecks where you have to acquire latches to make sure those things happen in the right order. And one big problem under OCC is that aborting transactions can be more wasteful than under two-phase locking, because I won't know until I actually get to the validation phase that I'm going to abort. So I could do a billion updates only to find out, when I go to validate, that the very first tuple I updated has a conflict, and I have to roll back all my changes. You would not have that in two-phase locking. Although, of course, in two-phase locking you can have deadlocks and roll back changes. So there's no free lunch in databases. I'm not saying OCC is better than 2PL; there are trade-offs for both. Yes. The question is: in reality, is forward validation used more than backward validation? Actually, I don't know the answer to that. Most systems are going to do 2PL with multi-versioning, which we'll cover in the next class; I'll have to go look to see which ones actually support OCC. All right, we only have four minutes left, so I'm going to do a teaser. I'm going to show a problem, not tell you how to solve it, and we'll solve it next time, okay?
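Here is a minimal sketch of that primary-key-order latching for parallel commits, with a made-up per-key latch table; in a real system these would be per-page or per-tuple latches inside the storage manager, not Python locks.

```python
import threading

# Hypothetical latch table: one latch per key, created up front.
latches = {key: threading.Lock() for key in ("A", "B", "C")}

def apply_writes(write_set, db, commit_ts):
    # Latch every key we're about to touch in sorted (primary key)
    # order, so all threads acquire in the same direction and two
    # committers can never hold each other's latches in a cycle.
    keys = sorted(write_set)
    for key in keys:
        latches[key].acquire()
    try:
        for key in keys:
            value, _ = write_set[key]
            db[key] = (value, commit_ts)  # install value + new write ts
    finally:
        for key in reversed(keys):
            latches[key].release()
```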
So, so far (oh, you have the slides, so you can look ahead), everything we've talked about has assumed we're dealing with a fixed-size database. All we're doing is reading and writing existing objects. But now suppose there are transactions that want to do insertions and deletions. Now the number of tuples we have in our tables can change, and this exposes a new type of problem that we haven't talked about so far. All right, so this is called the phantom problem, or the phantom anomaly. Earlier I said there were three basic anomalies: dirty reads, unrepeatable reads, and lost updates. This is the fourth one. There's actually a fifth one we'll talk about next week as well, called write skew. So say I now have two transactions, T1 and T2, and instead of showing reads and writes, I'm going to show SQL statements, because now we can talk about reading data that doesn't actually exist. The first transaction wants to get the oldest person in this people table where the status is lit. And at the same time, there's another transaction that's going to insert into that people table with the status of lit. So when I run T1's query first, we find somebody who's 72 and still lit. Good for them. Then there's a context switch, T2 starts running, and what does it do? It inserts into the people table someone at the age of 96 with the status of lit. So now when I go back to T1 and run the same SQL again, I get 96. So you may think, oh, this is like an unrepeatable read. But an unrepeatable read is at the lowest level, on a single object. In this case, before, there wasn't a record for a person aged 96, and now I suddenly see it. So two-phase locking is not going to help me here, because how do I lock something that doesn't exist? And how would timestamp ordering help, when there aren't any timestamps to even check at this point? Yes? The question is: in this case, why wouldn't I acquire a shared lock on the whole table to avoid this problem? Yes, that would solve this problem. But assume you didn't do that; assume you're doing tuple-level locks. Yes? All right, and changing MAX to COUNT doesn't fix it either; you just get different values. Yeah, if I lock the entire table, I don't have this problem, but then it's basically a serial ordering. Yes, and then what's the point of this class, right? Fair enough. All right, so the reason this happens is that two-phase locking can only lock things that exist. I understand you can take higher-level locks; that is going to be one of the solutions we will use. But the notion of conflict serializability that we've talked about so far really only covers things that exist when I start running. If something doesn't exist yet, then I have this problem, called phantoms. So next class will be about how we actually solve this, and then we'll talk about multi-versioning, which I've been alluding to for weeks now. We'll finally get to it, okay? All right, guys, have a good weekend. See you next time. Super snakes.
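To see why tuple-level locks can't stop phantoms, here is a toy illustration to take home (the table, rows, and lock set are all made up): T1 locks every matching row that exists, but T2's insert creates a row that no existing lock could have covered.

```python
people = [{"age": 72, "status": "lit"}]
t1_locks = set()   # tuple-level shared locks held by T1, by row index

def t1_oldest_lit():
    # T1 locks each matching row it finds, then computes the max age.
    for i, row in enumerate(people):
        if row["status"] == "lit":
            t1_locks.add(i)
    return max(row["age"] for row in people if row["status"] == "lit")

print(t1_oldest_lit())    # 72

# T2 inserts a new matching row. There was nothing for T1 to lock
# beforehand, so its tuple locks cannot block this insert.
people.append({"age": 96, "status": "lit"})

print(t1_oldest_lit())    # 96 -- the phantom
```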