All right, so last class we sort of rushed at the end and we need to get to this last piece. I want to get through how to deal with phantoms, then isolation levels, and then we'll jump into MVCC, multi-version concurrency control. So recall that I mentioned at the end of last class that all of the examples when we talked about 2PL, OCC, and basic timestamp ordering were assuming that the database was fixed in size, meaning we were only doing reads and updates. We weren't doing inserts, we weren't doing deletes. If we have to handle those, now the data is changing, and we could have a problem like this. So assume we're running under 2PL or OCC, it doesn't matter, and I'm trying to run this with full serializability. We have the example here. It's the same query as before, except I changed it from max to count. I begin, do a select count on the age, say there's 99 people in the table to start with, and I get that back. Then there's a context switch and T2 runs. They do an update to the table: insert a new record. But now when I run this query again, I get a different count. Now we said, of course, we could lock the entire table and that would avoid this problem. But if you have a billion tuples, are you really going to lock the entire table, one billion rows, just to insert one thing? No practical system could do this. So the reason why this is happening is because, assuming it's two-phase locking, T1 in my example could only lock things that exist. Again, ignoring the coarse-grained table lock. So it only locked the existing rows, got the shared lock on them as it read them, and that didn't prevent the other transaction from inserting a new row. So then now when I come back and run the query again, I get a different result, which would not happen if they were running in serial order, which is what we're trying to achieve. And so this is known as the phantom problem: the idea that tuples can appear and disappear like a ghost or a phantom or an apparition while my transaction is running, and the basic two-phase locking protocol and OCC are not going to handle this. So there's three ways to handle this. And the spoiler is that index locking is probably the most common one; it's what most of the traditional systems will do. For MVCC, it's a whole different beast; we won't discuss it in this class, we'll talk about it in the advanced class. Predicate locking only exists in the literature. There's one system that does an approximation to it. It would be ideal, but it's like a unicorn; it's very difficult to do. And then the top one is re-executing scans. It's the simplest one, and you typically only see it in in-memory systems, because you don't want to read from disk again. So let's go through these one by one and see how each of these approaches would solve this problem. So the first approach is to re-execute scans. Basically, while my queries are running, I keep track of my where clause. And I keep track of, not necessarily the full read-write set, but the read set of anything I read or scanned, even though I may be doing an update. So what I mean by that is I can have an update query, and all I need to track is what things I saw in my where clause, not necessarily what things I actually ended up updating. So then now when my transaction commits, the database system is going to go look at what the scan portion of each query I ran during this transaction was, and it's going to run them again and see whether it gets back the same result the second time. If it does, then I know there's no phantom: between the time I ran the query the first time and the time I go to commit, nothing magically appeared or disappeared. So I know that I'm running in a serializable ordering. Pretty straightforward, pretty easy.
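A minimal sketch of that commit-time validation, assuming we logged each query's scan predicate and the row ids it matched the first time around (all the names and structures here are made up for illustration):

```python
# Hedged sketch of "re-execute scans" validation at commit time.
# txn.scan_log holds (predicate, ids_seen_first_time) pairs for every scan.
def validate_scans(txn, table):
    for predicate, ids_seen in txn.scan_log:
        ids_now = {row["id"] for row in table if predicate(row)}
        if ids_now != ids_seen:      # a row appeared or disappeared: phantom
            return False             # abort the transaction
    return True                      # safe to commit

# T1 logs its scan of status = 'lit'; T2 then inserts a matching row.
table = [{"id": i, "age": 30, "status": "lit"} for i in range(99)]
pred = lambda r: r["status"] == "lit"

class Txn: pass
t1 = Txn()
t1.scan_log = [(pred, {r["id"] for r in table if pred(r)})]
table.append({"id": 99, "age": 30, "status": "lit"})   # T2's insert commits
print(validate_scans(t1, table))                        # False -> T1 must abort
```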
And again, there are some systems like DynamoDB and Fauna that can do this, where you do it with reconnaissance transactions: you run the transaction once, run all the queries, keep track of the read-write set, and then when you go to commit, you check whether you get the same result. Hekaton does this too, which is the in-memory engine for SQL Server. But most systems don't do this, because the scans could be really big, and therefore if I had to swap things out to disk, I'm basically doubling the cost of every query. The next one is predicate locking. This is the original locking scheme that IBM invented for System R back in the 1970s. And the way to think about this is that it's not actually taking physical locks on actual tuples or objects in the database. It's more this notion of a logical lock, where I have this multidimensional space of all the possible values I could have for every attribute in my table, and there's a region within that space that corresponds to my query. And then I just check to see whether there's another region from another query or another transaction that intersects in any way with mine. Again, this was never actually implemented in practice. There is an academic system called HyPer. It was built by the Germans at TU Munich, then it got bought by Tableau, and Tableau got bought by Salesforce. It's a real system, but they use something called precision locking, which is an approximation of this predicate locking, and it really only works for in-memory systems. We'll cover that in the advanced class, not here. The basic idea is this, right? Here's our two queries from before: the select count on the age, and then somebody inserting things. So without actually even running the query, I can just say, all right, based on this where clause here, here's the region — for simplicity reasons it's a two-dimensional projection — that corresponds to this where clause of status equals 'lit'. And then for the other one, the insert query, here's the region that corresponds to age equals 30 and status equals 'lit'. And so if these regions intersect, I would know that they have a conflict, and therefore I can reason about which of these two is allowed to proceed and which one has to block, right? For a simple where clause with one predicate, yeah, you can kind of figure this out. But again, think of really complex queries with nested queries and joins; it's kind of impossible, or not really tractable, to do this in real time. So this is why nobody actually does this approach.
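Just to make the geometry concrete, here's a toy version of that intersection test, treating each predicate as a box over (status, age). Everything here is made up for illustration; real predicates with joins and nesting are what make this intractable:

```python
# Toy sketch of predicate "region" intersection for the two queries above.
def overlaps(p1, p2):
    # each predicate: {attr: (lo, hi)} closed intervals; a missing attribute
    # means that dimension is unconstrained (the region spans all of it)
    for attr in set(p1) & set(p2):
        lo1, hi1 = p1[attr]
        lo2, hi2 = p2[attr]
        if hi1 < lo2 or hi2 < lo1:
            return False          # disjoint on this attribute -> no conflict
    return True

scan = {"status": ("lit", "lit")}                      # SELECT ... WHERE status='lit'
insert = {"status": ("lit", "lit"), "age": (30, 30)}   # INSERT (age=30, status='lit')
print(overlaps(scan, insert))                          # True -> conflict, must block
```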
The one that's more common is index locking. All right, and there's basically four approaches that I'll talk about. I'll just go over these at a high level, and you'll see how we can use the indexes that you guys built in Project 2, plus the lock manager that you're building in Project 4; the combination of these two things can actually help you achieve serializability while avoiding phantoms. All right, so the basic idea with key-value locks is that we're gonna lock a single value in the index. And what this is gonna allow us to do is take sort of virtual locks for keys that don't actually exist yet. That's the basic idea here. Then we would know that people are scanning along different regions of data, and if someone's trying to insert into those regions, even though a value may not exist, the region would be protected by one of these index locks, and then we could block and avoid having phantoms. All right, so we have a real simple key-value lock here to say, okay, I'm locking exactly key equals 14 in my B+tree. And you wouldn't actually store this in the index itself; you store this in your lock manager. It would know that you're storing something in the lock manager that corresponds to a key at a particular location. It's sort of a logical concept, and that's necessary: it's not like a latch you would store in the page of the B+tree, right? Because when you go to commit, you have to unlock things, and you don't know where that page is gonna be, because that key could have gotten moved around. So this is a logical thing; you know how to go to the lock manager and find it. I can also do gap locks, and these handle non-existent keys in ranges, right? So say this is my leaf node, and I have 10, 12, 14, 16. So there could be a value in between 10 and 12, and between 12 and 14, and so forth. So I can take a lock on the gap here, say from 14 to 16 exclusive, and this will prevent anybody from inserting something in that range, right? Key-range locks are sort of the combination of all these, where now within a single lock request we can cover an actual key, or multiple keys, and multiple ranges at the same time, right? So we can take a lock like this where you say I want the next key from 14 inclusive up to 16 exclusive, and that'll cover anything at 14 exactly and then anything that comes after it. That would be called a next-key lock. Or you can take a prior-key lock, where you would say I want 14 inclusive down to 12 exclusive, so you get everything within this range here. You typically only take the locks in one direction — only prior-key locks or only next-key locks. You don't want to mix them because you'll get deadlocks. Yes? So the statement is, this would work if someone's trying to modify something within a specific range, but if someone could be at the very end and the very beginning at the same time, they'd have to lock the whole thing. Yes, that's unavoidable, yeah. Yes, his question is, what is the gap? It's not something deleted — it's something that doesn't exist. So in this case here I have the prior-key lock from 12 to 14, 12 exclusive, 14 inclusive. That means that with this gap lock here I handle 12.1, 12.2, 12.3, right? Anything greater than 12, if it gets inserted, will be covered by this lock. Yeah, so future insertions would be blocked, yes.
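Here's a little sketch of how a next-key lock covers a gap, assuming a leaf node with keys [10, 12, 14, 16]. This is illustrative only, not how a real lock manager stores lock ranges:

```python
# Hedged sketch of next-key locking. A lock on NEXT_KEY(14) covers key 14
# itself plus the open gap (14, 16), so inserting 15 conflicts even though
# key 15 does not exist yet.
import bisect

keys = [10, 12, 14, 16]

def next_key_range(key):
    """Return the half-open interval [key, successor) that a next-key lock covers."""
    i = bisect.bisect_right(keys, key)
    succ = keys[i] if i < len(keys) else float("inf")
    return (key, succ)

def conflicts(lock_range, insert_key):
    lo, hi = lock_range
    return lo <= insert_key < hi

lock = next_key_range(14)        # covers [14, 16)
print(conflicts(lock, 15))       # True: blocks inserting 15 into the gap
print(conflicts(lock, 16))       # False: 16 is outside the next-key range
```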
All right, so now we can combine all of this and apply the hierarchical locking technique that we talked about under 2PL, where we take intention locks on wider ranges and then more fine-grained locks as we go down. So in this case here we could take an intention-exclusive lock on this wide range here, then an exclusive lock on this narrower range here, and then maybe for this one we take an exclusive lock on a single key, right? It's all the same stuff we talked about before, where the locks up above are essentially hints that say here's what's happening within this range of values. All right, so that's basically how index locking works. And again, it's a weird concept, because even though we're relying on the index to tell us how to apply these locks, it's more of a logical thing saying here's the range of values that we're protecting, whereas in plain 2PL we were protecting physical things: physical tuples, physical pages, physical tables. But it's unavoidable, because again, in the case of my phantom, the logical predicate is essentially what's telling us where we would have the conflict, and we need some notion in the system to be able to protect ourselves from this. Yes. Your statement is, this will only work if you're deleting or inserting and you actually have to change the index. Yes — think about it: if the where clause for the one transaction that could experience a phantom doesn't actually scan something, or do a lookup, that would be affected by whatever modification is creating the phantom, then yes, you would have to use the index. Yes. Why do you have to use the index — couldn't you just have a higher level of abstraction that just says a range of values? Yeah, so his statement is, why do we need an index for this? Why not just lock a higher-level range of values? I mean, that's essentially what the predicate locks are doing. Yeah. So the statement is, for my example here, which approach would actually be better? This is where the index would help, right? If I have an index on status in this case, then I could take the key-value lock on status equals 'lit', and that would prevent anybody from inserting something with the same status that would cause the conflict, right? In practice, the way to think about this is: oh, okay, does that mean I need an index on every possible column, because I don't know ahead of time where these conflicts could occur? Most of the columns in tables, for OLTP workloads, you're not actually using for lookups in where clauses. I could have a hundred columns in my table and only actually use a subset of them in my where clauses, right? If I'm looking for your account record — you, on Amazon — I wouldn't look up your zip code or your city necessarily. I want to look up your account ID. So the where clause would have the account ID, and we just need an index to protect those lookups. Okay, so this jumps into — related to his question — this seems all super expensive to do, and it is, right?
And I sort of said at the beginning, when we started talking about concurrency control, that I'm going to teach you guys serializability, because that's sort of the gold-standard protection you want to have in a database system when you're doing transactions. But I might have already said this before: most systems don't implement serializability by default. Some systems don't implement serializability at all. Oracle doesn't, right? But Oracle has made more money than every other database in the world combined, probably, right? So why is this okay? Well, because a lot of times people are not willing to pay the performance penalty for something that's kind of hard to understand and reason about, like the correctness of transactions. And so by default, most systems are going to give you not serializability but a lower level of isolation. And they choose this because you get better performance. And if you really care about protecting your data and avoiding race conditions and all these other anomalies that we've talked about, then you jack up your isolation level to serializable, if your system supports it. So this is what isolation levels do for us in a database system: we're able to control to what extent a transaction incurs anomalies or problems from other transactions running at the same time. Again, if I'm only executing maybe one transaction a second, the isolation level doesn't mean anything, because no other transaction is running at the same time, and therefore I won't see an anomaly. But in a highly concurrent system with a lot of active transactions, the likelihood that you could be exposed to one of the anomalies we've talked about goes up. And again, in exchange for not seeing these problems, you pay a performance cost. So these are the basic anomalies we've seen before. There's actually a fourth one we'll talk about in a second. Right, so the ANSI SQL standard specifies four isolation levels, and exactly how the different database systems implement these varies widely. Like in Postgres, say you want to run read uncommitted, which is the lowest isolation level: they actually don't support that, right? Because they're doing this MVCC stuff, and they would actually have to do more work to let you have it. So they just don't, right? Going from top to bottom, this goes from the least amount of protection to the most amount of protection. So read uncommitted basically says all the anomalies — the dirty reads, the phantoms, the unrepeatable reads — could potentially happen. Read committed, which is usually the default for a lot of systems, is where you'll be susceptible to phantoms and unrepeatable reads, but you won't read uncommitted data. Repeatable reads gets rid of the unrepeatable reads, obviously, but you still may have phantoms. And then serializable means no phantoms, no dirty reads, and all reads are repeatable. Yes? The lost update problem — you would not be exposed to that under any of these, because dirty reads are what's dealing with cascading aborts. Although if you allow read uncommitted, you actually could have that potentially. No, I take that back, sorry. For lost updates: under two-phase locking, if someone holds the write lock for an object, I can't overwrite it, yep.
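To summarize that in one place, this is roughly the standard ANSI matrix ("maybe" meaning the anomaly can happen, depending on what else is running concurrently):

```
Isolation level    Dirty read   Unrepeatable read   Phantom
READ UNCOMMITTED   maybe        maybe               maybe
READ COMMITTED     no           maybe               maybe
REPEATABLE READ    no           no                  maybe
SERIALIZABLE       no           no                  no
```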
So again, we won't do benchmarks here, but in practice, obviously — I just talked about all this index locking stuff — that's not cheap, and so a lot of people decide they don't need that protection. But trust me, there are cases where you do want it, and this actually came out in the news yesterday. There's this guy who just got busted for stealing what's now $3.3 billion in Bitcoin back in 2012. If you go read about it, the actual attack is pretty simple, because whatever exchange he was using on the Silk Road apparently wasn't using transactions. He would issue like 200 withdrawal requests at the exact same time, and the logic on the web server was basically that debit-credit thing I showed before: check whether your account has this amount, and if yes, then let you withdraw it. So he would run these at the exact same time — they weren't in the context of a transaction, they weren't running with full protection — and so all 200 withdrawals would run at the same time, all check "does this person have this amount?", all answer yes, and all withdraw at the same time, and he bled them dry, right? And then they finally caught him 10 years later. So there's a lot of examples — there's a couple other Bitcoin examples where crappy websites weren't using transactions and they got lit up. Yes? The feds don't mess around, right? Like, is Silk Road itself illegal? No. Can you do illegal things on it? Yes. I think if you sell drugs on it, yes — actually, no, they took that guy down too, yeah, take that back. Whatever — this dude, he stole... it wasn't worth three billion when he stole it; it's worth three billion now, because it was 50,000 Bitcoins. He's going to jail, okay. Moral of the story: use transactions, trust me, right? So this is another chart of the four isolation levels, right? Again, for read uncommitted it's not "yes, these things will happen," it's "maybe," because it depends on what other transactions are running at the same time, what they're doing, and what you're doing, right? But if you go up to serializability, the system should in theory guarantee that you don't have any of these anomalies. Yes? The question is, what's an example of an unrepeatable read that's not a phantom? An unrepeatable read is on a single object: if I read an object once, then read it again a second time and get back a different value, that's an unrepeatable read. Phantoms are about ranges, right? Like, I scan a bunch of data, and when I scan a second time there are things that I didn't see before that now appear, or things that I did see that have disappeared, right? All right, so the way you can actually implement these: for serializable, it's the strict two-phase locking that we talked about before — or strong strict — plus the index locks, and you acquire the locks up front. Repeatable reads is basically the same thing but without the index locks. Read committed is the same as above, but as soon as I get a shared lock and read my object, I release it, right? Because when I come back in again and get another shared lock, the value might have changed. And then read uncommitted, if you support it, relaxes everything above: you're basically reading without acquiring shared locks at all. In Project 4, we lay out exactly, for the different isolation levels you have to support, what the protocol is for how you acquire and release locks. Yes? So your question is, for index locks, why aren't we doing crabbing, where you lock the root?
I see what he's basically saying: why do I have to do the index locking on the values in the leaf nodes? Why not do something on the inner nodes? The problem though with B+trees — assuming it's a B+tree and not a hash table or whatever — is that the keys in the inner nodes may not actually exist. You'd still get the ranges, though, so wouldn't that still be correct? They might be overly coarse-grained, and that's why you wouldn't do it. Yeah, I don't know the full answer. Okay. All right, so quickly, here's how you can set these. I'm showing examples for PostgreSQL and MySQL: in MySQL you set the isolation level before you start the transaction, and in PostgreSQL you can set it when you call BEGIN. And again, not all systems support all these isolation levels. I just wanna show this quickly here. So this is a table that I've been maintaining for a couple of years now, based on something that Peter Bailis, a professor at Stanford, put together in a blog article. Basically, here's the default isolation level for a bunch of different systems, and then here's the maximum supported. So to point out right here: SQL Server, MySQL, Oracle, PostgreSQL — these are the four most popular relational database systems in the world. And except for MySQL, they all default to read committed, which is the second-lowest isolation level. MySQL actually gives you repeatable reads. Now, Oracle has this thing called snapshot isolation, which we'll talk about in a second, and they don't even support serializable. Snapshot isolation is one tick below serializable. But if you ask Oracle for serializable, you actually get snapshot isolation, and you're just supposed to know this, or read the documentation. Ingres, CockroachDB, and VoltDB all give you serializable by default, which is kind of nice. Google Spanner has strong serializable, also called external consistency. Basically this means you get serializable, plus the order that the transactions are submitted to the system is the order that they will commit. Plain serializable allows you to swap the order: T1 could show up first, but it'll get committed after T2. Strong serializable prevents that. And Google claims they need this for AdWords, for some reason. His question is, is this also called linearizable? Linearizability usually deals with operations on single objects on a single box. That's a whole other rabbit hole — there's a notion of concurrency and correctness in the OS world and then in the database world, and yes, they basically converge at the top. That's a whole other thing; we'll talk about linearizability in the advanced class. All right, and then the last one: for DB2, there's this thing called cursor stability. I'm not gonna teach you what that is; DB2 is the only one that does it, but it's basically one tick below repeatable read, which is here, right? So here's the same basic hierarchy, going from the least amount of protection to the most: read uncommitted, read committed — ignore the snapshot isolation path for now — then cursor stability, repeatable reads, and then serializable and strong serializable. Snapshot isolation has a different anomaly that repeatable reads doesn't have, but snapshot isolation is not susceptible to phantoms.
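To make that concrete, here's roughly how you'd set the level from application code. This is a sketch: the connection parameters are placeholders, but the SQL statements themselves are the real Postgres and MySQL syntax:

```python
# Hedged sketch: setting the isolation level via a Python DB-API driver.
import psycopg2

conn = psycopg2.connect("dbname=test user=app")   # placeholder connection info
cur = conn.cursor()
# Postgres: SET TRANSACTION must be the first statement in the transaction;
# psycopg2 opens the transaction implicitly on the first execute().
cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
cur.execute("SELECT COUNT(age) FROM people WHERE status = 'lit'")
conn.commit()

# MySQL equivalent (run before starting the transaction):
#   SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
#   START TRANSACTION;
```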
And then quickly: I ran a survey a few years ago — I gave a talk about how we spend all this time in academia worrying about building database systems that are serializable, but when you actually ask people what they run, you see that serializable isn't very popular, and most people are running their transactions with read committed, because that's the default. So again, in some cases this is okay, but if you really care about making sure things happen in the right order, then you want to turn up your isolation level. Now the question is, okay, how do you actually know when you would have one of these anomalies? This is an open question in research. Nobody knows how to detect when this actually happens. All right, any questions about this? Yes, the question is, what is read committed? So under read committed, phantoms and unrepeatable reads may happen. Read committed is basically: I'm going to read something, I get the shared lock on it, assuming I'm doing 2PL, I read it, then I immediately release the shared lock. That violates 2PL, but that's okay, because you're running at a lower isolation level. So now when I go back and read that object again, I get the shared lock again, but in between, someone might have come in, taken the exclusive lock, updated it, committed, and released the exclusive lock — and the value would have changed. Yes, so is that fundamentally fine even from a correctness level? It depends on the application logic, right? Like, I read something, release the lock — if my bank account is this, take money out — then I go back and get the exclusive lock on it, but in between, someone else messed around with it. Yeah, okay. So let's jump into MVCC now. With multi-version concurrency control, the basic idea is that the database system is gonna maintain multiple physical versions of a single logical object in the database. And in practice, that logical object is gonna be a tuple, but at a high level it doesn't actually matter. And so what's gonna happen is, when a transaction wants to write to an object, wants to update it, instead of just overwriting the current value as we've talked about so far, we instead make a copy of the original value and then update that. And what exactly that copy looks like, and how we maintain the diffs and so forth — we'll cover that. So then now when a transaction wants to read an object, it has to figure out which version it should actually look at. And that depends on what isolation level I'm running at. I may wanna read whatever the latest version is, or I may wanna read only the version that existed at the time my transaction started, or that was written by a transaction that had already committed by the time I started. And we can combine this multi-versioning technique with both the pessimistic and optimistic concurrency control protocols, the 2PL and OCC stuff we talked about before. This idea of multi-versioning sort of permeates throughout the entire system. So this is an old idea. It goes back to the late 1970s. There was a PhD dissertation that is usually credited as the first description of MVCC, but it was a purely academic endeavor. The first implementation of MVCC was done at DEC, which doesn't exist anymore — they got bought by Compaq, Compaq got bought by HP, and I think they killed everything off. But the first implementations were this thing called Rdb/VMS, and then InterBase.
And they were both written by this guy, Jim Starkey, who claims to have invented blobs. He later went off and founded NuoDB, which is another MVCC system. He's sort of well-known — I'm gonna say the Jeff Dean of databases in the 80s, sort of. And so InterBase still exists today. There is a commercial version of it, but I think it mostly targets mobile phone applications. But it was forked off and open-sourced as a thing called Firebird. If you ever wonder why Firefox is called Firefox: when Netscape died, the open source version of the browser was originally gonna be called Phoenix, like rising out of the ashes. They couldn't call it that, because there was some other software called Phoenix. Then they were gonna call it Firebird. They couldn't call it that because of this database system. So then they came up with Firefox, right? So Firebird is still around — it didn't take off like PostgreSQL, but it is an open source system that is actually maintained. Again, InterBase still exists, but it's a commercial product. Rdb got bought by Oracle, I forget when. And then this is always a confusing part about enterprise naming: there is Oracle the company, and then there's Oracle the relational database system that they started in the 1970s. But then Oracle also sells Oracle Rdb, which is a relational database based on this Rdb/VMS, right? So there's the Oracle relational database system, and then there's Oracle Rdb, which is a relational database system. Two separate products. But this is legacy; nobody would run it anymore, right? All right, so the key idea we get with MVCC is that writers are not gonna block readers, and readers are not gonna block writers, right? That means I can read an object, and take a shared lock on it potentially, and anybody that wants to come and update that tuple and make a new version won't be blocked by my shared lock, because they can always create a new version. And likewise, if I'm a transaction that's updating the tuple, creating a new version, the readers can come along and read the older versions without any problems, right? The idea now is that transactions essentially get this sort of virtual consistent snapshot of the database based on these version IDs — we'll talk about how we track them in a second. And I don't have to acquire locks on anything I need to read. I say, I only wanna read the versions of the database that existed at this timestamp, and I don't worry about what other people are reading and writing while I do this. The other advantage you get from MVCC is what are called time travel queries. This is actually one of the original things that Postgres touted in the 1980s when they were first building it at Berkeley. The idea here is that if I don't do any garbage collection, if I don't clean up any versions, I can run queries on a snapshot of the database as it existed at some point in the past. So right now, when I run my query in Postgres, I get whatever the latest version is that's visible to me. But you could say, I wanna run this query on the database as it existed three weeks ago, right? And you can go look at these versions and say, okay, go back in time and figure out what version of the tuple I need to look at — the state of the tuple three weeks ago. Postgres doesn't support this anymore by default. There are extensions to add it back.
Basically what happened in the 1990s, when they started building the open source version of Postgres that everyone uses today — it was a fork of the code at Berkeley — is they had to add the vacuum garbage collection, because if you don't do garbage collection, you run out of disk space pretty quickly, right? And most people don't need this time travel feature. You do see it now in some of the newer systems that come out; they claim to support it, and the cloud does make it easier. I've not actually come across a lot of use cases for it other than regulatory reasons — like, a bank needs to know what the state of its database was three weeks ago. Most people don't need this. All right, let's look at an example. We have transactions T1 and T2. T1's gonna read A, and then later read A again, and T2 is gonna write A. So the first thing to point out: in our database, we now have this version column. You wouldn't actually store it exactly the way I'm showing here — you wouldn't have a name like this. We're gonna have these begin and end timestamps, and they determine the visibility of each version. The version column is just for PowerPoint demonstration purposes, to make it easier for you to keep track of things. So these begin and end timestamps determine the range, the lifetime, of a particular version. At the very beginning, assuming there was some transaction that inserted this record, it had timestamp zero, and the end timestamp is infinity, or nothing — right now, null. And that means this is the latest version. So if anybody comes along as a new transaction with a new timestamp, their timestamp would be greater than zero, and therefore this version would be visible to them. T1 gets timestamp one, T2 gets timestamp two. So now when T1 wants to read A, we go look in our database, in our table — ignoring how we got here; it doesn't matter whether it's through indexes or not, just assume we got to this tuple. And we look and say: is my timestamp greater than the begin timestamp, and is the end timestamp null, meaning this is the latest version? If yes, then I can go ahead and read this. And unlike basic timestamp ordering, I don't need to keep track of any read timestamps. And I don't need to make a copy of the thing I read into my private workspace like in OCC; it's just in the global database. So it's basically the same thing as OCC, except without the private workspace — the table itself is the workspace, and I'm using these timestamps to figure out what I can see. All right, so now T2 wants to do a write on A. Instead of overwriting the existing tuple, I make a copy of it, call that version A1, set its begin timestamp to this transaction's timestamp, timestamp two, and then update the end timestamp of the previous version to be two. So now, if anybody comes along with a timestamp less than two, they would know that they need to read version A0 and not A1. But we haven't asked yet: both these transactions are still running, so if I come in as a transaction with timestamp three, how do I know whether I should be reading A0 or A1? Because there's nothing in here telling me whether this transaction has committed or not. It just says somebody created a new version.
So there's additional metadata we can maintain, like a transaction status table that says here's all the active transactions and here's their current status. So if I want to ask, can I read this version of the tuple, I can do a lookup in there and ask, is this transaction active or not? That's one way to implement it. You could also maintain a flag in the tuple itself that says, by the way, this version was created by a transaction that has not committed yet, and then when the transaction commits, you go back and flip that bit on all the versions it created, so that people know. Postgres does it in the tuple itself; other systems maintain the status table. There's no meaningful difference — the concept is the same. All right, so now I come back to T1, and it does a read on A. At this point, I'm gonna read version zero, because my timestamp is one: somehow I got to A0, I see that one is between zero and two — the begin and end timestamps — so I know this is the version I should be reading, and not A1, because that transaction is still active. So this avoids the unrepeatable read problem, because I go back and read the same version that I read before, right? Because I have a consistent snapshot, right? Yes, the question is, do you always create a new version when you update something? Yes, you have to. But where that version goes and what it looks like — there are three different ways to do that; we'll get to them. Yes. The question is, if I'm running at the read committed isolation level, in my example here, if T2 committed and then T1 reads A again, would I see version zero or version one? If you're running read committed in Postgres, you would see version one. When my transaction is running, not only do I have the timestamp I was assigned when I started, but I can also use the current timestamp when I read. So the way to think about this is: the isolation level I'm running at determines what I'm allowed to see over here, right? If I'm running with full snapshot isolation, then I would not wanna see anything that has a timestamp newer than my timestamp, even if that newer version has been committed. Yes. So the question is, what would happen here if I gave T1 timestamp two and T2 timestamp one, with the exact same ordering? So — is it happening in the same physical order here, or no? So T1 starts, does a read on A; T2 starts, does a write on A. So again, this guy would not see this guy's updates, because those updates would be uncommitted by the time this guy reads again. It's just kinda hard to show this on a PowerPoint slide. Like, at this point here, A1, which was written by T2 with timestamp one — is that not in the database? Does it show up in the database after this write? Yes. What do you mean by the timestamp, sorry? Yes — so for this write here, A1 would have the timestamps one to infinity. So for the second read, again, depending on what isolation level I'm running at, I could look — either in a status table or the header of the tuple — and ask, is this thing committed, yes or no? Or rather, is the transaction that created this version committed, yes or no?
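A minimal sketch of that visibility check, combining the begin/end timestamps with a status-table lookup. All the names here are made up, and one real-world detail is deliberately omitted (see the comment):

```python
# Hedged visibility sketch for MVCC snapshot reads.
INF = float("inf")
txn_status = {1: "active", 2: "active"}    # txn id -> active/committed/aborted

class Version:
    def __init__(self, value, begin_ts, end_ts, created_by):
        self.value, self.begin_ts, self.end_ts = value, begin_ts, end_ts
        self.created_by = created_by

def visible(v, reader_ts):
    in_lifetime = v.begin_ts <= reader_ts < v.end_ts
    committed = txn_status.get(v.created_by, "committed") == "committed"
    # a real system would also ignore an end_ts set by a still-uncommitted
    # writer; that detail is omitted here for brevity
    return in_lifetime and (committed or v.created_by == reader_ts)

A0 = Version("A0", begin_ts=0, end_ts=2,   created_by=0)   # creator committed
A1 = Version("A1", begin_ts=2, end_ts=INF, created_by=2)   # T2 still active
assert visible(A0, reader_ts=1) and not visible(A1, reader_ts=1)  # T1 reads A0
```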
All right, let's look at another example here. T1 is gonna do a read on A, a write on A, then a read on A; T2 is gonna read A and then write A. So we start off, T1 does a read on A. Our timestamp is one, A0's lifetime is zero to infinity, so we go ahead and read A0. Then I do a write. So again, I copy the tuple A0 into a new version, A1, apply the change there, set its timestamps to be from one to infinity, and also update the end timestamp of the previous version to be one. Then we do the context switch, T2 starts running, and it does a read on A. Its timestamp is two, but it's gonna read version A0, because T1 has not committed yet — and again, we know that either because it's in the status table or because it's in the header of the tuple; for our purposes here, it's in the status table. Then T2 does a write on A, and in this case it has to stall, because it essentially has to get the exclusive lock on the new version, right? And it has to wait to see whether this guy commits or not. Under snapshot isolation you would stall; I think under serializable you would actually have to abort, right? Then T1 runs again and it reads, and it sees the version that it wrote earlier, so we avoid an unrepeatable read. Then it commits, and then I can unblock this guy, and then it creates its new version. Yes — question, what isolation level are we running in this example? Snapshot isolation. Excellent. Okay, so again, when a transaction starts, we see a consistent snapshot of the database that existed at the time the transaction started. And this means that we're not gonna see any torn writes from other transactions — we never see just part of their writes, we see all or nothing. And then if two transactions try to update the same thing, we do a simple scheme of first writer wins. The second one has to stall, or we could just go ahead and kill it. Yes. So the question is, in my example here, why did this guy have to stall? Why couldn't it just create a new version? Because we would basically have to write, right now, onto this thing, to tell it here's the next version you should be looking at. There's actually metadata I'm not showing here — the pointers for the version chain. We'll see that coming up. Okay, so I've said that with a consistent snapshot, you're guaranteed repeatable reads, and you're not gonna read uncommitted data. You can still have phantoms — we can ignore that for now — but there's another anomaly that can occur under multi-versioning that you don't have under 2PL at these isolation levels, and that's called the write skew anomaly. And the way to think about this is through a visualization from Jim Gray — the guy that invented two-phase locking and won the Turing Award in the 90s — this is a metaphor he liked to use, and I always like to use it as well. Say I have a database of marbles, and the marbles can either be black or white, right? I have two transactions running at the exact same time: the first transaction is gonna change all the white marbles to black, and the second one is gonna change all the black marbles to white. So the way it would work is, they would both first read a consistent snapshot of the database under multi-versioning, and they would see the state of the database like this. But then they would each go and update the corresponding tuples — the black ones down here and the white ones up here — and flip them, and now they both think they've done what they were supposed to do. But because neither one took write locks on the tuples the other one changed, there's technically no conflict between the two transactions, and we end up with an incorrect state.
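Here's the marble schedule as a toy sketch, under the snapshot-isolation rules just described. The data and names are made up; the point is that the two write sets are disjoint, so first-writer-wins never fires:

```python
# Toy write-skew schedule: both transactions read the same snapshot, write
# disjoint sets of tuples, and both commit.
marbles = {1: "white", 2: "white", 3: "black", 4: "black"}

snapshot_t1 = dict(marbles)     # T1's consistent snapshot
snapshot_t2 = dict(marbles)     # T2's consistent snapshot

# T1: change all white marbles to black (based on its snapshot).
writes_t1 = {k: "black" for k, v in snapshot_t1.items() if v == "white"}
# T2: change all black marbles to white (based on its snapshot).
writes_t2 = {k: "white" for k, v in snapshot_t2.items() if v == "black"}

# Write sets are disjoint ({1,2} vs {3,4}), so neither blocks or aborts.
marbles.update(writes_t1)
marbles.update(writes_t2)
print(marbles)   # {1:'black', 2:'black', 3:'white', 4:'white'}: mixed colors,
                 # but any serial order would end all-black or all-white
```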
Yes — because serializable isolation would give you either all black or all white, because that would be equivalent to a serial ordering. So either T1 runs and then T2 runs, or the reverse. Yes. Your statement is, what if the granularity of the transaction focuses on just the lower half or the upper half? In my example here, no, this would not happen. What's that? That's right, but the transactions said change all: change all the marbles from black to white, or change all the marbles from white to black. So under a serial execution, they'd either be all white or all black; you wouldn't have a mix. Under snapshot isolation, as I'm describing here, where I show up and I have a consistent snapshot of the database based on my version, this would happen. Again, if you start taking locks, then you start doing the things that avoid this problem, and then it's serializable. Yes. But couldn't you check the kind of operation that you're doing — like if it's a "change all" — and detect this? So the question is, couldn't I detect, based on the SQL command that's showing up, that this could happen, and change my isolation level accordingly? It doesn't really work that way. It's basically like saying, I'm going to drive a car without the seatbelt, but if I'm about to crash, I'll quickly pull the seatbelt on. Not exactly the same metaphor, but it's basically what you're saying. All right, so the thing I want to point out as we go along in the implementation of MVCC is that despite the name being multi-version concurrency control, it's not just a concurrency control protocol, right? It encompasses a bunch of changes to how we manage transactions and data all throughout the database system. And this is why I had to cover it near the end of the semester: a bunch of the things that we talked about before now get slightly affected and changed by handling multiple versions. And just to show you that this is pretty widespread: pretty much every single database that's been built in the last 20 years, with some rare exceptions — like if you're running on embedded devices or super high-performance transactional systems — is running some variant of multi-versioning today. Yes — we'll take that one offline. Okay, same question, okay. All right, so — I don't think we're going to get through all of this, but let's see how far we can go. The first thing we've got to deal with is: what is the concurrency control protocol? And it's basically all the things we talked about in the last three lectures. I can do multi-version 2PL, multi-version OCC, multi-version timestamp ordering. All of that still applies here. And you basically get shared locks implicitly because of the versions.
And then for exclusive locks: if you're going to write to something, you take an exclusive lock on the tuple, and then you're allowed to create the new version. And whether that exclusive lock lives in a lock table, or you put it in the header of the tuple itself, which is what Postgres does — the basic idea is still the same. So we're not going to go through exactly how you would apply 2PL and all these protocols to multi-versioning; it's pretty straightforward. Whoever wants to create a new version has to go through these protocols. The trickier thing is how we actually want to store the versions. In my examples before, I didn't show you how we got to the different versions. I just said, hey, they were in these tables, and somehow we knew how to get to A0 or A1. In practice, what happens is that the database system maintains a version chain for each tuple. Think of it like a singly linked list that allows you to traverse the different versions of a tuple and find the version that your transaction needs to read, right? Basically, all of the indexes will now point to what I'll call the head of the version chain, which could be either the oldest version or the newest version — we'll describe that in a second. And what the version chain actually points to depends on how you're implementing the multi-versioning. So the three approaches are append-only storage, time travel storage, and delta storage. Append-only is what I've shown so far: you make a whole other copy of the tuple and write the new version there. Time travel puts the older versions in a separate table — a physically separate table. And delta storage is probably what you're most familiar with if you think of something like Git: you store the diff of the change, in a special space for each table. We'll go through these one by one and talk about the trade-offs. All right, so with append-only storage, there's a single table — yes. So the question is, I'm laying out three approaches here; are they mutually exclusive, or could you have some kind of mixed scheme? You could have a mixed scheme, but now you're maintaining basically two different code paths, and nobody would do that. It's a huge pain. Look, delta storage is the right way to do this. The way Postgres does it is actually the worst way to do it — but it was the 80s, right? Delta storage is what you should be doing; it doesn't make sense to try the other ones. And the time travel one shows up when people take a single-version system and try to graft multi-versioning on as an afterthought — that's what SQL Server does. All right, so with append-only, we have a single table, and every time I create a new version, I just insert a new tuple into the table, and I have my version chain to keep track of things, right? I'm showing that with this pointer column here. I'll also say, again, that in a real system you wouldn't actually maintain these version IDs, and I'm not showing the begin and end timestamps because I'm trying to make it fit and keep it simple in PowerPoint. All right, so say I have an operation here that wants to create a new version. I would traverse the version chain starting at A0, follow the pointer that takes me to A1, and then I create the new version: first by copying the old version in and then overwriting it with my new values. And then I go back and update the version chain pointer on the previous version to point to my newer version here.
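A minimal sketch of that append-only, oldest-to-newest chain (made-up names, and no begin/end timestamps, to keep the structure visible):

```python
# Sketch of append-only version storage with an oldest-to-newest chain.
# The index keeps pointing at the oldest version (the chain head), so
# installing a new version never requires touching the indexes.
class Version:
    def __init__(self, value, begin_ts):
        self.value, self.begin_ts = value, begin_ts
        self.next = None                 # pointer to the next-newer version

def install_new_version(chain_head, new_value, txn_ts):
    v = chain_head
    while v.next is not None:            # walk the chain to the newest version
        v = v.next
    v.next = Version(new_value, txn_ts)  # append the copied-and-updated tuple
    return v.next

a0 = Version("A0", begin_ts=0)           # index entry -> a0, the chain head
install_new_version(a0, "A1", txn_ts=1)
install_new_version(a0, "A2", txn_ts=2)  # index still -> a0; reads walk down
```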
So this is the example of going oldest-to-newest. The index points to A0, the oldest version, and then my transaction traverses the version chain until it finds the latest version that it's allowed to see — here, A2. Yes? The question is, are the versions only appended once they're committed? No. So this question is, if I have a read and I want to get back to the previous version, A1, how would I do that? There's no pointer in the backward direction. So again, as I was saying, the version chain goes from A0 to A1 to A2, and the index points to A0, the oldest version. The question is, to read the latest version, do I have to go all the way down the chain? Yes. This is what Postgres does. There's pros and cons to these, right? What's a pro? I mean — what's a con? He already said it: you have to traverse all the way down. What's a pro? He says it's simple. Yes, but there's something else. If I update a tuple and create a new version here, I don't have to go update the indexes, because they're always pointing to the oldest version. So if I've got a thousand indexes on a table and I install a new version, I don't have to update a thousand indexes. If the indexes instead always point to the newest version, then I've got to update all of them to point to the newest one on every update. No free lunch, yes. So, what's B1 doing here? I assume it's B0, but someone could have garbage collected B0? Yeah — B1 doesn't need to be there; it's more about the A's. Yes. Question: do you ever compact? Yes, we'll do garbage collection and vacuuming in a second, yes. So his question is, what if we're here and A1 is aborted? So remember what I said: first writer wins. At this point here, if a transaction trying to update A is able to get the exclusive lock to create a new version A2, that means the transaction that created A1 has already committed. So we never have to excise something from the middle of the version chain; we're always just appending. Yes. Do you update the index? When, or what? When. So here, right, it depends on the direction of the version chain. If it's oldest-to-newest, then we just append versions at the end of the chain. Of course, that means I have to traverse the chain, which could be long, depending on how aggressive I am with my compaction and garbage collection. But I don't have to update the indexes, because any time I put in a new version, the index is still pointing to the oldest version, right? If I'm doing newest-to-oldest, reads are faster when I'm trying to find the latest version, because I don't have to traverse a potentially long version chain. But again, that means that every time I install a new version, I've gotta update all my indexes to point to the new one. There's a trick we'll see for how MySQL avoids having to update all the indexes, but you pay a penalty for maintaining larger values in your secondary indexes than you would otherwise have to. So Postgres, as far as I know, might be the only one that does it this way.
There was an attempt in Postgres to get off of this and switch to the MySQL-style delta approach. I think it was called zheap. That project might be dead. It would be a very hard thing to change in Postgres right now, but it'd be nice if they could. And actually, if you're doing delta storage, you would do newest-to-oldest, and then you still have this update-the-indexes problem. All right, time travel storage. Again, this is what I was saying: the systems that I know of that do this always grafted it on, going from a single-version system to a multi-version system. Like in SQL Server, you can pay for an extra package that lets you do time travel queries, and this is essentially what you get. So the way this works is that you have a main table and a time travel table — two physically distinct tables in the database system. And every time I do an update, I copy the current version from the main table into the time travel table, update the version chain so it points to the previous version it used to point at, then update the value over in the main table with the new value, and update its pointer to point to the next-oldest version, the one I just copied over. So now, if I want the latest version of a tuple, and the main table only contains committed data, I can just scan through it without checking any versioning stuff, because I always see the latest version right here. If I need a previous version back in time, then I follow the version chain over to the time travel table and do lookups there. Yes — so the statement is, for newest-to-oldest, we could avoid updating the indexes by having an intermediate pointer. We'll get to that. Nobody does that — we'll get to it later. Yeah, okay. So again, you might ask, how is this any different from append-only? The difference is that there are two separate tables now that are physically distinct. You're not intermixing old versions within one table. So that means I could maybe put the main table on a really fast disk, and the time travel data — archival data that I'm not gonna read that often and that's not in the critical path of update transactions — maybe I put some of that on a slower disk. Yes. This is not specific to time travel storage, but when you add a new version, and at the same time you have an index on that attribute, do you add a new entry to the index? So the question is, how do I handle index entries for different versions? We'll get to that later. Yes. That's hard too. Okay. So the one that is more common — at least, our experiments in the research literature show that it is better — is delta storage. And the idea here is that every time I do an update, I only copy out the values that were modified by the update into a delta record that's stored in a separate table space. So in this case here, if I'm gonna update A1, I put the old value as a diff in my delta segment, then update the actual main table data, and update the pointer to point to the diff. So now, if anybody wants to come along and read a previous version that's not in the main table, they just follow this pointer and apply the diff to go back and say, what was the state of the tuple at this previous version?
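Here's a minimal sketch of that delta scheme: the main table holds the newest version, and old versions are reconstructed by rolling back a chain of diffs. Everything here is made up for illustration:

```python
# Sketch of delta storage: main table holds the newest version; each delta
# record stores only the overwritten attributes plus a link to the next-older
# delta (newest-to-oldest).
main_table = {"A": {"id": 1, "val": 222, "delta": None}}

def update(key, changes, ts):
    row = main_table[key]
    diff = {attr: row[attr] for attr in changes}       # save the old values
    row["delta"] = {"ts": ts, "diff": diff, "prev": row["delta"]}
    row.update(changes)                                # install the new version

def read_as_of(key, ts):
    row = dict(main_table[key])                        # start from the newest
    d = row.pop("delta")
    while d is not None and d["ts"] > ts:              # roll back newer deltas
        row.update(d["diff"])
        d = d["prev"]
    return row

update("A", {"val": 333}, ts=1)
update("A", {"val": 444}, ts=2)
print(read_as_of("A", ts=1))    # reconstructs {'id': 1, 'val': 333}
```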
And I keep doing this, maintaining the entire version chain of these diffs back in time in the delta storage segment. So this is better, because obviously if I have a thousand attributes and I update one of them, I don't have to make an entire copy as in append-only and time travel storage, right? I just store a diff. And it makes it really easy to do garbage collection, as we'll see in a second, because I can just blow this thing away: I figure out what the latest version anybody can actually see is, and maybe just drop the whole segment, because it's ordered by insertion time. It's basically the log-structured stuff we talked about before. Okay, so any questions about delta storage, append-only, and time travel? Yes, there have been attempts to, yes. But this is what MySQL does, this is what Oracle does, this is sort of what CockroachDB does. If you're gonna implement MVCC, you would do it this way. All right, so we obviously have to deal with old versions, and we have to reclaim them, because otherwise we run out of space — as I said, Postgres had this problem. So the database system has to have a way, under multi-versioning, to figure out which physical versions can be reclaimed from the database. And reclaimable means that there's no active transaction that can see a particular version, and no future transaction will ever be able to see it. If any transaction can still see a version — meaning its timestamp is within the range where the version would be visible — we can't delete it, because we don't know whether it's gonna try to read it later on. So we basically maintain a watermark, a threshold, that says here's the minimum transaction ID of all active transactions, and anything that's not visible below that, we can go ahead and prune. Of course, the questions are: how are we actually gonna find these older versions, and when do we know it's actually safe to reclaim them? Safe to reclaim comes down to the transaction ID watermark; how to find them and keep track of them is the hard part. So basically the rest of the lecture is: here's approach one, here's approach two. I can't say which one is strictly better, and various systems do a bunch of different things. The first approach is tuple-level vacuuming, or tuple-level garbage collection, where the database system does not keep track, per transaction, of the versions that transaction invalidated; instead we have a mechanism that searches the tables to find the versions we can reclaim. Then there's transaction-level garbage collection, where every transaction maintains its own write set of the things that it invalidated, hands that off to some kind of background thread, and that thread then knows how to go find that data and remove it. So the trade-off here is, if it's transaction-level and I update a billion tuples, then I have to maintain a record that I've updated a billion tuples and hand it off to some background thread.
In the tuple-level case, I can have hints that say, hey, here's where you should maybe go look for versions that are invalid or reclaimable, but they don't give precise locations.

So, tuple-level GC. There are two approaches: background vacuuming and cooperative vacuuming. Background vacuuming means there are separate threads, separate workers, that periodically wake up and start scanning tables looking for old versions. Again, there's this threshold that says: here's the smallest active transaction timestamp; anything only visible below that can be reclaimed. So in this example, transaction T1 is running with timestamp 12, and this other one has timestamp 25. The vacuum says, okay, what's the min of these? Twelve is the smallest. So anything only visible below 12 we can reclaim. It literally just does a sequential scan on the table, looks at each version's begin and end timestamps, and asks: does the minimum active transaction timestamp fall within this range? If not, then we know it's safe to go ahead and reclaim the version. Yes, we have to update indexes too; there's all that other stuff we'll talk about in a second. But it's literally just doing a brute-force sequential scan to find things.

This is a huge oversimplification of how Postgres actually does it, because I'm not specifying whether these versions are on the same page or not, right? For a single logical tuple, the physical versions may span multiple pages, so you have to chase across pages to find the beginning of a chain. It's way more complicated, but this is the high level of what it's doing. And obviously this would be super expensive, a complete sequential scan every 60 seconds, over and over again. So what you can do is maintain a bitmap of all the pages that were modified since the last time the vacuum ran. I don't know how they were modified, just that we wrote something to them. The vacuum can then scan through the bitmap, figure out which pages it needs to look at, and only examine those. Again, not 100% accurate, because the version you need to prune may exist on another page, but at least you know a modification was made to some tuple here, and you can then trace its version chain to find the thing you need to prune out.

With cooperative cleaning, the idea is that as the worker threads are running queries, if they come across something they know is reclaimable, that is, not visible to any transaction, then they're responsible for going in and cleaning it up. So say T1 does a lookup on some index, and we're going oldest-to-newest. It lands at the beginning of the version chain, and it knows what the global minimum timestamp is. As it scans along, reading the begin and end timestamps of each version, because it has to figure out what it needs to read anyway, if it sees something it knows is not visible to itself or anybody else, it can go ahead and delete it as it goes along, right? And obviously this means you may have to go update an index to point to the new head version.
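Here's a minimal sketch of that background-vacuum visibility check. The names (Version, vacuum, the flat list standing in for a table) are hypothetical; real systems like Postgres work page by page, but the core test is the same: a superseded version is reclaimable if its whole lifetime ended before the oldest active transaction began.

```python
# Hypothetical sketch of tuple-level background vacuuming. Each physical
# version carries begin/end timestamps; end_ts is set when a newer version
# supersedes it (None means it is still the live version).

from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    begin_ts: int
    end_ts: Optional[int]  # None => still the newest version
    payload: dict

active_txn_timestamps = {12, 25}  # e.g. T1 at ts 12, another txn at ts 25
table = [
    Version(begin_ts=1,  end_ts=10,   payload={"a1": 100}),  # dead before ts 12
    Version(begin_ts=10, end_ts=30,   payload={"a1": 200}),  # visible to T1
    Version(begin_ts=30, end_ts=None, payload={"a1": 300}),  # newest
]

def vacuum(table, active_txn_timestamps):
    watermark = min(active_txn_timestamps)  # oldest active transaction
    survivors = []
    for v in table:
        # Reclaimable only if superseded AND its whole lifetime precedes the
        # watermark, so no active (or future) transaction can ever see it.
        if v.end_ts is not None and v.end_ts <= watermark:
            continue  # reclaim: drop the version
        survivors.append(v)
    return survivors

table = vacuum(table, active_txn_timestamps)  # first version is reclaimed
```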
But the idea is that instead of having dedicated threads churning in the background trying to find old versions, you just pay a little extra overhead when you actually do scans, whether reads or writes, to clean things up as you go along. Of course, the problem with cooperative cleaning is that if I create a bunch of old versions and never go back and read that logical tuple again, they're never going to get reclaimed. So you still need background threads to occasionally look around and find old versions to reclaim; the idea is just that they wouldn't have the same overhead as a fully dedicated background vacuum.

Yes. No, to be clear, I'm not saying this one is responsible for cleaning because it has the lowest timestamp. Any thread has to do this cleaning; I just drew the line from the first one, and it could have been this other one too. The point I was trying to make is that the system maintains the lowest active transaction ID, everybody knows it, and when threads scan and see things they know are not visible to anybody else, they're responsible for cleaning them up as they go along.

Yes. Sorry? Why does it only work oldest-to-newest? The question is why I say this only works for oldest-to-newest ordering. If it were newest-to-oldest, I would never see the old versions, because the index takes me straight to the newest version. Maybe I'd need the second or third version down the chain, but in that case, the fact that I'm reading that far down means I have an old timestamp, and those versions are, in theory, still potentially visible to transactions like me, so I can't reclaim them.

So the other one is transaction-level GC. As I said, this is where the database system maintains, for every transaction, its own read and write set. When a transaction commits or aborts, it tells a dedicated vacuum thread: hey, by the way, here are the things I know I invalidated; your job is to go clean them up. So basically, my transaction T1 starts with timestamp 10. I do an update on A, creating a new version A3, and I maintain my old-version write set to say: hey, I wrote A3, and A2 was the previous version I invalidated, so go clean it up if I commit. We just keep a pointer to it, like a record ID. Then I do an update on B, same thing: create a new version, and it goes into my write set. Now when I commit, I pass this along to the vacuum thread. It knows that any transaction that could still see the old versions must have a timestamp less than 10, because that was my begin timestamp. So once there's no active transaction below that, it goes ahead and cleans these things up.

All right, so that's basically garbage collection. The example I gave before about how MySQL does garbage collection works the same way: you're appending deltas to the delta segment, you know the timestamp of each delta record, and if the entries are ordered by that, then I can prune by just cutting off any portion of the file or segment that's no longer visible and reclaiming that memory entirely. You don't have to go looking, right? Because you know exactly where the old versions are. You mostly have to do that searching when you're doing append-only.
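A sketch of transaction-level GC, again with hypothetical names: each committing transaction hands its begin timestamp and the record IDs of the versions it invalidated to a vacuum thread, which frees them once no active transaction could still see them.

```python
# Hypothetical sketch of transaction-level garbage collection: each txn
# records the record IDs of versions it invalidated; a vacuum thread frees
# them once no active transaction could still see them.

import collections

# (begin_ts, [record IDs of invalidated old versions]) from committed
# transactions, in commit order.
gc_queue = collections.deque()

def on_commit(begin_ts, invalidated_record_ids):
    gc_queue.append((begin_ts, invalidated_record_ids))

def vacuum_pass(active_txn_timestamps, free_version):
    # Oldest active transaction; if none are active, everything queued is
    # reclaimable.
    watermark = min(active_txn_timestamps, default=float("inf"))
    while gc_queue and gc_queue[0][0] <= watermark:
        _, record_ids = gc_queue.popleft()
        for rid in record_ids:
            free_version(rid)  # unlink from version chain, reclaim space

# Example: T1 (begin ts 10) invalidated versions A2 and B2, then committed.
on_commit(10, ["A2", "B2"])
vacuum_pass(active_txn_timestamps=set(), free_version=print)  # frees A2, B2
```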
All right, the next tricky thing we've got to deal with is indexes. The primary key index is always going to point to the head of the version chain, which could be the oldest or the newest version depending on the implementation. When we actually have to go update the index depends, again, on what the version chain ordering is. In most systems, if you update the primary key of a tuple, that's treated as a complete delete followed by an insert, because it's too tricky to maintain: okay, here's the old primary key, here's the new primary key, here's how they're linked together logically, and here are the pointers for each. If you change the primary key, the database system will treat it as a completely different logical tuple, right? It's the secondary indexes that are trickier, and that depends, again, on the version chain ordering and how we're actually storing the versions.

I think I mentioned this before: there's a blog article from 2016, six years ago, from Uber, where they talked about how they went from Postgres to MySQL. The true story is it was actually MySQL to Postgres and back to MySQL. They hired some guy who was really big on Postgres, which is fine, but they didn't think through what their workload was actually trying to do. The way Postgres does multi-versioning, and how it handles updating indexes under multi-versioning, is actually the worst-case scenario for what Uber was trying to do, and they ended up having to switch back to MySQL. So instead of paying what I'm sure was millions of dollars to do this, they could have just given us a million dollars at CMU and we would have told them not to do it, right? But I'd say this is a good example of why you need to understand what's going on inside database systems, because otherwise you can make costly mistakes like this.

All right, so there are two ways to do this, and some of you have already mentioned a couple of them, but we'll go through both. The first is a logical pointer: basically a fixed identifier per logical tuple that doesn't change as new physical versions are created. This could be the primary key itself, or it could be a synthetic tuple ID, which is what was brought up earlier, basically an additional indirection layer. The alternative is physical pointers, where the index always points to whatever the head of the version chain is.

There are pros and cons to each. Say we have a simple database doing append-only, newest-to-oldest version storage. If I do a primary key lookup, like get A, then the primary key index has a record ID that points immediately to the head of the version chain. Right, that's fine, that's fast. For a secondary index lookup, I could have a record ID as well that points to the head of the version chain, but of course, if an update creates a new version, then I've got to update all these secondary indexes, because they point directly at it, right? So if I've got a thousand indexes, they all point to this thing, and any time I create a new version, I have to update all 1,000 indexes. Postgres is doing oldest-to-newest, but this is essentially what they do as well.
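As a rough illustration of why physical pointers are painful, here's a hypothetical sketch: dict-based "indexes" map attribute values to a physical address, so every new version forces a write to every secondary index, even for attributes the update never touched. None of these structures come from a real system.

```python
# Hypothetical sketch of physical index pointers under newest-to-oldest,
# append-only storage: every index maps a key to the physical address of
# the version-chain head, so every new version touches every index.

versions = {}       # physical address -> {"attrs": ..., "next": older address}
primary_index = {}  # primary key -> address of chain head
secondary_indexes = [{} for _ in range(3)]  # say, indexes on a0, a1, a2

def insert(pk, attrs, addr):
    versions[addr] = {"attrs": attrs, "next": None}
    primary_index[pk] = addr
    for i, idx in enumerate(secondary_indexes):
        idx[attrs[f"a{i}"]] = addr

def update(pk, new_attrs, new_addr):
    old_addr = primary_index[pk]
    old_attrs = versions[old_addr]["attrs"]
    merged = {**old_attrs, **new_attrs}
    versions[new_addr] = {"attrs": merged, "next": old_addr}  # newest first
    primary_index[pk] = new_addr
    # The painful part: every secondary index must be rewritten to point at
    # the new chain head, even for attributes this update never changed.
    for i, idx in enumerate(secondary_indexes):
        idx.pop(old_attrs[f"a{i}"], None)
        idx[merged[f"a{i}"]] = new_addr

insert("A", {"a0": 1, "a1": 2, "a2": 3}, addr=100)
update("A", {"a0": 9}, new_addr=101)  # all three secondary indexes rewritten
```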
The way MySQL does this for secondary indexes is that the value of the secondary index entry is actually the primary key itself, right? So I do my secondary index lookup, get the primary key as the result, then do another lookup in the primary key index to jump down to the version chain. Now in this scenario, no matter how many times I update the version chain, I only have to update the primary key index. None of the secondary indexes have to change, because the primary key doesn't change, and they can always find the current location of the version chain by going through it, right? And this is why I said that if you update the primary key of a tuple, even though conceptually it seems like the same thing, we have to treat it as a separate new logical tuple: we'd have to go update all these secondary indexes as well.

Then there's the example that was brought up earlier: what if I had a virtual tuple ID and some kind of mapping table, so that the secondary index stores a tuple ID, I do a second lookup in this tuple-address lookup table, and then I get the head of my version chain? You could do this, but nobody actually does, because now you're basically maintaining another whole second index, and you might as well just use the primary key the way MySQL does. (A sketch of this two-lookup scheme follows at the end.) All right, so does that answer your question? Yes. All right.

So the other tricky thing is also going to be, well, we have one minute left. What's that? He says let's call it a day. You know, there's a lot, there's a lot. Yeah, let's call it a day, and then we'll do a demo with Postgres to show how they do this next time, okay? Nice. All right, see you guys.
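To round out the index discussion, here is the companion sketch of the logical-pointer scheme described above, where a secondary index stores the primary key instead of a physical address. Again, all structures and names are hypothetical simplifications, not MySQL's actual on-disk format.

```python
# Companion sketch of MySQL-style logical pointers: the secondary index maps
# an attribute value to the PRIMARY KEY, not to a physical address, so only
# the primary key index changes when a new version is created.

versions = {}       # physical address -> {"attrs": ..., "next": older address}
primary_index = {}  # primary key -> address of chain head
email_index = {}    # secondary index: email -> primary key

def insert(pk, attrs, addr):
    versions[addr] = {"attrs": attrs, "next": None}
    primary_index[pk] = addr
    email_index[attrs["email"]] = pk

def update(pk, new_attrs, new_addr):
    old_addr = primary_index[pk]
    merged = {**versions[old_addr]["attrs"], **new_attrs}
    versions[new_addr] = {"attrs": merged, "next": old_addr}
    primary_index[pk] = new_addr  # the ONLY index that has to change

def lookup_by_email(email):
    pk = email_index[email]             # lookup 1: secondary index -> primary key
    return versions[primary_index[pk]]  # lookup 2: primary key -> chain head

insert("A", {"email": "x@example.com", "age": 30}, addr=100)
update("A", {"age": 31}, new_addr=101)            # email_index untouched
print(lookup_by_email("x@example.com")["attrs"])  # {'email': ..., 'age': 31}
```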