Okay, all right, we are starting now. So, same administrative stuff: Project 3 is due on Sunday, November 14th, and Homework 4 is due on Wednesday, November 10th. And next week we have the company Vertica coming to CMU, virtually, to give a database talk. Vertica is a company spun off by my advisor's advisor, Michael Stonebraker from MIT; it's one of the modern implementations of a column store. It was later bought by HP, and it has since been bought by another company, but they're going to come and tell us what they're doing in the space of column stores.

So today's lecture is the last lecture on concurrency control. Hopefully at this point people are still following. The topic has been a little dense and complex these two weeks; this is one of the most challenging and complicated topics we discuss in this course, but today is the last lecture on it, so I hope everyone is still hanging in there. Last class we talked about optimistic approaches, specifically two algorithms: basic timestamp ordering and optimistic concurrency control (OCC). The week before that we talked about pessimistic concurrency control with locking. One thing I ended with last lecture was that, beyond the fundamental theory of serializability, there are cases where you insert or delete tuples, and then additional issues like phantoms can make the result of your query, quote unquote, incorrect. And because there are many potential issues, and because even without considering phantoms there is a lot of work just to deal with all the conflicts and guarantee serializability, all of that can be costly and add overhead.

So in practice, database systems usually allow users to specify an isolation level lower than serializability. That's where we ended last week. At the highest level, serializable, you allow neither phantoms nor any of the conflicts we talked about. At the next level, repeatable reads, you allow phantoms to happen, but you still ensure there is no cycle in the conflict dependency graph that we discussed in the concurrency control theory class. That permits more schedules and avoids the overhead of checking for phantoms, but it has that potential issue. At the next level, read committed, you additionally allow unrepeatable reads: again, more schedules are permitted, but the results may be problematic, depending on whether your application can tolerate that. And lastly, read uncommitted just allows anything to happen. So there is a matrix of which isolation level you ask for versus which anomalies may be allowed. Just note that when the database system is set to a lower isolation level, for example read committed, it is merely possible that you will see unrepeatable reads or phantoms; it doesn't have to happen. If the workload is simple, those anomalies may never occur.
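To make that matrix concrete, here is a small Python sketch (the names are mine, not from any standard or system) of which anomalies the SQL standard permits, though never requires, at each level:

```python
# Which anomalies the SQL standard *permits* (not requires) at each isolation level.
# A system running at a given level is always allowed to be stricter than this.
ANOMALIES_ALLOWED = {
    "READ UNCOMMITTED": {"dirty read", "unrepeatable read", "phantom"},
    "READ COMMITTED":   {"unrepeatable read", "phantom"},
    "REPEATABLE READ":  {"phantom"},
    "SERIALIZABLE":     set(),
}

def may_exhibit(level: str, anomaly: str) -> bool:
    """True if a system at `level` is allowed (but not forced) to exhibit `anomaly`."""
    return anomaly in ANOMALIES_ALLOWED[level]

assert may_exhibit("READ COMMITTED", "phantom")
assert not may_exhibit("REPEATABLE READ", "unrepeatable read")
```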
Also, in some systems, because of how the system is built and how the components are designed, even though you specify a lower isolation level, it will actually give you something stronger. So at a lower isolation level those problems can occur, but they don't necessarily always occur. Generally speaking, as you allow lower and lower isolation levels, you can do fewer and fewer checks and permit more and more possible schedules of the transactions.

Take two-phase locking, for example. If we assume 2PL is our concurrency control protocol, then at the serializable level you check for phantoms, for example using the index locks we talked about last class, and you use strict 2PL; that achieves serializability. If you only want repeatable reads, you still use strict 2PL, but you no longer need the index locks that guard against the phantom problem. If you go down to read committed, you still use 2PL, but it's no longer strict: you can release a shared lock immediately after you use it. That achieves the read committed level, allows more schedules, but can produce unrepeatable reads. And lastly, for read uncommitted, you simply take no shared locks at all; depending on the case you will still take exclusive locks for writes, but no shared locks.

These isolation levels are, in fact, part of the SQL standard. Remember last class we talked about how a system can let the user say, "I want to lock this entire table right now"; that hint is not part of the SQL standard, it's just something users can optionally provide. But isolation levels are part of the SQL standard, and most systems let users specify a stronger or weaker isolation level depending on what their application needs. If you don't specify, the system provides a default isolation level. Here is a summary table, made by Peter Bailis, a former Stanford professor who is now doing his own startup. It summarizes the default and maximum isolation levels across different systems. One thing we can notice immediately is that only two systems set their default isolation level to serializable, even though most of the theory and analysis developed by database researchers over the years focuses on serializability. In practice, because of the performance implications, most systems by default only give you a lower isolation level, and most give you read committed if you don't specify explicitly. And many systems don't even provide serializability as their highest isolation level, because it's difficult to implement and can restrict performance quite a lot, so they just don't have it.
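Going back to the 2PL discussion above, here is a rough sketch (assumed names, not any particular system's code) of how those four levels map onto locking behavior in a lock-based implementation:

```python
from dataclasses import dataclass

@dataclass
class LockPolicy:
    take_shared_locks: bool          # READ UNCOMMITTED skips shared locks entirely
    hold_shared_until_commit: bool   # strict 2PL on reads vs. release right after the read
    take_index_locks: bool           # needed on top of strict 2PL to block phantoms

POLICIES = {
    "SERIALIZABLE":     LockPolicy(take_shared_locks=True,  hold_shared_until_commit=True,  take_index_locks=True),
    "REPEATABLE READ":  LockPolicy(take_shared_locks=True,  hold_shared_until_commit=True,  take_index_locks=False),
    "READ COMMITTED":   LockPolicy(take_shared_locks=True,  hold_shared_until_commit=False, take_index_locks=False),
    "READ UNCOMMITTED": LockPolicy(take_shared_locks=False, hold_shared_until_commit=False, take_index_locks=False),
}
```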
Beyond the serializable, repeatable read, read committed, and read uncommitted levels we've talked about, there is actually an interesting additional isolation level called snapshot isolation. That's the maximum isolation level you can get from the famous Oracle database; they don't even provide serializable if you want it. We'll get into the details of snapshot isolation later in this class; essentially it's another isolation level that is a little bit weaker than serializability.

And then, I think two or three years ago, our group did a survey of many DBAs (database administrators) asking what isolation levels they set on the databases they manage. Here are the results. If you look at the serializable column on the right, not a single DBA set all of their databases to serializable; because of their applications and performance considerations, there are always some instances they manage that are not serializable. The isolation level people most often settle on is just read committed. It's actually unclear to me whether people use read committed the most simply because it's the default in most systems and they never change it, or because they thought through their application's properties and characteristics and determined that read committed is exactly what they need, nothing more and nothing less. I'm not sure which it is, but it's what most people use.

The last note, not exactly about isolation levels but about the SQL standard, is that there is a specific command that lets the user hint that a transaction is entirely read-only. If the database system knows in advance that a transaction is read-only, there are potential optimizations it can apply. It's not exploited by all systems: you can specify it, but not every system will actually do the corresponding optimization.

Okay, so that's isolation levels. Before we jump into today's topic, multi-versioning, any questions? Okay, everyone following? Cool. So today's class, again, is the last one on the topic of concurrency control, and it's last but also a very, very important one. Today we are not going to talk about a specific concurrency control protocol. Instead, we're talking about an optimization that many, many systems apply in combination with the concurrency control protocols we talked about earlier. This optimization is called multi-version concurrency control. Even though it has "concurrency control" in its name, it's not a specific protocol; it can be used in combination with any of the protocols we covered, whether locking, optimistic, timestamp ordering, and so on. So what is the fundamental idea of multi-version concurrency control? Think about how the earlier protocols work:
in all of them, the system has only one centralized, global physical copy of each record. No matter whether you lock it or keep copies of the record in your private workspace, from the global perspective there is only one physical record. With multi-versioning, the intuition is: what if we maintain multiple physical copies, in the global space, of a single logical record or object? If we do that, could it allow more flexible scheduling of transactions, more parallelism or concurrency, higher throughput, lower latency, and so on? The high-level idea is just that when a transaction writes an object, instead of writing it in place in the global, centralized location, we keep the original record intact and write a new copy containing the new value. Then we have multiple copies of the record, and other transactions can read either the current copy or an older copy. This potentially allows more flexible scheduling. I'll go into more detail shortly.

A little bit of history. The idea of multi-version concurrency control is not new either. It's generally viewed as having been first proposed in a dissertation from MIT in 1987, sorry, 1978. The first implementations of the idea came in the early 80s, in two systems: one called Rdb and the other called InterBase. They both came from DEC, and they were both architected by Jim Starkey, who is now doing yet another startup, NuoDB. Later on, the Rdb system was acquired by Oracle and became Oracle Rdb; note that this is different from the Oracle database you normally use, which was invented in, I forget, the 70s or 80s, and is still used and updated today. Oracle, the company, also acquired this early database system and renamed it Oracle Rdb. And InterBase was later open sourced and is now called Firebird. So those are the first two implementations of the idea of multi-version concurrency control, and those systems are still active, or at least still used by people, today.

So again, with multi-version concurrency control you have multiple physical versions of a single logical record or object. The main advantage is that writers and readers do not block each other. For example, if a writing transaction writes a new version of a specific record, other running transactions that are still reading an earlier version of that tuple are not affected. Because you are not deleting or overwriting any record, existing transactions that read some earlier version of the record are still able to continue. So writers do not block readers, and vice versa.
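Here is a minimal sketch of that core idea (plain Python, with my own names and structure, not any real system's): a write installs a new physical version instead of overwriting in place, so readers holding older versions are unaffected:

```python
from dataclasses import dataclass, field
from typing import List

INF = float("inf")

@dataclass
class Version:
    value: object
    begin_ts: int          # timestamp of the transaction that created this version
    end_ts: float = INF    # set when a newer version supersedes this one

@dataclass
class Record:
    versions: List[Version] = field(default_factory=list)  # one logical record, many physical versions

    def write(self, txn_ts: int, new_value: object) -> None:
        """Install a new version instead of overwriting the current one in place."""
        if self.versions:
            self.versions[-1].end_ts = txn_ts   # close off the previous version
        self.versions.append(Version(new_value, begin_ts=txn_ts))

a = Record()
a.write(txn_ts=1, new_value=111)   # installs A0
a.write(txn_ts=2, new_value=222)   # installs A1; A0's end_ts becomes 2, but A0 itself is untouched
```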
Of course, writers and writers still affect each other; otherwise you'd have a conflict. And under this multi-version optimization, read-only transactions are able to read a consistent snapshot of the database without acquiring any locks. By a consistent snapshot we mean that whenever the transaction accesses tuples, every tuple it accesses is at the version that existed exactly at the starting point of the transaction. So a read-only transaction effectively reads a consistent view of the entire database at a particular point in time: all the data it reads comes from transactions committed before that time point, and any modification made after that point will not be seen by this transaction. And a read-only transaction can do that without acquiring any locks, so you can see this could give a lot of potential performance improvement. That is exactly what snapshot isolation means: a transaction reads the state of the database at a particular point in time, and it can only see the versions of tuples committed by transactions before that point, nothing after. Note that snapshot isolation is still not serializable. Even though it avoids the basic conflicts we talked about in the concurrency control theory class, like dirty reads, dirty writes, and reading or writing uncommitted data, it can still have a problem called write skew, which makes it not serializable. But it is already a pretty strong isolation level. And lastly, with this multi-version idea it's also easy to support a kind of query called a time-travel query, which lets you query the state of the database at a particular point back in time. That's very straightforward: with different versions and with snapshot-style visibility, you can specify a timestamp and say, I only want to read the versions of tuples valid at or before this timestamp. Queries of that kind become very easy to support.

So let me give you some examples. Here we have two transactions, and we're only looking at one record, A. First of all, for illustration purposes I'm showing a version column for the tuple, but in an actual system this field isn't maintained; it's just for illustration. What the system actually maintains is the begin and end timestamp for each version of a record, in this case a tuple, plus the value of the tuple. Say we have two transactions here, T1 and T2, and we assign each transaction a timestamp at the beginning of its execution. Even though last class, when we talked about OCC, we said that optimistic concurrency control usually only needs to generate a timestamp for a transaction at the validation phase,
here, because we need the timestamp to determine which version each transaction is going to read and write, we have to assign a timestamp to each transaction at the beginning of its execution. So transaction T1 gets timestamp 1 and T2 gets timestamp 2. At the beginning, T1 reads record A. Because T1's timestamp, 1, is within the range of the begin and end timestamps of this particular version of the record, it can read this version and continue. Then T2 comes along and wants to write record A. Instead of overwriting version A0 of the tuple, we create a new version, A1, with a different value, with the begin timestamp set to the timestamp of the writing transaction, which is 2, and with the end timestamp set to a placeholder, say infinity; it doesn't really matter. And of course the earlier version becomes old: while installing this second version of the record, we set the end timestamp of the earlier version A0 to 2. So the earlier version A0 is only valid for transactions whose timestamps fall in that time range, and later transactions, with timestamps greater than or equal to 2, need to go to the newer version. In this way we try to guarantee that each transaction, based on its timestamp and its position in the overall schedule, reads the correct versions generated by the corresponding transactions, following the serial order defined by the timestamps.

Another thing to note is that beyond the begin and end timestamps on each version, we also need a separate place to keep track of the status of every transaction, whether it has committed or not. Why? If transaction T2 writes a new version of record A, then before T2 commits, other transactions, depending on the isolation level, for example under serializable, should not see that version yet, because T2 has not committed. So you need to keep track of which transactions have committed and which haven't; otherwise there could be conflicts, and we'll get to more details later.

So here, transaction T1 comes back and wants to read record A again. T1's timestamp is 1, which only falls in the range of version A0, between 0 and 2, because that was the starting point of this transaction. So it can only see version A0; it cannot see the version with a larger begin timestamp, written by a transaction scheduled later, because T2 has a higher timestamp than T1. So when T1 reads A again, it sees the earlier version of A, and this does not produce an unrepeatable read. Otherwise, if you only had a single version and another transaction modified it between the two reads, you would get an unrepeatable read;
but in this case we avoid that issue. And then, lastly, T2 can just commit normally.

Here's another example, to give you a different look at this. Again, transaction T1 reads A first, and it reads the current version. Then T1 does a write on A: it creates a new version with begin timestamp 1 and end timestamp set to infinity, and it modifies the end timestamp of the old version A0 to be 1, corresponding to transaction T1. Later, transaction T2 comes along and first wants to read tuple A. Now, which version should it use? Looking at the transaction status table, transaction T1 is still active. T2 comes after T1, so if T1 had committed, T2 obviously should be able to see the value written by T1. But because T1 has not committed yet, reading the value from version A1 right now would be a dirty read, essentially reading uncommitted data. Because we have the status of all transactions in the table, we know T1 hasn't committed, so instead of reading the new version, T2 reads the earlier version of the tuple. Even though that version's end timestamp is smaller than 2, the status table tells us it is still the right version to read.

Then T2 comes along and tries to write this record. Transaction T1 is still writing it, so, assuming for now that we're using a locking concurrency control protocol, T2 also needs to acquire an exclusive lock, and it has to stop and wait for T1 to finish. T1 then comes back and reads this version; it reads the same version, the one it wrote itself, and later on it commits, so T1's status becomes committed. Now when T2 wants to proceed, it can see that T1 has already finished: all of its modifications are in effect in the database table. Here, again, it depends on the isolation level you specify. Under the highest level, serializable, you may not be able to commit this transaction. But under snapshot isolation, the only requirement is that every transaction sees a consistent snapshot of the entire state of the database at a particular point in time, so at this slightly lower isolation level T2 can actually commit as well, eventually write its new version to the database, and change its status to committed.

Any questions about how we maintain these timestamps logically? I haven't talked about implementation yet; logically, how do we maintain and modify the timestamps and versions based on what transactions read or write? Any questions? No? All right, cool.
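To summarize the read side of those two examples, here is a rough Python sketch (my own naming; it ignores a transaction reading back its own uncommitted writes and other real-world subtleties). A reader takes the newest version whose begin timestamp is at or before its own timestamp and whose creator has committed, which is exactly why T2 above falls back to A0 while T1 is still active:

```python
INF = float("inf")

# Transaction status table: the system also tracks whether each writer committed,
# so that readers never return an uncommitted ("dirty") version.
txn_status = {0: "committed", 1: "active"}

# Physical versions of logical record A, oldest first: (begin_ts, end_ts, value, creator_txn).
versions_of_A = [
    (0, 1, 111, 0),    # A0: old committed version; its end_ts was advanced to 1 by txn 1's write
    (1, INF, 222, 1),  # A1: installed by txn 1, which has not committed yet
]

def visible_version(versions, reader_ts):
    """Return the newest committed version that began at or before the reader's timestamp."""
    for begin_ts, end_ts, value, creator in reversed(versions):
        if begin_ts <= reader_ts and txn_status.get(creator) == "committed":
            return value
    return None

# Txn 2 (timestamp 2) reads A while txn 1 is still active: returning A1 would be a
# dirty read, so it falls back to A0, just as in the second example above.
print(visible_version(versions_of_A, reader_ts=2))   # -> 111
```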
So this is a very, very important optimization that came out of the challenges of concurrency control, and in fact most systems built in the last 10 or 20 years implement it. This slide is just a collection of systems we found that implement the technique. You can see there are many, many systems that do this, and it improves the performance of the database system, especially the effectiveness of the concurrency control protocol, by a lot. It's a very, very important optimization.

So now I'm going to switch to a demo and show you the effect of multi-version concurrency control, specifically in the context of Postgres. Hopefully this time it goes smoother than last time. Okay, one second. All right, this is Postgres. Again, if anyone sitting in the back cannot see the demo, please come forward; I've already tried to make the font a little larger, but I don't know whether you can see it exactly. Sorry, for this system I need to log in again. Okay, I'm in, it's able to connect.

What we have here, similar to last time, is a transaction demo table with only two records: record one with id 1 and value 100, and record two with id 2 and value 200. Those are the two tuples. At the beginning, we can use this query to ask Postgres to return not only the value of each tuple but also the position of the tuple, the ctid. I don't know whether Andrew has shown you the ctid in his demos before; no? Okay, so, stepping back: we are selecting three additional fields for all the tuples in this table. Besides id and value, we are also selecting ctid, xmin, and xmax. The ctid is the location of the tuple: essentially the ID of the page plus the slot position of the tuple within it, and that combination lets you locate the content of the tuple on disk. And xmin and xmax are the begin timestamp and the end timestamp of this particular version of the tuple. Obviously, from this we know that Postgres is using multi-version concurrency control; I believe MySQL does as well.

So first, we can notice that all the tuples here have the same xmin and the same xmax, which just means they were all created by a transaction with timestamp 498. The xmax is zero, but I think that's just a placeholder; zero and infinity mean the same thing here. What we're going to do first is execute the transactions at the read committed level, not the highest level yet. So first we look at the tuples from the first transaction; again, it's exactly the same result we saw earlier. And for better demonstration, we can also look at the ID of this transaction.
This first transaction's ID is 499, and the second transaction's ID is 500; two different transactions. Now we issue an update in the first transaction, updating only the record with id equal to 1. At this point the first record already has two versions. Now let's come back and see what we get. Even though the first transaction updated this tuple with id 1, from the second transaction's perspective, well, the second transaction has a higher timestamp of 500, but, oh, that's not the reason: it's because the first transaction has not committed yet. So the second transaction, even though it has a higher timestamp of 500, can only see the original version of this record, the initial version we saw at position (0,1). And as for xmax: because the first transaction, with ID 499, modified this record, the end timestamp of this old version has been changed to 499.

The next thing you may wonder is: what if I select that new tuple explicitly? Like I mentioned, the ctid gives the page ID and the slot number, so we know there are two logical tuples but three physical versions. What if I directly tell the system, I want to see the version at this exact location? What will happen? Do you think the system will let the second transaction see that record or not? It cannot. Even though we know there's a tuple at that location, logically the first transaction has not committed yet. Sure, I can name its ctid and its attributes in a query and select it, but at the end of the day, when doing concurrency control, the system is not going to let the second transaction see that uncommitted tuple, even though the version physically exists.

Now, coming back to the first transaction, just to double check: when the first transaction selects this tuple from the table, it selects the version that it wrote itself, which is at position (0,3). The next thing I'm going to do is go to the second transaction and update this same tuple again. Because we're updating the same tuple, and because, depending on the implementation, Postgres, as we saw in an earlier demo, does pessimistic locking for writes, this works just like the example I showed in class: since transaction T1 is already updating it, transaction T2 just stalls on this update command. Now let's commit the first transaction. You see, right after I commit the first transaction, T2 can continue and do its modification. Then let me commit this transaction as well, and we can come back and select everything out again. One thing I'm personally not exactly sure about, and I saw the same thing when I tried this earlier: the first record, with id 1,
after we finish the modifications, actually has an xmax equal to 500. I don't know why; I thought it should be zero. Maybe 500 here is also a placeholder, the same as zero, meaning infinity; I don't really know. But theoretically, well, not theoretically, it should just set the end timestamp of this record to infinity, and I don't know why it shows 500 here rather than zero. Maybe Postgres somehow uses both to denote infinity; I don't really know.

Okay, that was the first example I wanted to show. Next I want to show you an example of what happens if I raise the isolation level a little. Here I begin the transaction with the serializable isolation level. Actually, before doing that, any questions on the first example? Yes, please. [Student question about the timestamps.] Yes, 498, and yes, that's the record with id 2. No worries. Any other questions on the earlier example?

All right, so let's start this again, but with a higher isolation level. Again, we look at the ID of this transaction, 502. So, for the record, in the first one let me just select this tuple from the table as well, and also print the transaction ID, just for reference; it's the same values we've seen before. Now I'm going to update the value of the record with id 1 in the first transaction, and I'm also using the RETURNING syntax to return the ID of this transaction: 501. Then I do the same thing in the other session: I issue the same update again, and, similar to what we saw last time, the second transaction stalls, because we are updating the same record at the same time.

Now I'm going to commit the first transaction, but before I do that, I want to remind you what happened the last time we did this. When we committed the first transaction, what happened to the second transaction? It simply unblocked: it was allowed to continue, modify the record, and carry on. But here, when I hit commit, look at what happens to the second transaction: it says it could not serialize access, because of a concurrent update. Why is that? Because we specified the serializable isolation level at the beginning of both transactions. This is a higher isolation level, and under serializable isolation we are not going to allow these two transactions to update this record in a schedule whose conflicts could form a cycle; essentially these would be read-write and write-write conflicts. So in this case the second transaction is not allowed to continue. And here, if I try, I cannot commit it;
it just says it was rolled back, because this transaction has already been aborted. Make sense? Any questions? Okay, cool.

So lastly, let me show you a bit of a mix of isolation levels. Here, the first transaction I start with serializable, and the second transaction I start with read uncommitted: one high level and one low level. Let's see what happens. Again, similar setup: I can select everything from this table, and you can see that for the first record the end timestamp, xmax, went back to zero, which is what I expected to see; I don't know why the earlier example showed otherwise. But essentially, these are the two tuples. In the second transaction I can do the same select, and both transactions see exactly the same thing. Now I do an update, but this time, very differently, in this second transaction I update the tuple with id equal to 2; that's what I'm doing differently here. I also return the current transaction ID, which is its timestamp. Then, when I come back to the first transaction, I do an update on the tuple with id equal to 1. This transaction also proceeds, because it's updating a different tuple.

Now, from the first transaction, I run a select query on this table. What do we see? For the first tuple, id 1, it sees the latest value, the one it wrote itself. But for the other tuple, id 2, it does not see the latest value; it still sees the earlier value. The begin timestamp of that version of tuple id 2 is 498, from the transaction we executed earlier, and its end timestamp is 503, because the second transaction, with ID 503, installed a new version and changed the end timestamp of the earlier version to 503.

And here is the interesting part: now I want to select all the tuples from the second transaction. Remember, we are running the second transaction as read uncommitted; that's the isolation level we specified for it at the beginning. So in theory, what should we be able to see here? Since I specified read uncommitted, the system isn't required to protect me from anything, so in theory I should be able to see the latest version, no matter which transaction wrote it. But when we actually execute this query, what do we see? Let's check: what was the ID of the first transaction, actually? I forgot; let me select it. Yes, 504, exactly. So for the second transaction: the tuple with id 2 is the one the second transaction itself modified, with its ID 503; that's what it changed.
But for the tuple with id 1, we see that even the second transaction, running at read uncommitted, still sees the earlier version; it is not seeing the uncommitted tuple written by the first transaction. And that makes sense, because in the second half of the screen, when we select everything from the table, the tuple with id 1 still shows a range between 501 and 504, which is the version from before the first transaction's modification. So what's happening here, or what I want to show, is that even though we told the database system to run the second transaction as read uncommitted, that only means the system is not obligated to protect this transaction from uncommitted data; it can still choose to protect it. And in the case of Postgres, because it uses multi-version concurrency control, it's actually just easier to naturally resolve reads to the correct committed version based on the transaction's timestamp. If they wanted to let the second transaction read an uncommitted version of the data, they would need to implement some special logic that goes outside their normal timestamp-based version checking. They decided that's complicated, not efficient, and not worth it, so they simply don't do it. So even though you specify read uncommitted, Postgres still doesn't let you read data from uncommitted transactions; not that the level guarantees it, but in practice it doesn't happen. That's a choice the DBMS is free to make.

Okay, that's all the demo for today. Any questions? All right, cool, let me come back and continue the rest of the lecture. So, like I mentioned on the earlier slide, most systems built in the past 10 or 20 years implement this multi-version concurrency control optimization, and it gives a huge performance benefit. It's a very, very important optimization, and there are a few specific design aspects of it that I'm going to talk about, which you need in order to actually implement it in a system, and implement it efficiently.

The first, obviously, is how this optimization combines with the concurrency control protocols we talked about earlier. Here I'm not going to go into too much detail with examples, because at a high level all of those protocols stay the same: you just install a new version when you write a tuple, but you still do the same timestamp ordering, optimistic concurrency control, or locking as before. For example, if you want to combine multi-version concurrency control with timestamp ordering, you still assign transactions timestamps and still arrange their execution, their scheduling, based on those timestamps. I think the only difference is that you no longer need to keep a separate read timestamp, because each version already carries begin and end timestamps.
For optimistic concurrency control, again, it's the same three phases we talked about earlier: read, validation, and write. The only difference is that now you install the new versions in the transaction's private workspace first, instead of putting them immediately into the global table. And lastly, for two-phase locking, again, very similar. The only difference is that instead of locking a single global physical copy of the tuple, you lock the specific version of the tuple you are trying to access. If you are reading an earlier version of the tuple with id 100, you put a shared lock on that specific version; similarly for writes, if you want to write a tuple, you put an exclusive lock on the latest version of the tuple, which essentially acts as the write lock on the tuple, so no other transaction can write it at the same time. At a high level, all of those protocols stay the same, and unfortunately, due to time, I'm not going to go into examples for them.

What I do want to talk about, and it's pretty interesting, is how you actually organize the storage of all these additional versions, how you traverse them, and how you clean them up. These are very interesting and important topics for implementing multi-version concurrency control efficiently. The first is version storage: how do you store these tuples? The high-level concept is that for all the different physical versions of the same logical tuple, we chain them together using pointers. In each physical version of a tuple we have a specific field storing a pointer to the next physical version of that logical tuple, so everything is chained together and a transaction can traverse the chain to find the correct version of the tuple it wants to access. Another thing to note is the index: in many cases, for example with a B+ tree index or a hash index, the value stored in the leaf entry just points to the head of the version chain, and the transaction then traverses the chain from there to find the correct version.

There are different storage schemes, different strategies with different trade-offs, and we're going to talk about them now. At a high level, there are three ways to organize the storage of these additional versions of tuples. The first is pretty straightforward and is called append-only: every time you need to write a new version, you simply append the new version to the end of the storage space of the entire table, and then you adjust the pointers accordingly. Very straightforward. The second is called time-travel storage. It's similar, but almost the reverse: instead of writing the new version of the tuple to the end of the table space, with time-travel storage you have a separate table space that stores all the old versions.
So essentially, every time a transaction wants to write a new version, it first copies the current, original version of the tuple into that separate storage space, appending it to the end of the space that stores all the old tuples, and then it updates the current version of the tuple in place to be the latest version. So the location of the current tuple never changes under time-travel storage; a little bit different. The third one, as you'll see, is kind of a modification or extension of the second approach: instead of copying the entire tuple with its original values, it looks only at which attributes of the tuple are going to be modified by the writing transaction and copies only the values of those columns into the separate space, then updates the new values in place, rather than copying the whole tuple.

Based on some recent research, the third approach is typically the most advantageous: with it, every write writes less data, so it's more efficient, and the old values occupy less space and are easier to clean up later. So recent research suggests the third approach is probably better, but there are still systems using the first or second approach; there are trade-offs.

Let me give you some concrete examples. The first approach, append-only: in this case, all the physical versions of the tuples are stored in one single table space, and they are not sorted by which logical tuple they belong to, so the versions of different tuples are just interleaved in the table together. For example, say I want to add a new version of tuple A, and right now only A0 and A1 exist. What we do is append a new version of tuple A, call it A2, with the new value, at the end of the table space. And importantly, what else do we need to do? We also need to adjust the pointers between the versions of this tuple: we adjust the pointer so that A1 points to A2. Originally we had a chain of two versions, A0 pointing to A1, with A1 at the end of the chain holding a null pointer; after the modification, the chain is extended by one. That's it. Any questions? Straightforward, right? Okay.

One question you might ask is: what should the order of this version chain be? You can really go both ways: you can chain from the oldest version to the newest, or from the newest to the oldest, and people do both, by the way. There's a fairly straightforward trade-off between them. With the first approach, oldest to newest, the benefit is that when you append a new version, just like we did here, you only need to link it at the end of the chain; you don't need to update the pointer at the head, because the new version is only appended at the tail.
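Here is a minimal sketch of append-only storage with an oldest-to-newest chain (Python, names are mine). Notice that the index entry for A never changes, which is the benefit just mentioned; with newest-to-oldest you would instead repoint the index on every write:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TupleVersion:
    key: str
    value: int
    next: Optional["TupleVersion"] = None   # pointer to the next (newer) version

index = {}          # logical key -> head of the version chain (the oldest version)
table_space = []    # all physical versions of all tuples, interleaved, in append order

def append_version(key: str, value: int) -> TupleVersion:
    v = TupleVersion(key, value)
    table_space.append(v)              # physically appended at the end of the table space
    if key not in index:
        index[key] = v                 # first version of this key becomes the chain head
    else:
        tail = index[key]
        while tail.next is not None:   # walk to the end of the chain...
            tail = tail.next
        tail.next = v                  # ...and link the new version there
    return v

append_version("A", 111)   # A0
append_version("A", 222)   # A1
append_version("A", 333)   # A2 -- index["A"] still points at A0, so no index update is needed
```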
The disadvantage of oldest-to-newest, though, is on traversal: presumably newer transactions, in fact most transactions, want to access the newer values, since timestamps keep advancing. But every time you walk this chain you hit the oldest version first, and to reach the newer versions you have to traverse all the way to the end of the chain. With the second approach, newest to oldest, the advantage is that you see the newest version right at the beginning. The disadvantage is that the head of the chain keeps changing: with newest-to-oldest, the head always has to point at the newest version of the tuple, so every time you install a new version you have to change the head of the chain to point to the new location, and in that case you need to update the index. Makes sense?

Next, we're going to talk about time-travel storage, and we'll see there are different implications here too depending on the ordering; for now, let's just assume we're using newest to oldest. With time-travel storage, every time we want to update a record in the database, we copy the old version into a separate table space; we'll just call it the time-travel table, since it holds the tuples from older times, the historical values of the tuples, hence the name. Then we update the pointers in the main table. Say we want to install a new version of record A: we first find a spot for it in the time-travel table and copy the current version of A there, and of course we have to set that copy's pointer to point at the even older version before it, so A2 points to A1. Then we simply overwrite the version in the main table, in place, with the new value, say 333, and the last thing we do is update the pointer of this new current version to point at the old version we just copied out. So you always update in place.

And this is newest to oldest, but one benefit of the approach is that even though it's newest to oldest, because you always update in place, you don't actually need to change the pointer to the head of the chain: the head of the chain is always the version in the main table, and its location never changes. Another benefit is that all the old versions live in the time-travel table, and in reality most transactions only access tuples that were written recently; tuples written a long while ago are rarely accessed in practice.
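A minimal sketch of that time-travel scheme, again with my own names: the master version stays put in the main table, and old values are copied out into a separate structure:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TupleVersion:
    value: int
    prev: Optional["TupleVersion"] = None   # pointer to the next-older version

main_table = {}          # logical key -> current ("master") version, at a fixed location
time_travel_table = []   # overwritten versions get copied here

def update(key: str, new_value: int) -> None:
    cur = main_table.get(key)
    if cur is None:                                    # first write: just install the master
        main_table[key] = TupleVersion(new_value)
        return
    old_copy = TupleVersion(cur.value, prev=cur.prev)  # copy the current version out...
    time_travel_table.append(old_copy)                 # ...into the time-travel table
    cur.value = new_value                              # overwrite the master in place
    cur.prev = old_copy                                # master now points at the copied-out old version

update("A", 111)
update("A", 222)
update("A", 333)
# main_table["A"] never moves, so the index pointing at it never needs updating.
```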
So another advantage of this approach is that you can in fact put the entire time-travel table on a lower-tier storage device, for example a slower HDD. You save money, and you don't read those old versions that frequently anyway, so money-wise it can be better overall. Any questions on time-travel storage?

Okay, lastly, delta storage. Again, this is sort of an extension of time-travel storage: instead of copying the entire tuple, you copy only the individual values modified by the writing transaction. To illustrate: for illustration purposes I'm showing just one attribute, but in reality the table could have a thousand attributes and you only touch one. Here, assume the table has a single attribute called VALUE, and its current value is 101. When a transaction wants to write this value, what do we do? It copies the old value of the VALUE column into the delta storage segment as a delta record, A1; every delta record also carries a pointer, and since this is the first one, A1's pointer is just null. Then we make the pointer of the current version of A, which is now A2, point to the original version, A1. Then, say some time later, you want to update record A again with an additional update: you copy the current value of A2 into another delta record in the delta segment, which becomes A2, and make it point to A1; then you do the modification in place, so the current version, now A3, is updated to 333 and points to A2. So you keep the version chain maintained with these deltas. Any questions on delta storage? Oh, yes, please. [Student question.] Ah, no, actually not; it's a good question. For simplicity I'm only showing one attribute here, but in reality a write may modify one, two, or three attributes, and you write all of those old values together in a single delta record. It depends on the width of the table: if the table has a thousand columns, or a thousand attributes, and you only update three, you're still saving a lot. But yes, if the table has a thousand columns and you update all thousand, that's just the same as copying the entire tuple.

One thing to note here is what has to happen when a transaction wants to access those tuples. Unlike the earlier two approaches, where you just keep traversing the version chain and, based on the begin and end timestamps we talked about, look directly at the versions, now nowhere in the version chain are we storing the full value of each old version.
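Here is a minimal sketch of delta storage (Python, my own names; for brevity I address older versions by "how many deltas back" instead of by timestamp). A write copies only the overwritten values into a delta record, and reading an older version means applying deltas back onto a temporary copy, which is exactly the traversal cost discussed next:

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class DeltaRecord:
    old_values: Dict[str, object]            # only the attribute values the write overwrote
    prev: Optional["DeltaRecord"] = None     # pointer to the next-older delta (newest-to-oldest)

@dataclass
class MasterTuple:
    attrs: Dict[str, object]                 # the full, current version of the tuple
    delta_chain: Optional[DeltaRecord] = None

def update(tup: MasterTuple, changes: Dict[str, object]) -> None:
    """Copy only the overwritten values into a new delta record, then update in place."""
    old = {k: tup.attrs[k] for k in changes}
    tup.delta_chain = DeltaRecord(old, prev=tup.delta_chain)
    tup.attrs.update(changes)

def reconstruct(tup: MasterTuple, deltas_back: int) -> Dict[str, object]:
    """Rebuild an older version by applying deltas, newest first, onto a temporary copy."""
    snapshot = dict(tup.attrs)
    delta = tup.delta_chain
    for _ in range(deltas_back):
        snapshot.update(delta.old_values)
        delta = delta.prev
    return snapshot

a = MasterTuple({"VALUE": 101, "OTHER": "x"})
update(a, {"VALUE": 222})
update(a, {"VALUE": 333})
print(reconstruct(a, deltas_back=2))   # -> {'VALUE': 101, 'OTHER': 'x'}
```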
So while you traverse the version chain, you also need to maintain a temporary copy of the tuple and keep applying the old values back onto it, so that you can restore the version of the tuple at a specific timestamp from those deltas. That leads to another important property of delta storage: as far as I know, people always implement delta storage from newest to oldest. The reason is exactly what I just said: to acquire a specific version of the tuple, you have to apply the changes back starting from the newest version. You could organize the chain from oldest to newest and still locate the right spot that way, but after locating it you would still have to go to the latest version of the tuple and apply all the changes back from there, which is redundant. With newest to oldest, while you are locating the version you are also reconstructing, restoring, its values in the meantime, so you kill two birds with one stone. Makes sense?

The next topic, also very important, is that with this multi-version approach one obvious issue is that you just keep accumulating versions. After your system runs for a while you can have many, many versions, and they take up a lot of space; most of the time you want to shrink that space so you can free it up for the database system to store regular records instead of just old versions. That is garbage collection: the system needs to reclaim those old physical versions over time. There are essentially two kinds of physical versions we can reclaim. The first kind is any version that no active transaction can see or will ever be able to access again. Time is always advancing: if right now no transaction is old enough for its timestamp to fall within the valid range of a given version of my tuple, then no later transaction will fall in that range either, so we can safely reclaim a version that nobody can see. The second kind, obviously, is versions created by aborted transactions; those can simply be reclaimed.

There are two design questions to answer: how do we find the expired versions of tuples, and how do we reclaim them safely? At a high level there are two kinds of approaches: tuple level and transaction level. Tuple level, at a high level, means you examine each individual version and ask, can I reclaim this one, can I reclaim that one, and so on; I'll give details shortly. Transaction level means that instead of looking at individual tuples, you look at, for each transaction, which new versions it created and which old versions it overwrote; after all, if a version became old, it must have been overwritten by some transaction,
Okay, so the next topic, which is also very important: with this multi-version approach, one obvious issue is that you just keep accumulating versions, right? After your system has been running for a while you can have many, many versions, and they take up a lot of space. Most of the time you want to shrink that space so your database system can store regular records instead of just old versions. That is what we call reclamation: the system needs to reclaim those physical versions over time. Essentially there are two types of physical versions we can reclaim. The first type is any version that no active transaction can see and that no future transaction will ever be able to see. Time is always advancing, right? So if no currently running transaction is old enough for its timestamp to fall within the begin and end range of a version, then no later transaction will fall in that range either, and we can safely reclaim it. The second type, obviously, is versions created by aborted transactions; those can just be reclaimed. Then there are two design questions we need to answer: how do we find the expired versions, and how do we reclaim them safely? At a high level there are two approaches, tuple level and transaction level. Tuple level just means you examine each individual tuple and ask, can I reclaim this version or not, and so on; I'll give you the details in a moment. Transaction level means that instead of looking at each individual tuple, you look at each transaction and keep track of which new versions it created and which old versions it overwrote; after all, if a version became old, some transaction must have overwritten it, right? With that bookkeeping, based on the transactions' timestamps, we can directly figure out which versions have been invalidated by which transaction and reclaim them by walking the finished transactions rather than the individual tuples. Those are the two high-level approaches.

One more thing to mention: for the tuple level there are actually two choices. Either a background thread keeps looping over individual tuples, or you do it cooperatively, meaning that while transactions are traversing version chains to access tuples, they remove versions they identify as no longer necessary along the way. It's something transactions can conveniently do while they execute, instead of having a separate background thread, all right? So first, tuple-level garbage collection with a background vacuum. Here's a specific example: say we have two threads managing two transactions, one with ID 12 and one with ID 25, and we have three versions in the table, one version of tuple A and two versions of tuple B. A separate vacuum thread goes through each version in the table and checks whether it can be reclaimed. By the way, this works with any storage format, right? It doesn't matter whether it's append-only, time-travel, or delta storage. When the vacuum thread runs, it looks at the currently running transactions and their IDs; it can record all of them, or just the smallest one. Then it goes through each version. For the first version of tuple A, call it A100, no active transaction falls within its begin and end timestamps, so it doesn't fit; the same goes for the first version of B. But for the third version, B101, one of the active transactions does fall within its range. So the vacuum reclaims the space for the first two versions and frees that data up, while keeping the last one, all right? Make sense?
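Here is a rough sketch of what one such background vacuum pass might look like, assuming each tuple keeps a list of versions with begin_ts/end_ts fields (end_ts is infinity for the newest version), a created_by_aborted flag, and a hypothetical free() helper that releases the storage; all of these names are assumptions made for illustration.

```python
def vacuum_pass(table, active_txn_ids):
    """One background-vacuum sweep (tuple-level GC): drop every version that no
    running transaction can see and that no future transaction will ever see."""
    oldest_active = min(active_txn_ids, default=float("inf"))
    for tup in table:
        live = []
        for version in tup.versions:
            # A version is dead once it was superseded (finite end_ts) before the
            # oldest running transaction started, or if its creating txn aborted.
            dead = version.end_ts <= oldest_active or version.created_by_aborted
            if dead:
                free(version)                    # hypothetical helper that releases the space
            else:
                live.append(version)
        tup.versions = live
```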
One thing Postgres does here that I think is actually pretty clever: obviously it is very costly to scan all the tuples in the table every time you vacuum. So instead of blindly scanning everything, Postgres maintains a dirty block bitmap that keeps track of which pages have been modified by transactions since the last time vacuuming started. The next time it needs to vacuum, it only scans the pages that have been modified since the previous vacuum, not all the pages, which greatly reduces the cost of vacuuming. Pretty clever, all right?

The second tuple-level approach, as I mentioned, is cooperative cleaning. Instead of having a background thread constantly looping through all the tuples and data pages, we let transactions identify versions that are no longer useful while they are already traversing the version chain looking for the version they want to access. By the way, this only works with oldest-to-newest ordering, right? Because with newest-to-oldest, there could be old versions at the tail of the chain that nobody ever needs to read, so nobody would ever traverse past them, and they would essentially never get reclaimed. So here's an example. Say we still have two transactions running, and the one we focus on wants to access tuple A through an index. Through the index it first locates the head of the version chain, and then it traverses the chain from oldest to newest. The system maintains a watermark, which is the smallest ID or timestamp among all currently running transactions. While traversing, if the transaction sees a version whose end timestamp is below that watermark, then no current or future transaction can ever see that version, so it can just destroy it right there during the scan. It removes whatever versions it identifies as no longer accessible by any active transaction and relinks the version chain to the newer versions, all right? Make sense?
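A minimal sketch of that cooperative pruning, under the assumption of a singly linked oldest-to-newest chain with next pointers, begin_ts/end_ts fields, and a hypothetical free() helper; the function name and return convention are made up for illustration, and a real system would also have to update the index entry if the head changes.

```python
def read_with_cooperative_pruning(chain_head, reader_ts, watermark):
    """Walk an oldest-to-newest version chain looking for the version visible to
    reader_ts, unlinking any version whose end_ts is below the watermark (the
    smallest timestamp among all running transactions). Returns the visible
    version and the possibly-updated chain head."""
    head, prev, version, visible = chain_head, None, chain_head, None
    while version is not None:
        nxt = version.next
        if version.end_ts <= watermark:
            # Dead version: nobody running now (or starting later) can see it,
            # so splice it out while we happen to be walking the chain anyway.
            if prev is None:
                head = nxt
            else:
                prev.next = nxt
            free(version)                        # hypothetical reclamation helper
            version = nxt
            continue
        if version.begin_ts <= reader_ts < version.end_ts:
            visible = version                    # the version our reader should see
            break
        prev, version = version, nxt
    return visible, head
```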
Okay, so next, briefly, transaction-level garbage collection. I already described the idea a little earlier: for each transaction, besides keeping track of what it has written, you also keep track of which versions were invalidated by that transaction's updates. Based on that, you can reclaim all of those versions together once the transaction is out of scope. Here's an example. Say the transaction running in thread one updates tuple A: it creates a new version A3, and it records the version A2 that it has overwritten, because A2 is no longer the latest version and will definitely need to be reclaimed at some point. We don't know when yet, but we keep a pointer to it. Similarly, when it updates B, it writes a new version and records the old version of B as well. So when this transaction commits, it sends those two invalidated versions, together with its begin and commit IDs, to the vacuum component of the database system. You still need a vacuum thread in this case, but instead of going over each individual tuple, it goes over the transactions that have finished. When it sees that all currently running transactions have IDs greater than, say, 15, it knows that every version invalidated by a transaction with a commit ID smaller than 15 can be freed, because no other transaction will ever need to access it. So instead of scanning individual tuples, it directly locates which versions were invalidated by earlier transactions and frees them, all right? Make sense?
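Here is a small sketch of that transaction-level bookkeeping, with invented names (TxnContext, GarbageCollector, free) and the simplifying assumption that commit IDs and transaction IDs come from the same counter, as in the example above; it is only meant to show the shape of the idea.

```python
from collections import deque

class TxnContext:
    """Per-transaction bookkeeping for transaction-level GC (illustrative only)."""
    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.invalidated = []                    # old versions this transaction overwrote

    def record_overwrite(self, old_version):
        self.invalidated.append(old_version)     # called whenever the txn installs a new version

class GarbageCollector:
    def __init__(self):
        self.finished = deque()                  # (commit_id, invalidated versions), in commit order

    def on_commit(self, ctx, commit_id):
        self.finished.append((commit_id, ctx.invalidated))

    def reclaim(self, active_txn_ids):
        """Free everything invalidated by transactions that committed before any
        currently running transaction started (commit IDs below 15 in the example)."""
        oldest_active = min(active_txn_ids, default=float("inf"))
        while self.finished and self.finished[0][0] < oldest_active:
            _, versions = self.finished.popleft()
            for v in versions:
                free(v)                          # hypothetical reclamation helper
```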
So, we'll need to run three to five minutes over, apologies for that, because I'd rather finish this today than carry it into the next class. Another thing I want to mention is that now that you have different versions of tuples, you need to manage how they interact with indexes, right? Essentially, when we install a new version of a tuple, we need to update the corresponding index entry that points to the head of the version chain, and in the index that is a delete followed by an insert: we delete the key and then insert it again pointing to the new head of the chain. At least for the primary key index, that's what we need to do. For secondary indexes there are different choices, and different systems do different things. This is actually exactly why Uber switched the database they were using from Postgres to MySQL a few years ago: for their use case, Postgres managed these pointers, especially in the secondary indexes, inefficiently across the different versions of tuples, so they went through that expensive migration. So again, for the primary index, you always have a pointer to the head of the version chain and you do a delete followed by an insert. For secondary indexes there are two choices, logical pointers or physical pointers, and they have different trade-offs. A logical pointer means that instead of pointing directly to the head of the version chain, the secondary index stores a logical identifier of the tuple, either its primary key or some indirection tuple ID. Then when the head of the version chain changes, you only need to update that one indirection, instead of updating the value stored in every secondary index. The physical approach is that each secondary index stores the physical address directly, so when the head of the chain changes you have to go through every secondary index, as well as the primary index, and do a delete followed by an insert in each one, which can be costly.

Let me quickly walk through it. Assume we are doing append-only storage with newest-to-oldest ordering, so we always prepend the newest version at the head of the version chain. The primary index always stores the physical address of the head of the chain, because you have to store that physical pointer somewhere. A secondary index could also store the physical address directly and link to the head of the chain. But the problem, as I mentioned, is: what if you have many, many secondary indexes? Then each one stores a pointer, and whenever the head of the chain changes you have to go back to every single secondary index and do a delete followed by an insert to flip the pointer. So instead, you can use a logical pointer: for example, you store the primary key of the tuple as the value in each secondary index. Then every time you look up through a secondary index, you first get the primary key and then do a second lookup in the primary key index to reach the tuple. The advantage is that you only ever need to flip the pointer in the primary key index. Of course, you could also use a separate indirection structure, essentially an extra layer mapping tuple IDs to chain heads, but that's less common because it adds management overhead; it's almost like maintaining a third index.
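A minimal sketch of the logical-pointer choice, using plain dictionaries to stand in for the real index structures; PrimaryIndex, SecondaryIndex, and their methods are invented names, not the API of any actual system.

```python
# Illustrative sketch of logical pointers in secondary indexes.

class PrimaryIndex:
    def __init__(self):
        self.entries = {}                        # primary key -> head of the version chain

    def lookup(self, pk):
        return self.entries.get(pk)

    def update_head(self, pk, new_head):
        # Conceptually a delete of the key followed by an insert pointing at the new head.
        self.entries[pk] = new_head

class SecondaryIndex:
    def __init__(self):
        self.entries = {}                        # secondary key -> primary key (logical pointer)

    def insert(self, sec_key, pk):
        self.entries[sec_key] = pk

    def lookup(self, sec_key, primary_index):
        pk = self.entries.get(sec_key)
        if pk is None:
            return None
        # Extra hop: the secondary index yields a primary key, and only the
        # primary index knows the physical head of the version chain.
        return primary_index.lookup(pk)
```

The trade-off shows up directly: every secondary lookup pays an extra probe of the primary index, but installing a new version only requires one update_head call on the primary index, no matter how many secondary indexes the table has.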
The last issue is that when you are storing these multi-version tuples in an index, the index doesn't really know that a tuple has different versions, right? So if you want to insert a key and there is already an entry with that key in the index, you can't tell whether that entry comes from a different transaction, whether that transaction has committed or not, or whether it's an earlier version inserted by your own transaction. You just don't have that information, so there can be issues. Let me give you one particular example of what can go wrong. Say we have one tuple, A. The first transaction just reads it; that's very easy. A second transaction comes along and updates A; also easy, it just modifies the version chain and installs a new version. Now say that second transaction wants to delete A. We won't go into the details, but assume we simply mark the tuple as deleted by that second transaction. That's fine, and the transaction can commit. The problem is: what if a third transaction now comes along and tries to insert a tuple with key A? The tricky thing is that because we are doing multi-versioning, when the second transaction deleted A there may still be other transactions reading the old versions of A, so we cannot just physically remove it right away; we have to wait for the garbage collection process we talked about earlier to reclaim those versions. So when transaction three inserts key A, the index already contains an entry for A, and now there are essentially two version chains for the same key, right? You can deal with that, but it is an additional complication you take on with multi-versioning: you have to handle the case where there are duplicate keys for the same logical record. That essentially summarizes the problem, and the solution, at a very high level, without going into the details, is to block the conflicting insert: when a transaction deletes a key, we do not allow any new transaction to insert a tuple with that key until the deleting transaction commits. To represent the deletion itself, you can either use a deleted flag, an extra bit or attribute in the tuple that marks the key as deleted, or a tombstone version, where you append a special version with a null value to the version chain to represent that the tuple has been deleted. Either way, while that deletion is pending, no new inserts of that key are allowed until the transaction has committed.

And this is the last slide: it summarizes the combinations of concurrency control protocol, version storage, garbage collection, and index management that different systems use to implement multi-version concurrency control together with the protocols we talked about earlier. Different systems make very different choices, with different trade-offs on different workloads; we've talked about those trade-offs in this class, and in practice you pick and choose. So, as a conclusion, MVCC is widely used and very important; it shows up in many modern database systems, and different combinations of these design decisions give you different trade-offs. Even some NoSQL systems that don't support multi-statement transactions still use multi-versioning to improve their performance. All right, that's all for today.