My name is Dana, I'm another one of Andy's PhD students, and probably the last of his PhD students to present to you before he gets back. Today I'm going to be presenting your very last lecture on concurrency control. Before we start, a couple of reminders. The first is that Project 3 is due on Sunday, November 17th before midnight. We also released the homework last week, and it will be due on November 13th before midnight. Any questions before we begin?

All right, so today we are going to talk about multi-version concurrency control. The first thing I want to point out about multi-version concurrency control is that its name is a misnomer, and this may cause some confusion, because it's not actually a concurrency control protocol like the ones you've been learning about in the past two lectures: timestamp ordering, OCC, and two-phase locking. Rather, it's a way to architect the system when you have concurrent transactions running, by maintaining multiple versions. Recall from last week your discussion of optimistic concurrency control, where transactions maintain a private workspace, and any time they read or wrote an object, they would copy that object into the private workspace. Multi-version concurrency control is similar to that idea. Here, instead of having a private workspace for each transaction where we maintain these different versions, the versions are part of the global database, and we determine whether some version is visible to a particular transaction.

MVCC, or some variant of it, is used by almost every new database system built in the past ten years, but it's not a new idea. It's actually decades old: the first reference to the idea was in a dissertation by a PhD student at MIT in 1978. It wasn't until the early 80s that the first implementations actually came out, and those came out of a company called DEC. One was called Rdb/VMS, which stood for Relational Database for VMS (VAX/VMS was an old operating system), and the other product was called InterBase. DEC used to be a major computer company; it was bought out by Compaq in the late 90s, and then a few years later Compaq was bought out by HP. So DEC is no longer around, but it did some major pioneering work in database systems. Both Rdb/VMS and InterBase were built by a guy named Jim Starkey, who is also credited as the inventor of BLOBs and triggers, so he's a big deal. He later went on to co-found NuoDB, a newer database startup, which also happens to use MVCC. DEC Rdb/VMS was bought out by Oracle and is now known as Oracle Rdb. InterBase was eventually sold off, went through a few different holding companies, and was finally open sourced, so now it's known under a different name: Firebird. It may not be as well known as MySQL or Postgres, but it's one of the earliest open source databases out there.

Andy had a fun fact in here from last year, so I'll go ahead and say it. If you've ever wondered why the Firefox web browser is named Firefox, it's because it was originally called Phoenix, but they had to change that name because it conflicted with another product. So they changed it to Firebird, but then they had to change it again because it conflicted with this database system. So finally it was called Firefox.
So the main thing you have to understand about MVCC is that writers don't block readers and readers don't block writers. It's only when you have two transactions trying to write to the same object at the same time that you have to fall back on one of the concurrency control protocols, like two-phase locking. So again, you only need to do this when you have a write-write conflict.

At a high level, the way this works is that we assign timestamps to transactions when they arrive in the system, and then we provide each transaction with a consistent snapshot of the database as it existed at the time it arrived. This means transactions won't see changes from transactions that had not yet committed at the time of their snapshot. And just to clarify, this is a virtual snapshot. It shouldn't be confused with a physical snapshot, where you copy the full database to another location and run the transaction on that copy.

MVCC is really useful for read-only transactions, because SQL lets you declare that a transaction is read-only. If you do this, then the database system doesn't require you to acquire any locks or maintain read/write sets. This works because the transaction has a consistent snapshot and will only see the changes that existed at the moment it started, which makes these read-only transactions really efficient and really fast.

Yes? How do we maintain a virtual version of the snapshot? Well, like I mentioned a minute ago, it's essentially just maintaining version metadata in a table, and it's very similar to OCC, where you maintain the read and write sets. We're going to go over this in a lot of detail in the following slides; it's the topic of this lecture. Yeah? If two transactions read, when do we bring in a new version of the snapshot? Well, we're talking about two reads here, so they will both read the same snapshot, the same version. And a snapshot is more commonly referred to as a version of the tuple or database object. As for when we create a new version, bear with me for about three minutes, probably less; that's the first thing we're going to cover, and I'll answer any other questions you have afterward.

All right, so just to finish up on this slide: another advantage of MVCC is that you can support something called time travel queries. These are queries that let you ask the database system, for example, what the state of the database was three days ago or three years ago, and using this versioning it can actually answer those sorts of queries. The idea of time travel queries originated in Postgres in the 1980s. But Postgres actually removed them from the product as soon as people outside of academia started using Postgres more heavily. Can anybody guess why? The reason is that to support time travel queries you can never throw away old versions; you never garbage collect. So over time, as more and more transactions commit, your disk space fills up very quickly, and eventually it will be full.
And probably very quickly, depending on the rate of your transactions. The other thing is that time travel queries are not really needed by a lot of applications. You never really go to a website and say, I want to know what this web page looked like three days ago; not in most use cases. But Andy mentions that one common use case for time travel queries is in the financial industry. The reason is that, due to whatever rules and regulations they have to follow, they have to maintain the past seven years of transaction history. Time travel queries allow them to very easily query the database and figure out, say, what some sum of money or their total revenue was at any point over the past seven years.

All right, so in the next two slides we're going to go over two examples. What I really want to emphasize before we start is that MVCC is independent of the concurrency control protocol. The purpose of these examples is just to show you how we update versions and timestamps in the table, and how we figure out which version of the tuple is visible to a particular transaction.

So in this first example, the first thing to point out is that we now have a version field. We can see that the version field is set to A0, which means object A, version 0. So we can assume that some other transaction has written the value (1,2,3) to the database, and whatever transaction wrote it was assigned a timestamp of 0. We also have begin and end fields, and these are just timestamps. It doesn't matter if they're logical, physical, or hybrid, as long as they're always increasing and otherwise valid timestamps, like you learned in the past few lectures.

All right, so let's begin. We're going to be looking at transactions T1 and T2. T1 arrives and is assigned a timestamp of 1. The first thing it wants to do is a read on A. So we take T1's timestamp, which is 1, look at our table, and figure out which tuple version is visible to it by finding the version where T1's timestamp falls between the begin and end timestamps. In this example the begin is 0 and the end is infinity, and the timestamp of 1 lies between them, so T1 is going to go ahead and read version A0.

Now we have transaction T2, and we're going to assign it timestamp 2. The first thing it wants to do is a write on A. At this point we're going to create a completely new version of A, which will be A1, because we're just incrementing the version counter. Its begin timestamp is set to T2's timestamp, its end timestamp is set to infinity, and the last thing we do is update the end timestamp of version A0 to also be 2, the timestamp of T2.
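Before we bring the transaction status table into the picture, here is a minimal sketch of the visibility rule we just used. This is my own illustration, not code from the lecture; it assumes plain numeric timestamps and a half-open [begin, end) interval, and the names `Version`, `is_visible`, and `read` are made up for this example.

```python
from dataclasses import dataclass

INF = float("inf")

@dataclass
class Version:
    key: str            # logical object, e.g. "A"
    value: tuple        # tuple contents, e.g. (1, 2, 3)
    begin: float        # timestamp of the transaction that created it
    end: float = INF    # stays infinity while this is the newest version

def is_visible(v: Version, txn_ts: float) -> bool:
    # A version is visible when the transaction's timestamp falls in
    # the half-open interval [begin, end).
    return v.begin <= txn_ts < v.end

def read(chain: list[Version], txn_ts: float) -> Version | None:
    # Walk the version chain and return the one version visible to us.
    for v in chain:
        if is_visible(v, txn_ts):
            return v
    return None

# T1 (timestamp 1) reads A and sees A0, exactly as in the example:
a0 = Version("A", (1, 2, 3), begin=0)
assert read([a0], txn_ts=1) is a0
```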
All right, so one thing you might have noticed we were missing so far, with just the information we had before this transaction status table popped up, is that we don't really know the current state of the transactions in the database. For example, the transactions here are currently active, but what if one aborted? Then you would have to go back and reverse its timestamps accordingly. So as you can see, here we're just going to start filling out the transaction status table; at this point both transactions are active.

Then finally T1 is going to do this read on A. So what version is it going to read? Anyone? Right: A0, because its timestamp still lies between the begin and end here. So it's going to go ahead and read version A0 and then commit. After T1 commits, transaction T2 will commit, we'll update the status table, and eventually we can blow those entries away.

For the second example we're going to start with the same setup: transaction T1 with a timestamp of 1 and transaction T2 with a timestamp of 2, and the same state in the database table. So far we've just started transaction T1; we set its timestamp to 1 and its status to active. First it's going to do a read on A. I think at this point it's pretty clear that it's going to read version A0. Next it's going to do a write on A. Just like on the last slide, we create a completely new version and insert it into our database table: version A1 with value (4,5,6), a begin timestamp of 1 (whatever T1's timestamp is), and an end timestamp of infinity. And the last thing not to forget is that we need to go up to A0 and set its end timestamp to the current timestamp of transaction T1, which is 1.

All right, so now we begin transaction T2. The first thing it does is a read on A. In this case, which version is it going to read? A1? And why is that? Oh, sorry, I forgot. One thing we have to pay attention to here, and this is a little tricky: one thing I forgot to mention at the start is that I understand you didn't have time to go over isolation levels, so Andy wanted you to review the slides and the lecture from last year. I'm just going to provide some high-level hints about isolation levels for when you go over those slides and the homework, but it might not make full sense at this point. Basically, at a very high level, depending on the isolation level, T2 might read either version A0 or A1. But let's assume we're using serializable isolation, which is what you've been using up until this point. In that case it has to read A0, because A1 has not yet committed.

All right, so now T2 wants to do a write on A. What's going to happen next? Well, here we have a write-write conflict, right? So assuming we're using 2PL, T2 is going to have to stall until T1 commits. So let's keep going. Now we're back to T1. It does a read on A, and in this case it just reads the same version that it wrote a moment ago, A1.
And then T1 is going to go ahead and commit. All right, so now we can go back to T2, which creates the new version A2 with value (7,8,9). We assign it a begin timestamp of 2 and an end timestamp of infinity, and we update the end timestamp of A1 to 2 as well. At this point, whether T2 actually commits or not really depends on the concurrency control protocol as well as the isolation level, so that's something to keep in mind. But the purpose of this example is just to show you how we update the object versions, maintain the transaction status table, and figure out which tuples are visible. Any questions on this?

Yes? You said on a previous slide that a writer won't block a reader, right? So when you're updating that version table, don't you need a lock on it? If I'm a writer updating this table in the top right corner while somebody else is reading it, don't we need a lock on that table? Well, this is a very high-level example right now, and we're going to go into how you actually store this information later in this lecture, so that will answer the question. Basically, in some cases, yes, you do need locks; it really depends on how you're actually storing this version information, and there are multiple ways to do it, so I don't want to just list them all now. If we don't cover it in a few slides, please ask your question again.
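To tie the two examples together, here's a hedged sketch of the write path, reusing the toy `Version` class from the earlier snippet. A real system would also consult the status table before treating an uncommitted version as visible, and would undo these steps on abort; both are only hinted at in comments, and all names are again my own.

```python
versions = {"A": [Version("A", (1, 2, 3), begin=0)]}   # chain holding A0
txn_status = {}   # txn_ts -> "active" | "committed" | "aborted"

def write(key: str, new_value: tuple, txn_ts: float) -> Version:
    chain = versions[key]
    old_head = chain[-1]
    # Close the old version's validity interval at our timestamp...
    old_head.end = txn_ts
    # ...and install the new version, valid from txn_ts to infinity.
    # If the transaction later aborts, both steps must be undone:
    # reset old_head.end to INF and drop the new version.
    new_head = Version(key, new_value, begin=txn_ts)
    chain.append(new_head)
    return new_head

txn_status[2] = "active"
write("A", (4, 5, 6), txn_ts=2)   # creates A1 and sets A0.end = 2
txn_status[2] = "committed"
```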
All right, so again, MVCC or its variants are used in almost all new database systems, and these are just some examples of systems that use it. But what we really want to emphasize for the rest of this lecture is that MVCC is a lot more than just maintaining the timestamps I showed you in the previous two examples. There are a whole bunch of other design decisions you have to make in order to actually implement a system that supports MVCC, and we're going to go over those next. Specifically: which concurrency control protocol you're going to use; how you're going to maintain and store the different versions, which relates to the question that was just asked; how you're going to clean up old versions once they're no longer visible to any transaction; and how you're going to ensure that the indexes point to the correct version.

Okay, so the first design decision is the concurrency control protocol. These are the protocols you've been studying for the past two weeks, and again, when you encounter a write-write conflict you need to use one of them, whether it's two-phase locking, OCC, or timestamp ordering, to figure out which transaction should be allowed to write to the object, according to whatever isolation level you're running at. We're not going to go into much detail on this, since you've covered it very recently.

The next consideration is version storage. Here, what we need to figure out is, for a particular logical tuple, how to find the version that should be visible to us. Let's assume for now that we're doing a sequential scan on the entire table and we want to know where to find the version of a tuple that we want. The way we implement this is by maintaining an internal pointer field that lets us find the previous or next version of that particular logical tuple; we'll go into that more shortly. You can think of this as a linked list: you jump in, land on the head, and then follow the pointers in the linked list to find all of the different versions that are currently being maintained. Indexes always point to the head of the chain, and whether the head is the oldest version or the newest version of that tuple depends on the implementation. There are different approaches to storing these versions, and we'll go through them next.

The first and simplest approach is called append-only storage. This just means that every time we create a new version, we copy the old version as a new physical tuple into our table space, update it, and then update the pointers to say here's the next version. We'll go over examples of all three approaches in the next few slides. The next approach is called time-travel storage, where you have one master table that always stores the latest version of each object or tuple; you copy older versions out into a separate table, which we'll call the time-travel table, and then you maintain pointers from the master table with the latest tuples to the time-travel table. And the last approach, which is the one Andy prefers and thinks is best, is called delta storage. You can think of this as diffs in git: instead of copying the whole old version every single time and updating it, you just maintain a small delta of the modifications from the previous version.

So first we'll go over an example of append-only storage. Again, this is the simplest approach, and it's also what Postgres uses: each new physical version is just a new tuple in the main table. Let's say we have a transaction that wants to update object A. The first thing it does is find an empty slot in the table space and copy the values of the current version of A, which is A1, the most recent one, into that slot. Next it copies the modified value into that slot. And are we done yet? Not quite. The final thing we need to do is update the pointer so that the older version points to the newest version we just installed.

Another aspect we must consider is the ordering of the version chain. In this example A0 is the head of the version chain, and we're specifically ordering the versions oldest to newest; the alternative is to order them newest to oldest. So if you're looking for the newest tuple under oldest-to-newest, the index gets you to version A0, and you have to follow the pointers all the way down to the newest version, A2. Does that make sense? There are performance implications and tradeoffs for both orderings. With oldest-to-newest, all you need to do when there's a new version is append it to the end of the version chain and update the pointer from the older version to the newer one; that's a really easy operation. But with newest-to-oldest, you add the entry, update its pointer to point to the old head, and then you also have to update all of the indexes to point to your new version, since, as we said a few slides ago, indexes always point to the head of the version chain. So that means a lot more updates in some cases.
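Here's a rough sketch of what installing a new version costs under the two orderings, with the chain modeled as a Python list and indexes as plain dicts; `install_o2n` and `install_n2o` are hypothetical names, and a real system would do this with latches and in-page pointers rather than list operations.

```python
# Oldest-to-newest (O2N): append at the tail; the index keeps pointing
# at the oldest version, so no index update is needed.
def install_o2n(chain: list[Version], new_v: Version) -> None:
    chain[-1].end = new_v.begin   # old tail now ends where new_v begins
    chain.append(new_v)           # becomes the new tail; index untouched

# Newest-to-oldest (N2O): the new version becomes the head, so every
# index on the table has to be repointed at it.
def install_n2o(chain: list[Version], new_v: Version,
                indexes: list[dict]) -> None:
    if chain:
        chain[0].end = new_v.begin
    chain.insert(0, new_v)        # new head; its "next" is the old head
    for idx in indexes:
        idx[new_v.key] = new_v    # one extra update per index
```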
All right, so time-travel storage is the next approach we'll cover. Here we have a main table that always holds the latest version of each tuple, and another table called the time-travel table, where we maintain older versions, copying them over as tuples get modified in the database. For this example, let's say a transaction wants to update object A again, same as the last example. We copy A2 into a free slot in the time-travel table and update its version pointer to point to the older version of tuple A. Then we overwrite the master version in the main table with the new version's value. And finally we update the pointer so that the new version, A3, points to the version we just installed in the time-travel table, which is A2. Yes? Is there a time-travel table for each object, or is it almost like one append table? It would be one append table.

All right, so now we'll move on to the last approach we're going to consider, which is delta storage. Again, this is used by both MySQL and Oracle, and as I mentioned, it's the one Andy thinks is the best option. What happens here is that every time you do an update, you copy only the values that were modified into the separate delta storage segment that you see over here. So to update A, we first write its old value into the delta storage: instead of storing the entire tuple, we create a delta that records which attributes of the tuple were actually modified. In this case one attribute was modified, so that's now reflected in the delta storage segment. Then we update the actual value in the main table and also update the pointer from the new value into our delta storage. Similarly, if we now want to install another new version, we do something like the time-travel scenario: we append the new delta, update the value in the main table again, and now it's version A3. But we also need to update the pointer from A3 to point to the delta for A2, and additionally update A2's pointer to point to the older version A1, just like in the time-travel example.

So when you want to read an old version, what you essentially have to do is replay the deltas to put the tuple back into its original form. In this case, if we wanted to read A1, we would start with the value of A3, follow the pointer to A2, apply A2's delta, and then apply A1's delta, and that gets us back to the original value.
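Here's a minimal sketch of that replay, assuming each delta is recorded as a dict holding the pre-update values of just the attributes that changed; the layout and names are assumptions for illustration, not how MySQL or Oracle actually encode their undo records.

```python
# Current master version (call it A3), plus deltas for the two older
# versions, newest first. Each delta stores the pre-update values of
# only the attributes the corresponding update changed.
master = {"id": "A", "val": 9, "other": "x"}   # A3
deltas = [{"val": 7},                          # undo A3's write -> A2
          {"val": 4}]                          # undo A2's write -> A1

def reconstruct(master: dict, deltas: list, back: int) -> dict:
    # Start from the master copy and replay `back` deltas to walk the
    # tuple backwards through its history.
    tup = dict(master)
    for delta in deltas[:back]:
        tup.update(delta)
    return tup

assert reconstruct(master, deltas, back=1)["val"] == 7   # recovers A2
assert reconstruct(master, deltas, back=2)["val"] == 4   # recovers A1
```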
This is another good example of the trade-offs between reads and writes. Reading old versions with the append-only approach is really easy, which is one nice thing about it, and it's easy to implement: you just find the version, and the tuple is already ready to be returned; you don't have to apply deltas to put it back into its correct state. But with delta storage, writes are going to be much faster, because we don't have to copy the entire tuple if we only change a subset of the attributes. If you have just one attribute, like we do here, this is a trivial optimization, but many tables have lots of columns, in which case it can matter a lot. The disadvantage, again, is that you have to replay the deltas to put the tuple back together into its correct value. So one takeaway, like we mentioned earlier, is that Postgres will be faster for reads and MySQL will be faster for writes, for exactly this reason.

The third thing on our list is garbage collection. All of these old versions accumulate as transactions run and finish, and at some point we know that a particular version is not visible to any active transaction. In terms of the table with the begin and end timestamps, that means there is no active transaction whose timestamp falls between that version's begin and end range. At that point we want to go ahead and garbage collect those versions in order to reclaim space. Two additional things we have to worry about are how we look for expired versions and when it's safe to reclaim them; those are topics we won't cover in this class, but they are covered in the advanced class if you choose to take it.

There are two approaches we're going to look at: the first is tuple-level garbage collection and the second is transaction-level. Tuple-level means we essentially do sequential scans over our tables and use the version timestamps and the set of active transactions to figure out whether a version is expired; if it is, we go ahead and prune it. The reason this is actually complicated is that we not only have to look at the pages in memory, we also have to look at the pages that have been swapped out to disk, because we want to vacuum everything. We'll go over background vacuuming and cooperative cleaning next.
The second approach is transaction-level, which we're not going to go into in much detail, but the general idea is that transactions maintain their read/write sets, so you know exactly when they commit, and thus you can figure out when the versions they touched are no longer visible and go vacuum them.

First we'll go over an example of how tuple-level garbage collection works. Let's say we have two threads running in the system: transaction T1 is assigned a timestamp of 12 and transaction T2 is assigned a timestamp of 25. Over in our version table we have object A at version 100, with a begin timestamp of 1 and an end timestamp of 9, and then a few other versions in there for object B.

With background vacuuming, we have a set of threads that run in the background and perform the vacuuming: they periodically do a full scan of the table and look for versions that are reclaimable, and this works with any type of version storage. So the background thread first queries what the current transaction timestamps are; in this case, 12 and 25. Then it does a sequential scan over the table and uses the begin and end timestamps to figure out whether each tuple version could ever be visible to those transactions. T1 and T2 will never be able to see A100 or B100, because neither timestamp falls between 1 and 9, whereas T1's timestamp of 12 does fall between 10 and 20, so that version could still be used. At this point we know the first two tuples are safe to reclaim, so we go ahead and do so.

One obvious optimization here is to maintain a dirty-page bitmap: a single bit for every page in the database, and any time you modify a page you flip that page's bit to indicate that it's dirty. This takes a little extra storage, but it's just one bit per page, and when the vacuumer comes around it immediately knows which pages it actually needs to vacuum. It will vacuum those pages and then reset their bits to 0. Vacuuming is typically run as a sort of cron job that executes periodically, but in some database systems, Postgres for example, you can also invoke vacuuming manually from the SQL prompt, and there are configuration parameters you can set so that the system starts up a vacuum thread if, say, over 20% of the pages are dirty. So there are different ways to implement this and different ways to optimize it for different workloads.
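As a rough sketch of tuple-level background vacuuming with the dirty-page bitmap optimization (illustrative names again, and note that a real vacuum also has to visit pages that have been swapped out to disk):

```python
def vacuum(pages: list, dirty: list, active_txn_ts: set) -> None:
    for i, page in enumerate(pages):          # page = list of Versions
        if not dirty[i]:
            continue                          # clean page: skip entirely
        page[:] = [v for v in page
                   if v.end == INF            # never drop the live version
                   # keep versions some active transaction can still see
                   or any(v.begin <= ts < v.end for ts in active_txn_ts)]
        dirty[i] = False                      # reset the bit afterwards

# With active timestamps {12, 25}, a version with [begin=1, end=9) is
# visible to nobody and gets reclaimed, while one with [10, 20)
# survives because 12 falls inside that interval.
```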
Okay, so the other approach we're going to look at is cooperative cleaning. This is where the threads, as they're executing queries, come across old versions that they know are not visible to anybody else, and it's their job to clean them up as they go along. So again, these are the threads that are actually executing transactions: they check the versions they traverse, and if a version's space is ready to be reclaimed because it's not visible to any transaction anymore, they go ahead and reclaim it.

One thing to note: if you consider the two orderings we discussed earlier, oldest-to-newest and newest-to-oldest, would this approach work for both of them? No, right? Why is that? Exactly right: in the case of newest-to-oldest, you're never going to be traversing any of the old versions, so you will never end up reclaiming them. So it's important to note that cooperative cleaning only works with oldest-to-newest ordering.

All right, so now we'll go through a similar example. Let's say we have an index and transaction T1 wants to do a lookup on object A. It lands on the head of the version chain, which is the oldest version, and then scans along until it figures out which versions are actually visible to it. If it recognizes that one of the versions it's traversing is not visible to any transaction, it will mark it as deleted and reclaim the space, and at the very end it also updates the index to point to the new head of the version chain. So walking through the steps: here we find the value, we see that version A1 can be reclaimed, and then we redirect the pointer from the index to the new version head.

Ordering is actually important here, and the ordering shown on the slide is not quite correct. When you actually perform these operations, the first thing you do is mark the versions as deleted, without reclaiming the space yet. The next important thing is that you update the index pointer to point to A2 before physically deleting the old versions or reclaiming that space, because otherwise other transactions running concurrently might follow a dangling pointer that points to nothing.
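Here's a hedged sketch of that corrected ordering for a cooperative cleaner on an oldest-to-newest chain, continuing the toy `Version` objects from before; the index is just a dict, and marking a version as deleted is simulated by clearing its value.

```python
def cooperative_clean(index: dict, key: str, chain: list,
                      active_txn_ts: set) -> list:
    dead = [v for v in chain
            if v.end != INF   # never touch the live version
            and not any(v.begin <= ts < v.end for ts in active_txn_ts)]
    for v in dead:
        v.value = None                 # step 1: mark as deleted only
    live = [v for v in chain if v not in dead]
    index[key] = live[0]               # step 2: repoint the index FIRST
    return live                        # step 3: now reclaim the space
                                       # (the dead versions are dropped)
```

Swapping steps 2 and 3 is exactly the dangling-pointer bug described above: a concurrent reader could follow the index into reclaimed space.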
Yes? How does a thread know which versions are visible? It maintains some information so it can figure that out: it knows the set of active transactions, and it can compare those timestamps with the begin and end timestamps in the version table. Correct.

All right, so again, for transaction-level garbage collection we just maintain the read/write sets of transactions and use them to figure out which versions are no longer visible, and then reclaim the space. That's really all we're going to say about transaction-level garbage collection. Any questions on garbage collection, or anything else up to this point?

So now we're going to move on to our final design decision, which is index management. As I mentioned before, the primary key index always points to the head of the version chain, and any time we create a new version we have to update the index to point to the new head. This gets tricky when you update the primary key itself, because now it's actually possible to have two version chains for the same logical tuple. The way you implement a primary key update is as a delete followed by an insert of a new logical tuple; there's some bookkeeping you need to maintain, and you also need to understand how and when to roll back when necessary.

For secondary indexes, this is actually more complicated, and it's what we'll talk a little more about. With secondary indexes, there are two approaches to make sure the index reflects the correct head of the version chain. The first is to use logical pointers: you have some kind of unique identifier for the tuple that does not change, plus an indirection layer that maps the logical ID to the physical location in the database, and any time you update the version chain you just update the indirection layer rather than updating every single index. The other approach, which I think was used in some of the slides earlier, is to use physical pointers, where you point directly at the head of the version chain; then every time the version chain gets updated, you have to update every single index. So the difference between physical and logical pointers is basically this indirection table, and the benefit of the indirection table is that you do not have to update every single index every time you update your version chain.

All right, so in this example we'll say we have a simple database, and we're using an append-only version chain running newest to oldest. For the primary key index, if I want to do a lookup on object A, the index entry is just a physical address, typically a page ID plus an offset, so you know which page to go to and where in it to look. It points to the head of the version chain, and any time you create a new version you always update it through the primary key. For secondary indexes, you could again use the physical address, but there's the same issue: any time you update the tuple, you have to update the secondary index to point to the new head. If you have only one attribute or one secondary index, this is not a big deal, but it's very common for OLTP databases in particular to have many secondary indexes on a single table. So every time you update the version chain, you have to update all of those secondary indexes, which for OLTP might be a dozen or a few dozen, and this of course is expensive: if it's a B+tree, for example, then you're traversing the B+tree, taking latches as you go, and finally applying the update.

So, like we said on the previous slide, instead of storing the physical address in the secondary index, we're going to look at two alternatives. The first is to store the primary key, literally just a copy of the primary key, as the value in the secondary index. So here, the value we store in the secondary index is a reference to the primary key. Now, when you want to find a tuple, you first get the primary key out of the secondary index, and then you do a lookup on the primary key index, just as you would before, to figure out what the physical address is.
Then everything proceeds as in the first example with the physical address. So now, when I update the tuple and the head of the version chain, I just update the primary key index, and that automatically keeps all of the secondary indexes correct. This is one example of logical pointers, and it's what MySQL does; Postgres actually stores the physical address.

Yes? What's the relationship between a secondary index and the primary key index? So the primary key index stores the key for your table, and in this scheme the secondary index stores a reference to that key. For example, if you have a table A whose ID is the primary key, then in table B you might have a reference to A.ID, and you might create what's called a secondary index on that particular attribute. If you want something more concrete: you have a table of users, each user has an ID, and each user has a list of items it has purchased; each of those items carries the user's ID. Any other questions? I think I might be getting ahead of myself here.

The last approach, which is another example of using a logical ID, is to have some synthetic value serve as a tuple ID; this would typically come from an incrementing counter. Then you have a hash table that maps from that tuple ID to the physical address. So you get the tuple ID out of the secondary index, and the hash table points you to the location of the physical address, so you can read that value. Again, just like the approach where we stored the primary key, this is another example of logical pointers, which means that each time we create a new version or update the version chain, we avoid having to update all of the secondary indexes; the only thing we have to update in this case is the hash table and its pointers. Does that make sense?
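To summarize the two logical-pointer schemes as a sketch (all of these names are mine, and the "physical address" is just a placeholder string):

```python
# Scheme 1: the secondary index stores the primary key, so a lookup
# costs one extra probe through the primary key index.
primary_index = {"A": "page7:slot3"}    # pk -> physical address of head
sec_index_pk = {"some_attr": "A"}       # indexed attribute -> primary key

def lookup_via_pk(attr: str) -> str:
    return primary_index[sec_index_pk[attr]]

# Scheme 2: the secondary index stores a synthetic tuple ID, and one
# hash table maps tuple IDs to physical addresses. When the version
# head moves, only this one table changes, not every secondary index.
tuple_id_map = {42: "page7:slot3"}      # tuple id -> physical address
sec_index_tid = {"some_attr": 42}       # indexed attribute -> tuple id

def lookup_via_tid(attr: str) -> str:
    return tuple_id_map[sec_index_tid[attr]]

# Version head moved? One update covers all secondary indexes:
tuple_id_map[42] = "page9:slot0"
```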
All right, so this table is actually really interesting. It's from a paper that was published by Andy and a few other students a couple of years ago. What they did is look at a number of systems: some older systems like Oracle, Postgres, and MySQL, and also some much newer systems from within the past ten years, for example HyPer and NuoDB; HyPer would be an example of an academic system. So they tried to get a variety of systems, and the table lists which of these design decisions each database system makes. Let's see if Andy has any exciting notes here. The spoiler, or the takeaway, is that the way Oracle and MySQL do MVCC is the one Andy and his students actually found to be the fastest, specifically for OLTP workloads, and they found Postgres to be the slowest. Although personally, as a user of both MySQL and Postgres, I like Postgres quite a bit; but I'm also not running commercial database systems with production workload traces, and once you get to that scale it definitely matters a lot.

Okay, so this brings us to the conclusion. Today we talked about multi-version concurrency control, and as you just saw in the past few slides, there is a lot more to it than just figuring out what timestamps to assign and which versions are visible to the different transactions. You also need to figure out how to store the versions, how to update them, how to update the indexes correctly, and the other items we covered here.

As a reminder for next class: don't come to class on Wednesday, because nobody will be here; you have next Wednesday off. The following week Andy will probably be back, although that's not certain, and we will start logging and recovery.