Okay, the NM32 stuff is the same as what we talked about Monday. Project three is due on November 14th, that Sunday, and homework four is actually due today. And as announced, we're going to hold an additional office hour this Saturday. But the way to treat this is that you should really try to finish the project before Saturday. And if Saturday comes and you have one or two last issues you really couldn't solve, you can go to the office hour as a last resort, right? If you save lots of questions for Saturday and hope you can solve them in a day, that's going to be tough, all right? So for the upcoming database talk, next Monday we'll have, I think, the co-founder of this company Flurry come and talk about their cloud-native ledger graph database. I think ledger here just means blockchain, right? So it's cloud, blockchain, graph, all kinds of hot topics. You can check it out if you're interested. Okay, so let's pick up from what we talked about last class. Last class we started to talk about the logging and recovery mechanism that guarantees three of the ACID properties: atomicity, consistency, and durability. And we said that at a high level there are two steps to this logging and recovery process. The first, obviously, is logging, which is the additional work you have to perform while the system is executing transactions normally, right? Recording metadata, redo and undo information, et cetera. The second part is the recovery step, the algorithm or mechanism the system goes through after it comes back from a crash, right? So that you can put the database back into a correct state. Last class we focused mainly on the first part. And the one thing we didn't really finish last class, when we talked about the additional information, in other words the logging we need to write to keep track of metadata and changes, et cetera, is that while we are writing these logs to make sure the database has enough information to come back after a crash, the log can actually just grow forever, right? If the database system doesn't do anything about it, then say the database runs for a year, you just have a year of change log, which could even be bigger than the original data you have, right? So this could take up lots of space, and over time it becomes problematic. One way to address this, used in database systems, is called checkpointing, which essentially means you take a consistent snapshot of the contents of the database periodically, right? After a snapshot, you can throw away, not everything, but most of the log before the checkpoint. And after you come back from a crash, you don't need to look at everything either; roughly, you only need to look at things after the checkpoint. Actually, for correctness reasons, you still have to look at some records before the checkpoint, and we'll get to the details later, right? So the high-level idea is to take consistent snapshots periodically so that you can throw away old logs. So what do you need to do when taking a checkpoint? At a high level, you just do the following three steps, right? First, you write all the log records currently in the write-ahead log buffer onto disk.
Then after that, you write all the dirty pages in the buffer pool onto disk as well, right? This order is important, because last time we said that while generating those logs and modifying the contents of the database, we always need to generate the log records first, before we actually go and modify the page in the buffer pool, right? And here is a similar order: you write all the log records first, and then you write the contents of the buffer pool, all the dirty pages. And lastly, you write a checkpoint entry to the log and flush that. Then you have a checkpoint. I'll give you a very simple example here. Actually, before getting to the details, let's assume a simple checkpoint strategy where, when we take the checkpoint, we just pause the entire database and perform this log flushing and buffer pool flushing. We'll talk about a more optimized strategy later in this class. In this simple example, we have three transactions: transaction one, transaction two, transaction three. Before the checkpoint we are taking, we assume transaction one has committed, and both transaction two and transaction three have started, right? But transaction two only commits after we take the checkpoint. And say at the end of the log, the database crashes for whatever reason. So what happens here? Well, first, any transactions that committed before the checkpoint we can just ignore, right? We don't need to recover them, redo, undo, whatever, because all the changes of these committed transactions have already been flushed to disk; the dirty pages were written to the database pages when the checkpoint happened, right? After the checkpoint, we know all those changes are persistent. So what about transactions two and three? They didn't finish committing before the checkpoint started. Well, because transaction T2 has committed by the time of the crash, we need to reapply all the changes of this transaction to the database when we come back from the crash, right? Because we already told the outside world T2 committed, we need to make sure the contents of T2 are persistent. And for transaction T3, because it hasn't finished, some dirty pages written by transaction T3 may already have been flushed to disk before the system crashed, right? We already have T3's log records showing up in the write-ahead log, so the dirty pages containing T3's modifications could also have been written to disk already, right? So after we come back, we need to make sure all those changes of T3 are undone. That's what happens at a high level. We'll get to more details when we talk about recovery later in this class, right? So that's the high-level idea of checkpointing to reduce the log size. There are a few issues, obviously, with the simple approach I just described. The most obvious one is that if you want to ensure an atomic, consistent write of all the log records as well as all the dirty pages while taking the checkpoint, the simple approach is just to stop everything.
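As a quick aside, here is a minimal sketch in Python of this naive pause-everything checkpoint. All the names here (log_buffer, buffer_pool, disk_log) are made up for illustration, not from any particular system; the only point is the order of the three steps: flush the log, then flush the dirty pages, then append the checkpoint record.

```python
# Minimal sketch of the naive "pause everything" checkpoint.
# All names here are hypothetical; only the three-step ORDER matters.

log_buffer = []        # in-memory tail of the write-ahead log
disk_log = []          # stand-in for the WAL file on disk
buffer_pool = {        # page_id -> {"data": ..., "dirty": bool}
    1: {"data": "A=10", "dirty": True},
    2: {"data": "B=20", "dirty": False},
}
disk_pages = {}        # stand-in for database pages on disk

def take_checkpoint():
    # Step 1: flush the log buffer to disk FIRST (write-ahead rule).
    disk_log.extend(log_buffer)
    log_buffer.clear()
    # Step 2: flush all dirty pages from the buffer pool.
    for pid, page in buffer_pool.items():
        if page["dirty"]:
            disk_pages[pid] = page["data"]
            page["dirty"] = False
    # Step 3: append a checkpoint entry to the log and flush it.
    disk_log.append({"type": "CHECKPOINT"})

log_buffer.append({"type": "UPDATE", "page": 1, "after": "A=10"})
take_checkpoint()
print(disk_log, disk_pages)
```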
But stopping everything like that obviously blocks users from executing transactions on the database, which is bad for performance. And even after you come back from a crash, even though you have the checkpoint, if you're not careful you may still need to scan quite a few records from before the checkpoint. Like we saw in the last example: there are changes from T2 that you have to redo, because T2 only commits after the checkpoint begins. And if T3 did any modifications before the checkpoint, you have to undo those as well, right? Because your checkpoint may contain them. There's also the question of how often to checkpoint. At a high level, if you checkpoint too often, you may need to pause the database system very frequently, and checkpointing takes resources and slows down the system. But if you do it very infrequently, that kind of defeats the purpose of checkpointing, right? Because then the log can still get pretty big, and after you recover from a crash you may still need to read a lot of log records to put the database back into a correct state, all right? So that's the high-level idea of checkpointing, and we'll talk about an efficient way to do it later in this class, okay? So for today's lecture, we're going to focus on the second part of this logging and recovery process, essentially the recovery algorithm. The canonical, very fundamental algorithm for performing this recovery step is called ARIES, short for Algorithms for Recovery and Isolation Exploiting Semantics. It was formally proposed in a paper from IBM DB2 research in the early 90s, especially by this guy called Mohan. The document is pretty long; it covers various aspects of the logging and recovery process in various scenarios, for example when you're using different concurrency control mechanisms, et cetera. It's like 50, 60 pages. So in this class, obviously, we're only going to focus on the high-level main intuition behind the algorithm instead of everything. I would also say that most contemporary database recovery algorithms are based on it, or use a very similar mechanism. Furthermore, this paper does not represent the first implementation of ARIES either, right? Even before it came out, some people were already doing logging and recovery in a similar way. But this paper is generally seen as the first one that formally organized everything, trying to put all the rules and scenarios together in a principled fashion and organize all the mechanisms and details for you. So people generally view it as the first formal document proposing the ARIES mechanism, all right? The main idea of ARIES we've already hinted at throughout the lecture, from last class and when we talked about checkpointing. There are three main parts to ARIES, right? The first is the logging part, and in particular it uses the write-ahead logging we talked about last class. And just to remind you, write-ahead logging uses a steal and no-force policy, right?
In terms of buffer pool management. Then, assuming the database system crashes at some point, after the system comes back from the crash, what ARIES does is first reapply, or redo, all the changes of the committed transactions, right? This makes sure that for all committed transactions, their durable changes are brought back into the database system if the dirty pages containing those changes were not flushed before the crash, right? So the redo phase reapplies all the changes of committed transactions. After that comes an undo phase: assuming there were some uncommitted transactions whose dirty pages were already flushed to disk, the undo phase restores all those changes. And after that, the database system is back in a correct and consistent state. One thing to note is that in the redo phase we just reapply all the changes, but in the undo phase, we actually need to record which values we have undone as additional log records as well, right? Because we want to know what we have restored and what we have not, especially in the case where there's a crash while you're trying to recover, the kind of recursive crash scenario. In that case, after we come back, we want to know which undos we have done and which we have not, and that helps us resolve this recursive crash scenario correctly, okay? But that's an additional thing to note here. So today's agenda has a few steps. First, we have to establish a few concepts and the metadata the database system needs to record beyond the basic elements of write-ahead logging that we talked about last class. In particular, for every log record we need the concept called a log sequence number. Then we'll talk about what operations exactly we need to do when a transaction commits or aborts, so that we can later use ARIES to recover. Next, we'll talk about a slightly optimized version of checkpointing. And lastly, we'll talk about the ARIES recovery algorithm specifically, all right? Cool. So first of all, one very, very important concept, or additional piece of metadata we need to add to every single log record, is this thing called the log sequence number. It is a very important thing that coordinates pretty much everything in the ARIES algorithm. Essentially, it's a globally unique number for every single log record, and it always keeps increasing, right? It only increases. And beyond every single log record, different components of the database system will also keep their own watermarks, if you will, of this log sequence number, so that different components of the system can coordinate with each other based on their version of the log sequence number and perform this logging and recovery process, the ARIES algorithm, correctly, right? Like I mentioned a few lectures ago, both concurrency control and logging and recovery, even though they are independent components, actually need to coordinate with many other components in the database system so that they can perform their functionality and guarantee the ACID properties.
Specifically, there are quite a few different places in the database system that keep different watermarks of this log sequence number, and I've shown their names, locations, and definitions here. Let me go through them one by one. The first is called the flushed LSN. The database system just maintains this number somewhere in memory, and it indicates the LSN of the latest log record that has been written to disk, right? That's what we have flushed. The second and third, page LSN and recLSN, are both associated with a page, okay? The page LSN indicates the LSN of the latest log record that updated this page, right? That's the most recent modification to the page and its associated log record number. And the recLSN, the third one, is the oldest update to that page since it was last flushed. Essentially, every time you flush a dirty page to disk, you reset its recLSN. The next time any modification is made to that page, you put the LSN of the log record associated with that modification into the recLSN. And the recLSN is not updated again until the next time you flush the dirty page back to disk, right? The next one is called the last LSN. It's just the LSN of the latest log record associated with a specific transaction, stored along with the transaction. The last one is called the master record. It's still an LSN, but it's actually on disk, and it represents the log record where you took the checkpoint: the LSN of the latest checkpoint log record. All right? Any questions about these basic concepts? Because we're going to use them throughout this class. Cool. So just to summarize the important elements of what we talked about: the page LSN is the latest log record that contains an update to that page, and it gets updated every time you update the page. The flushed LSN is the biggest LSN you've written to disk. Now, one important property that we want to maintain is that every time we write a page to disk, we must ensure that the page LSN is smaller than or equal to the flushed LSN, right? We use this property to ensure that before we write the changes of a dirty page back to the database contents on disk, the log records containing those changes are on disk first, right? Because otherwise, if we write a change from a dirty page back to the database before the log shows up on disk, and the database crashes in between, we don't know whether the data in the database is correct or not, because we don't have the associated log records. So by ensuring this property, we make sure the log records are on disk first before we flush the dirty page containing the changes associated with those log records. All right, makes sense? Cool. So to give you a visualization of what's going on here: we have different components. The first is the tail of the write-ahead log in memory, as well as the buffer pool, also in memory, right?
And then on the right, I'm showing the write-ahead log file on disk, as well as the pages and the master record LSN on disk. So first of all, for each log record there's a log sequence number associated with it, and it's always increasing, right? Second, there's a page LSN as well as a recLSN associated with every page in the database, all right? Next, every time we flush a page of log records to disk, we update the flushed LSN. In this case, the flushed LSN is 16, because that's the latest. And the master record just records the LSN of the latest checkpoint on disk, all right? Makes sense? So let me give you an example of when we can flush a page to disk. Say here we have a dirty page whose page LSN points to log record 12; in other words, the latest change to this page came from log record 12. Can we flush this page or not, if the buffer pool wants to evict it? We can, right? Because 12 is smaller than the flushed LSN, 16. On the other hand, assuming the page LSN of a page points to something still in memory, can we flush that page? No, right? Because we haven't flushed the corresponding log record yet. Essentially, with this property, we can guarantee correctness, okay? Cool. So this is a little bit similar to what we discussed before: all the log records have a log sequence number, every time you modify some page with a transaction you update the page LSN, and every time you write a page of log records to disk you renew the flushed LSN as well, right? That summarizes the example we've talked about, okay? Any questions so far, before we talk about the additional operations we need to do while executing transactions? Okay, cool, let's continue. So now, what are the additional operations and metadata we need to maintain while executing transactions, using this log sequence number? For the purposes of today's discussion, we're not going to look at SQL queries; we're just going to look at reads and writes followed by a commit or abort for each transaction, and we're going to assume some simple scenarios. Again, the ARIES algorithm can work even when these assumptions break, right? For example, it can work with different kinds of mechanisms, but again, we're not going to cover everything. So for today's discussion, we assume that every log record fits in a single page, and that when the system writes a page to disk, that write is atomic, right? Next, we assume single-version concurrency control, and that we use strong strict two-phase locking to guarantee isolation. And lastly, we use steal and no-force buffer management with write-ahead logging, all right? That's for the purpose of today's discussion. Cool. So first of all, what do we do when a transaction commits? That's the more straightforward case; we sort of hinted at this in the last lecture.
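Before getting into commit and abort, here is a small sketch of that page LSN versus flushed LSN eviction check, using made-up structures and the numbers from the example:

```python
# Sketch of the "pageLSN <= flushedLSN" safety check for evicting a
# dirty page. Structures and numbers are made up for illustration.

flushed_lsn = 16   # LSN of the latest log record already on disk

class Page:
    def __init__(self, page_id, page_lsn, rec_lsn):
        self.page_id = page_id
        self.page_lsn = page_lsn  # LSN of the latest update to this page
        self.rec_lsn = rec_lsn    # oldest update since the last flush

def can_flush(page):
    # Safe only if every log record describing this page's changes
    # is already durable in the write-ahead log.
    return page.page_lsn <= flushed_lsn

print(can_flush(Page(3, page_lsn=12, rec_lsn=9)))   # True: record 12 is on disk
print(can_flush(Page(4, page_lsn=17, rec_lsn=17)))  # False: record 17 only in memory
```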
Essentially, we first write a commit record to the end of the log, and then we ensure that all the log records of this transaction, from the beginning, including all the changes up to the commit, are flushed to disk, right? This could involve many log records. After that, we can already tell the client, the outside world, that this transaction has successfully committed. And lastly, not for the correctness of the commit itself, but for additional bookkeeping purposes, we append an additional transaction-end record to the end of the log, after we have resolved any remaining tasks of this transaction, if any. So essentially, after the transaction commits and we flush everything, we can already tell the client the transaction is committed. But there can be cases where, for the internal mechanisms of the ARIES algorithm to work, especially to work efficiently, we may keep the log records in memory for a while, right? To do some additional tasks. Once we've finished everything related to processing the log records of this transaction, we just append this transaction-end record, and at that point we can throw away all the log records associated with the transaction. So this is only for internal bookkeeping; it's not necessary for the transaction to commit, and we don't have to flush it immediately either, all right? That's the transaction end. So, here's a simple illustration. It's a similar example, but we only have one transaction, T4, which begins, does some modifications, and then commits, all right? When this transaction commits, what it does is flush all the log records containing this transaction's changes onto disk. Next, because it has written log records to disk, it renews the flushed LSN to 15, right? Which is the latest log record. What happens next is that, after a while, again, the system may need to retain these log records in memory for some additional tasks, but assume that after a while it finishes everything related to these log records. Then it just appends the transaction-end record at the end of the write-ahead log tail, and now it can throw away all the log records in memory, all right? Cool. So next, when a transaction aborts, what we need to do, again similar to what we've talked about, is at a high level just undo the changes of that particular transaction based on the log records, right? And this is actually a little bit similar to what we need to do when we come back from a crash, right? Because when we come back from a crash, if there are records of uncommitted transactions in the log, we also need to undo all those changes. So the transaction abort process has some similarity to the recovery process. And then one additional piece of information, this is not required for correctness, but for efficiency reasons, one additional piece of information we keep in the log records to accelerate this undo process is called the previous LSN, right? Essentially, the previous LSN, as the name suggests, contains the log sequence number of the previous log record generated by that transaction.
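Before going further into abort, here is a minimal sketch of the commit sequence just described. Everything here (record layout, helper names) is hypothetical; what matters is that the log is forced before we acknowledge the client, and that the transaction-end record needs no flush:

```python
# Sketch of the commit protocol: append COMMIT, force the log, then
# acknowledge; the TXN-END record comes later and needs no flush.

next_lsn = 0
log_tail, disk_log = [], []

def append(rec):
    global next_lsn
    rec["lsn"] = next_lsn
    next_lsn += 1
    log_tail.append(rec)
    return rec["lsn"]

def flush_log():
    disk_log.extend(log_tail)
    log_tail.clear()

def commit(txn_id):
    append({"type": "COMMIT", "txn": txn_id})
    flush_log()                        # all records up to COMMIT now durable
    print(f"txn {txn_id} committed")   # safe to tell the client only now
    # ... later, after internal bookkeeping is done:
    append({"type": "TXN-END", "txn": txn_id})  # no flush required

append({"type": "UPDATE", "txn": 4, "page": 1, "before": 10, "after": 11})
commit(4)
```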
So, back to abort: with this previous LSN, we can chain all the log records generated by a specific transaction together into a linked list, right? Now, when we need to abort the transaction and undo all its changes, we can just follow this linked list. Again, this is only for efficiency reasons; for correctness, you don't have to do it, right? You could just scan all the log records and figure out which ones are associated with this transaction, all right? Makes sense. So here's an illustration of the abort process. Notice that in addition to the log sequence number of each individual log record, we also have this previous LSN, which indicates the LSN of the previous log record generated by this specific transaction, right? Now, say we have this transaction, again still T4, but instead of committing, we see that this transaction aborts, right? So what we do is follow this linked list and undo all the changes of this transaction one by one, right? And again, similar to what I've talked about before, when we finish everything for this transaction, finish all the undos and address all the remaining tasks, and nobody else needs this transaction's information, we can append the transaction-end record here, right? Now, the important question is what happens when we perform this undo operation. Like I mentioned before, beyond just undoing all these log records, one additional thing we have to do is log the undos we have applied to the database system while we are performing this undo for a transaction. Again, this is especially useful if the system crashes while you are trying to abort or undo some operation, so that after we come back, we know where we were, okay? So specifically, what we record is something called compensation log records, right? I apologize if I'm kind of overloading all these different terms, but ARIES is a mechanism that's a little bit complicated, so we need to establish all this metadata before we talk about the specific recovery algorithm, okay? A compensation log record essentially records, like I said, what undo operation you applied to the database system, especially while you are restoring the contents of a page in the buffer pool to the original value. And in this undo log record, this compensation log record or CLR, the additional field we need to put there is called the undo-next pointer, right? Essentially, this undo-next pointer in the compensation log record is the log sequence number of the next log record to be undone, right? So the compensation log records also form a linked list, which accelerates our operation when we're trying to come back from a crash, okay? And the last thing to note is that, unlike the normal log records, where we need to make sure they're flushed before we can flush a dirty page, for these CLRs, we can add them to the log, but we don't need them to be flushed before we tell the outside world that this transaction has aborted, right?
So essentially, when a transaction aborts, we can immediately tell the user the transaction has aborted, but we don't need to wait for the undo operations to finish and for all the CLRs to be written to disk, right? We don't need that guarantee, unlike when a transaction commits, okay? So here, to give you a specific example, just for space reasons we're not showing the graphical demonstration from before; here we can only show you a table of all the log records with their associated information. We have this transaction that begins, then does a delete and an update, and then aborts, right? And we have the log sequence numbers, and the previous log sequence numbers that chain them together. After this transaction aborts, we can already tell the outside world the transaction aborted, but we still need to undo all the changes, generating those compensation log records that contain those changes, right? In these compensation log records, for example here, this record corresponds to the earlier update in log record two. Instead of the before and after values of the modification, it has the reverse, right? Essentially, we are going to restore the original value, so the new value written by the transaction now becomes the before value, and the original value becomes the after value, right? And lastly, we have the undo-next log sequence number, which points to the next record to be processed in this transaction's abort, or undo, mechanism, all right? And here, because this transaction only did one update before it aborted, the undo-next pointer just points to the beginning of the transaction, right? After we process that, we've finished the entire abort process and we don't need any of these log records in memory anymore, right? Because we finished processing them, we can append the transaction-end record right here, and we're done with the undo, all right? Makes sense? Cool. Any questions so far? Okay. So essentially, this is a summary of what we just talked about: when a transaction is to abort, we first write the abort record to the end of the log, right? Then we replay the log records backwards, using the previous-LSN chain that links them together, and restore all the changes of this transaction. While we're doing that, we write the compensation log records, and finally we write the transaction-end log record when we've finished everything, right? And lastly, a compensation log record never needs to be undone itself, right? Because it is an undo, an effect we wanted to remove from the database system. We don't need to undo an undo log record; that doesn't make any sense, all right? So, so far we've talked about the metadata we need to maintain, especially the log sequence number, as well as the additional operations and information we need while committing or aborting transactions. Now we're going to talk about one optimization for checkpointing, and lastly we'll get to the recovery mechanism, all right? So, we've talked about this checkpointing idea so that our log doesn't need to grow forever, right?
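Before we move on to checkpointing, here is a sketch of that abort path: walk the previous-LSN chain backwards, restore each before-value, and emit a CLR whose undo-next points at the next record to undo. The record layout is made up for illustration:

```python
# Sketch of abort via the prevLSN chain, emitting compensation log
# records (CLRs). All structures are hypothetical.

log = {  # lsn -> record; prev_lsn chains records of the same txn
    1: {"type": "BEGIN",  "txn": 4, "prev_lsn": None},
    2: {"type": "UPDATE", "txn": 4, "prev_lsn": 1, "page": 5,
        "key": "A", "before": 10, "after": 20},
    3: {"type": "UPDATE", "txn": 4, "prev_lsn": 2, "page": 5,
        "key": "B", "before": 7, "after": 8},
}
pages = {5: {"A": 20, "B": 8}}
next_lsn = 4

def abort(txn_id, last_lsn):
    global next_lsn
    lsn = last_lsn
    while lsn is not None:
        rec = log[lsn]
        if rec["type"] == "UPDATE":
            pages[rec["page"]][rec["key"]] = rec["before"]  # restore old value
            log[next_lsn] = {"type": "CLR", "txn": txn_id,
                             "page": rec["page"], "key": rec["key"],
                             "after": rec["before"],      # the undo's new value
                             "undo_next": rec["prev_lsn"]}
            next_lsn += 1
        lsn = rec["prev_lsn"]
    log[next_lsn] = {"type": "TXN-END", "txn": txn_id}

abort(4, last_lsn=3)
print(pages)  # {5: {'A': 10, 'B': 7}} -> original values restored
```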
So before we talk about fuzzy checkpointing, let me give you some examples of non-fuzzy, or unoptimized, checkpointing. One specific version of non-fuzzy checkpointing is that when the database needs to take a checkpoint, it just pauses everything, right? Similar to what we talked about before. But in addition to pausing everything, it can also wait for all the currently active transactions to finish, and only after that flush all the dirty pages to disk. This is even simpler than the earlier example we talked about, but obviously the database system has to wait much longer. The benefit is that it makes recovery much easier, right? Because this algorithm guarantees that when we take the checkpoint, all the transactions before the checkpoint have committed and there's nothing else in flight, right? So after we come back from a potential crash, we don't need to look at anything before the checkpoint, only things after, and everything before the checkpoint can be immediately thrown away, right? So this is a very simple checkpointing scheme, but the problem, as you can already see, is that it needs to stall the database system, and in this case it also needs to wait for the running transactions to finish. And in today's world, especially for some analytical database workloads, a transaction might, for example, read all the tuples in a table with 10 billion rows; the transaction may run for hours, right? That's not uncommon. With that, the checkpoint could stall the database system for a couple of hours, which would obviously be bad. So a slightly better version of this naive checkpointing is that you pause the transactions, but instead of waiting for them to finish, you just stop them where they are, right? So there could be some modifications from uncommitted transactions in the database system right now, but you don't try to wait for them to finish, because that could take forever. And for read-only transactions, if you know a transaction is read-only, you can just let it continue as normal, right? So here's what this does. For simplicity, assume two separate threads: one thread doing the checkpointing, the other executing transactions. Assume you have a writing transaction that has already modified a page, page three, in this database, and now you want to take the checkpoint. Instead of waiting for this transaction to finish, which could take an unknown amount of time, you immediately pause the execution of this transaction, right? Don't allow it to do anything else or make any more modifications. Then you can just use the checkpoint thread to scan through everything and write everything to disk, right? Later on, after the checkpoint finishes, this transaction can continue and make other modifications, right? But the obvious problem with this checkpointing approach is that now, on disk, you have flushed three pages, two of which are in a consistent state with only modifications from committed transactions, right?
But there's this additional page, page three, that you flushed to disk while doing the checkpoint, and it has values from an uncommitted transaction, right? So if, for example, this transaction aborts, then this value is incorrect; it shouldn't be there. So we have to have an additional mechanism to deal with that situation. And the additional mechanism, or specifically the additional information we need to keep track of to deal with this situation when we come back from a crash, has essentially two parts. One part is called the active transaction table: while we are taking the checkpoint, we keep track of which transactions are still running, still active, right? The other is called the dirty page table: we also want to know which pages are the dirty pages containing changes from uncommitted transactions, right? Those are the two additional pieces of information we keep to handle this scenario, okay? So first, the active transaction table. It has, first, an ID for every active transaction, that is, transactions that are still running, not committed and not aborted. Then it has a status code for the transaction, as well as the last LSN, right? Which indicates the most recent log sequence number of a log record containing changes from this transaction; we talked about this last LSN before. When a transaction commits or aborts, essentially finishes, we just remove it from this active transaction table. There are three possible states for each transaction. The first is running: it's executing. The second is committing: the transaction is trying to finish before you remove it. And the third is called candidate for undo, which is actually the default state of a transaction that is not running or committing, because the database system is going to assume that every transaction could abort and need to be undone, right? The database system can crash at any time, and before the system crashes, any active transaction can abort and need to be undone, right? So this last state, candidate for undo, is the default state for any transaction in the active transaction table, right? Next, the dirty page table, which we've sort of already discussed: it keeps track of which pages contain modifications from uncommitted transactions, right? A dirty page contains modifications from uncommitted transactions, and we need to keep track of that. It also records the recLSN of every dirty page, which is the oldest modification applied to that specific page since the last time it was flushed to disk, all right? Cool. So to give you a concrete example: assume we have three transactions, T1, T2, and T3, and we take two checkpoints, right? Before the first checkpoint, we assume transaction T1 has already finished and transaction T2 is still running, right? In this case, the active transaction table, or ATT for short, will only contain T2, and the dirty page table, or DPT, will just contain page 22, which is the dirty page, right?
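Just to make the two tables concrete, here is one way they might look as plain structures; the field names follow the lecture's terminology, but the layout itself is made up:

```python
# Sketch of the two checkpoint-time tables, populated with the state
# at the first checkpoint in the example: T1 finished, T2 running.

from dataclasses import dataclass

@dataclass
class ATTEntry:            # active transaction table entry
    txn_id: int
    status: str            # "R" running, "C" committing, "U" candidate for undo
    last_lsn: int          # latest log record of this transaction

@dataclass
class DPTEntry:            # dirty page table entry
    page_id: int
    rec_lsn: int           # oldest update since the page was last flushed

att = {2: ATTEntry(txn_id=2, status="U", last_lsn=12)}
dpt = {22: DPTEntry(page_id=22, rec_lsn=11)}
print(att, dpt)
```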
And that's the first checkpoint. As another example, for the second checkpoint, by the time it happens, transaction T2 has already committed, right? So only transaction T3 is in the ATT, and the dirty page of T3, page 33, is in the DPT, all right? So those are the basic concepts of the ATT and DPT, and this slightly better checkpointing mechanism, right? But of course, the problem with this slightly better checkpointing is that you still need to stall all the currently running writing transactions while you are checkpointing, and depending on the duration of the checkpoint, how many records you need to write, this stall could be long, right? So, is there an even better mechanism that allows us to keep executing transactions, keep modifying pages in the buffer pool, while we are taking the checkpoint? The answer is fuzzy checkpoints, right? Essentially, we can do that, and the mechanism that helps us, in addition to the ATT and DPT we already talked about, is that we write two checkpoint records, right? In addition to the original checkpoint record with the ATT and DPT, we write an additional checkpoint-begin record, which indicates when the checkpoint begins. At that point, we already have access to the state of the database before everything starts, right? Then we start to take the checkpoint based on what already exists before this checkpoint-begin record. After the checkpoint ends, we note that the checkpoint has finished by appending a checkpoint-end record carrying this additional ATT and DPT information. With the combination of these two records, we can allow transactions to keep modifying records while we are doing the checkpoint, all right? To give you a more specific implementation: every time the database system takes a checkpoint, it obviously writes the checkpoint-begin record first, right? But after the checkpoint finishes, instead of the LSN of the checkpoint-end record, it actually records the LSN of the checkpoint-begin record in the master record, right? Because that's when the checkpoint started, and we are only guaranteed to have written a consistent state of the database as of before the checkpoint started, right? While we are doing the checkpoint, other transactions can still be modifying the database, so the records written later could be dirty, right? So in the master record, we only record the LSN of the checkpoint-begin log record, all right? And for any transactions that start after the checkpoint-begin record, we actually ignore them in the checkpoint-end record, right? We have additional mechanisms to recover them after we come back from a crash. In the ATT and DPT tables, we don't need to include active transactions that only start after this checkpoint begins, all right? Because the master record only has the LSN from before that. So in this specific example, when the checkpoint begins, the system starts to look at all the log records before that point, as well as all the pages in the buffer pool from before the checkpoint begin, and starts to write all of them onto the disk, right?
And in the meantime, other transactions can start, and the existing transactions can still make modifications, right? After the database system has flushed out everything from before the checkpoint begin, it records a checkpoint-end record that contains, first, the active transactions as of the checkpoint begin, which would be T2, as well as the dirty pages modified by this uncommitted transaction, which would be T2's. Yes, please. [Student asks, partially inaudible, whether pages can still be modified after the checkpoint-begin point in time, so the flushed pages 11 and 22 may not reflect those changes.] Yes, yes. For those cases, we still need to handle that, but we'll handle it through the additional recovery mechanism. Yes. Okay. So that's the checkpoint. Any additional questions before we jump into the recovery mechanism? Okay. So that's essentially the checkpoint. With all this information established, the log sequence numbers, the different redo and undo records, as well as the information in the checkpoint, we can finally talk about how we recover the database system back to a correct state after a crash. Again, we've mentioned this before, but just to put you in context and remind you a little bit: there are essentially three phases. The first phase, which we haven't talked much about, is the analysis phase, where we read every single write-ahead log record starting from the master record LSN, which is the checkpoint-begin record of the last completed checkpoint. We read every single record from there and re-establish the correct active transaction table as well as the dirty page table. Exactly like the other student asked, there are certain transactions that may not have been included yet; we have to restore them correctly. The second phase is the redo phase. After the analysis phase, we determine a point in time in the history of log records from which we reapply every single change in the log, even including transactions that have committed and transactions that have aborted. The reason we do that is that there are mechanisms by which we could skip the changes of aborted transactions. But in many cases, to ensure the system applies all the changes correctly and restores itself to the correct state, it's just easier and more straightforward to apply every single change in order, so that you put the database system back into the state right before the crash, with all the changes from committed and uncommitted transactions, and then clean things up afterwards. If you want to skip the changes of aborted transactions, it's possible, but then you lose the normal order of the changes of different transactions, and you have to do quite a bit of other work to make sure things are correct. So in this redo phase, in the more straightforward version of ARIES, we just reapply all the changes, including those from committed and aborted transactions. And last, in the undo phase, we come back and look at which transactions had not committed before the crash, and restore the changes of those transactions. Then we can put the database system back into a correct state. That's the high-level idea.
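Before we move into recovery, here is a sketch tying the fuzzy checkpoint pieces together: write the begin record, snapshot the ATT and DPT as of that moment, flush pages while other transactions keep running, write the end record carrying the snapshot, and point the master record at the begin record. Again, all structures are hypothetical:

```python
# Sketch of a fuzzy checkpoint. The master record stores the LSN of
# the CHECKPOINT-BEGIN record, not the END record.

import copy

log, next_lsn = [], 0
master_record = None      # on disk: LSN of the last CHECKPOINT-BEGIN

def append(rec):
    global next_lsn
    rec["lsn"] = next_lsn
    next_lsn += 1
    log.append(rec)
    return rec["lsn"]

def fuzzy_checkpoint(att, dpt, flush_pages):
    global master_record
    begin_lsn = append({"type": "CHECKPOINT-BEGIN"})
    att_snap, dpt_snap = copy.deepcopy(att), copy.deepcopy(dpt)  # state as of BEGIN
    flush_pages()  # meanwhile, other transactions may keep modifying pages
    append({"type": "CHECKPOINT-END", "att": att_snap, "dpt": dpt_snap})
    master_record = begin_lsn  # recovery will start analysis from here

fuzzy_checkpoint({"T2": {"status": "U", "last_lsn": 5}},
                 {11: {"rec_lsn": 4}, 22: {"rec_lsn": 5}},
                 flush_pages=lambda: None)
print(master_record, log[-1]["type"])
```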
So, coming back to the recovery process: just to visualize this a little bit, it's the same thing I just talked about. Every time we recover from a crash, as illustrated on the right-hand side, assuming we have lots of records and then a crash, we first look at the last checkpoint-begin LSN, as reflected in the master record on disk. Assuming that's the checkpoint-begin record, we look at all the records after that and determine which transactions were active during this whole process, which pages were dirty, and which log record we need to go back to in order to reapply all the additional changes. After that, we go into the redo phase, where we repeat all the actions from committed and uncommitted transactions that may not have been reflected in the contents of the database yet. More specifically, and we'll get to the details later, we look at the smallest recLSN. Remember, the recLSN of a page is the oldest update since the last time that page was flushed to disk: after I flush a page, and before I flush it the next time, the recLSN indicates the oldest change. So if I take the oldest of all these oldest changes across the dirty page table, that is essentially the earliest starting point from which I need to redo changes. Because all the changes before that smallest recLSN, we know, have all been flushed to disk. That's the redo phase: we reapply all the changes. Lastly, for the undo phase, we find, and this is kind of straightforward, the oldest log record of the transactions still active before the crash, which we get through the analysis phase. And then we undo all the changes of the transactions that were still active when we crashed, right? Makes sense? Nice. So for the analysis phase, there's quite a bit of information we need to keep. Again, you start from the master record, in other words the checkpoint-begin record of the last successful checkpoint. Every time you find a transaction-end record, you remove that transaction ID, or whatever identifier you have, from the active transaction table, right? Because you know that whether it committed or aborted, you finished processing all of it. In all other cases, you add the transaction to the active transaction table with the default status, undo, if you're seeing the transaction for the first time, or leave it if the transaction is just still applying updates. If you see a log record indicating a transaction commit, you set the status of that transaction in the active transaction table to commit. And lastly, every time you see an update record, you record the page referenced by the update record in the dirty page table, with its recLSN set to the LSN of that record if the page is not already there, right? So you know the LSN of the oldest change to each dirty page in the dirty page table, all right? That's the information you need to record. And again, with this analysis phase, we recover the transactions that weren't recorded after the checkpoint begin, right?
We'll include those active transactions in the analysis phase as well. At the end of the analysis phase, we'll have complete versions of these two tables: the active transaction table, which indicates the transactions active before the crash, and the dirty page table, which indicates which pages may or may not have been written to disk yet. A page may very well already have been written to disk; we just don't know yet, right? So we use the dirty page table to keep track of the changes of uncommitted transactions, all right? So here's a specific example. Say a checkpoint begins at log sequence number 10, and after that, a transaction 96 comes along and applies a modification, right? Now assume that we have come back from a crash and are scanning this write-ahead log file. For the record with sequence number 20, we record the ID of this transaction in the active transaction table with the default status, which is undo, right? Then we look at the change this transaction made, in this case a modification to A in page 33; we record that page in the dirty page table, with the recLSN of that page set to the LSN of this particular log record, okay? Then we keep scanning, continuing our analysis phase, and say we encounter the checkpoint-end record, right? Now we have additional information that was recorded while taking the checkpoint. For example, before we issued the checkpoint-begin command, there may have been another transaction, transaction 97, that had already started, right? It may have done some modifications on different pages, et cetera. From this checkpoint-end record, we get that information back, right? Because if transaction 97 started before the checkpoint begin and did some modifications before the checkpoint begin, then by only analyzing the log after the begin record, we wouldn't know about it, right? So here, we just add that information back. Then we keep scanning: at log record 40, transaction 96 commits, right? So we flip the status of transaction 96 to commit, okay? And after a while, we see that transaction 96 has an end record, which means we've already finished all the processing, told the outside world, and nothing else needs the log records of 96, so we just remove it from the active transaction table, and we don't need to deal with it anymore, right? All right, so that's the analysis phase. Any questions? Okay, so now the redo and undo phases. They are actually kind of straightforward, right? In the redo phase, we reapply the changes of every individual log record, starting from the smallest recLSN across all the dirty pages in the dirty page table, right?
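Stepping back for a second, here is the analysis pass we just walked through as a sketch, using a made-up record format but the same rules: a transaction-end record removes a transaction, an update adds it with status undo and dirties a page, a commit flips the status, and the checkpoint-end record merges in what predates the checkpoint begin:

```python
# Sketch of the analysis pass, rebuilding the ATT and DPT by a forward
# scan from the CHECKPOINT-BEGIN named by the master record.

wal = [
    {"lsn": 10, "type": "CHECKPOINT-BEGIN"},
    {"lsn": 20, "type": "UPDATE", "txn": 96, "page": 33},
    {"lsn": 30, "type": "CHECKPOINT-END",
     "att": {97: {"status": "U", "last_lsn": 5}}, "dpt": {20: 5}},
    {"lsn": 40, "type": "COMMIT", "txn": 96},
    {"lsn": 50, "type": "TXN-END", "txn": 96},
]

def analysis(wal):
    att, dpt = {}, {}          # txn -> status/last_lsn; page -> recLSN
    for rec in wal:
        t = rec["type"]
        if t == "CHECKPOINT-END":
            # merge in transactions/pages that predate CHECKPOINT-BEGIN
            for txn, info in rec["att"].items():
                att.setdefault(txn, dict(info))
            for page, rec_lsn in rec["dpt"].items():
                dpt.setdefault(page, rec_lsn)
        elif t == "UPDATE":
            e = att.setdefault(rec["txn"], {"status": "U", "last_lsn": rec["lsn"]})
            e["last_lsn"] = rec["lsn"]
            dpt.setdefault(rec["page"], rec["lsn"])  # recLSN = first dirtying LSN
        elif t == "COMMIT":
            att[rec["txn"]]["status"] = "C"
        elif t == "TXN-END":
            att.pop(rec["txn"], None)
    return att, dpt

print(analysis(wal))  # ATT keeps only txn 97; DPT has pages 33 and 20
```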
Now, about the redo phase: again, there are techniques to skip unnecessary reads and writes, especially from aborted transactions, but in this lecture we're not going to talk about those, all right? Actually, in the redo phase, we don't just blindly reapply all the changes. We also need to consult the dirty page table, and we need to know the log sequence number of each page; in particular, we look at the pages in the dirty page table to decide whether to apply a given change or not, right? Because even though we said we'd still apply the changes of aborted transactions, there are still cases where we don't want to apply a change. For example, in some cases, the specific page referenced by the log record may not even be in the dirty page table, right? That indicates the page was already flushed to disk, so even though there's a log record that I'm scanning, if the page is not dirty, I don't need to write that change again anyway, right? That's one scenario. The other scenario is that the page is in the dirty page table, but the log sequence number of the record I'm looking at right now is smaller than that page's recLSN, right? In this case, we know there will be a later modification to that specific page that is actually the oldest unflushed one, and I only need to apply changes from that point on, right? For every change whose log sequence number is smaller than the recLSN, I don't need to apply it, because all those changes have been flushed to disk already, and we can tell that through the page's recLSN, okay? So those are a few scenarios where you can skip the modification, right? But keep in mind, nothing here depends on whether the transaction aborted or not; we still apply the changes of aborted transactions, okay? So lastly, the specific mechanism to finish the redo phase, just to talk about the implementation detail: we first reapply the change in the log record, and next we set the page LSN to the LSN of that specific log record, right? Because the page LSN always reflects the latest change applied to that specific page, right? And there's nothing additional we need to do: we don't need to write additional log records, et cetera, because we already have the redo log records, right? So even if the system crashes right now, we still have those log records in place already, right? We don't need to append any new records. And after everything finishes, after everything in the redo phase finishes for all the committed transactions, we append their transaction-end log records and remove them from the active transaction table. All right, makes sense? Cool. So next comes the undo phase. The undo phase, again, is also kind of straightforward.
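Before the undo details, here is the redo-phase skip logic from a minute ago as a small sketch. Note that the full ARIES rule also compares against the page LSN stored on the disk page itself; this sketch only covers the two dirty-page-table checks the lecture describes:

```python
# Sketch of the redo-phase decision for one UPDATE record: skip it if
# the page is not in the DPT, or if the record's LSN predates the
# page's recLSN (the change is already on disk). Layout is made up.

dpt = {33: 20}  # page_id -> recLSN from the analysis phase

def should_redo(rec, dpt):
    page = rec["page"]
    if page not in dpt:            # page was flushed before the crash
        return False
    if rec["lsn"] < dpt[page]:     # change older than the oldest unflushed one
        return False
    return True                    # otherwise reapply, then set pageLSN = lsn

print(should_redo({"lsn": 15, "page": 33}, dpt))  # False: 15 < recLSN 20
print(should_redo({"lsn": 25, "page": 33}, dpt))  # True
print(should_redo({"lsn": 25, "page": 44}, dpt))  # False: not in DPT
```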
So, for the undo phase: essentially, you look at all the active transactions that have not committed once you've finished the analysis phase, right? Those are all the uncommitted transactions at the time the system crashed. Then you look at the smallest log record number among all those active transactions, go back in time, fetch the log records, and restore the changes of these transactions one by one, right? And again, like I mentioned before, each log record has a previous LSN associated with it, which indicates the previous change applied by that individual transaction, so we can use that to accelerate this replay process for every transaction we need to undo. And we also need to write a compensation log record after every modification we undo, all right? Makes sense? Any questions about redo and undo? No questions, cool. So now I'll give you an example, especially to show you how compensation log records look in practice, right? Say we have this log here, with the log contents on the right, and a checkpoint, assume a very fast checkpoint, begin and end, and then two transactions, T1 and T2, that did some modifications, and then T1 aborts right away, right? So, assuming we've already finished the analysis and redo phases, now when we start to undo, first we look at transaction T1: it did not commit, right? So we first undo the changes of transaction T1, append the CLR, and the undo-next sequence number of that CLR points to log record 10, right? That would be the next thing to be undone. Then, since transaction T1 only had this one modification, we can say, hey, we've already finished processing the changes of this transaction, and transaction T1 has been successfully aborted, right? Yeah, the previous LSN just indicates where in the chain of log records to go back to if you need to reverse the modifications of this transaction, okay? Okay, now assume we have some additional changes in this schedule of transactions, with T3 modifying something else, and then T2 coming along and modifying something else as well, right? Now we crash. So what do we do here? Again, assume we crash at this point, and assume I have finished the analysis and redo phases; here I'm directly showing you the state after we come back from the crash, right? For simplicity, we skip the analysis here. The active transaction table has two transactions, T2 and T3, still running. The dirty page table has all the dirty pages we've found so far, all right? And because we keep updating the last LSN of each transaction in the active transaction table, we can directly jump to the corresponding locations, look at the latest log records of those active transactions, and start the recovery from there, all right?
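Here is a sketch of the general undo loop that the example below walks through: repeatedly undo the update with the largest LSN among the loser transactions, write a CLR each time, and write a transaction-end record when a transaction is fully undone; those transaction-end records are what make a repeated crash cheap to handle. The records here are made up:

```python
# Sketch of the undo phase over two loser transactions.

log = {
    1: {"type": "UPDATE", "txn": "T2", "prev_lsn": None, "page": 1,
        "before": "x0", "after": "x1"},
    2: {"type": "UPDATE", "txn": "T3", "prev_lsn": None, "page": 2,
        "before": "y0", "after": "y1"},
    3: {"type": "UPDATE", "txn": "T2", "prev_lsn": 1, "page": 3,
        "before": "z0", "after": "z1"},
}
next_lsn = 4

def undo(att):
    """att: txn -> last_lsn of each loser transaction from analysis."""
    global next_lsn
    to_undo = dict(att)
    while to_undo:
        txn = max(to_undo, key=to_undo.get)   # undo the largest LSN first
        rec = log[to_undo[txn]]
        log[next_lsn] = {"type": "CLR", "txn": txn, "page": rec["page"],
                         "after": rec["before"], "undo_next": rec["prev_lsn"]}
        next_lsn += 1
        if rec["prev_lsn"] is None:           # nothing left for this txn
            log[next_lsn] = {"type": "TXN-END", "txn": txn}
            next_lsn += 1
            del to_undo[txn]
        else:
            to_undo[txn] = rec["prev_lsn"]

undo({"T2": 3, "T3": 2})
print([(lsn, r["type"]) for lsn, r in sorted(log.items())])
```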
And then here, say we first recover transaction T2: we undo the change corresponding to log record 60, right? We write a compensation log record for that, and its undo-next pointer points to the earlier log record of that transaction, right? So on and so forth; then we undo transaction T3, applying compensation log records, and now we have the transaction-end record for T3 written here, we flush everything, and we say that we have successfully undone the changes of transaction T3, right? But assume that before we can finish the whole process, the system crashes again right now, right? This is a repeated crash, because we hadn't finished all the processing. So what happens? Everything in memory is gone, right? And we didn't take a new checkpoint either, so we have to restart from the beginning of the checkpoint, redo the analysis, and repopulate everything. But because we have those compensation log records, and in particular, at the end of the undo of transaction T3, the transaction-end record for T3, we know that we already finished all the processing for T3, right? So when we come back from the crash the second time, we can directly remove transaction T3 from the active transaction table, and we only need to deal with transaction T2, which is much faster. And after T2 finishes, we append the transaction-end record for T2 as well. All right, make sense? Yes? Oh, it makes sense, okay; I thought you had a question, all right. Cool, so just some additional questions related to this logging and recovery process. The first is: what does the database need to do if the system crashes during the recovery process, specifically during the analysis phase, right? I've put it on the slide; any idea what the system needs to do? Nothing, right? Because the analysis phase doesn't make any changes, it is safe to simply run the recovery again; we don't need to do anything special. Similarly, what does the database system need to do if it crashes during the redo phase of crash recovery? Again, nothing, right? Because for everything we are going to redo, we already have the log records written, right? Even if we didn't finish, when we come back the next time the redo records are still there, and we can always apply them again; we don't need to do anything else either, all right? So the next question: how can the database system improve its performance during its recovery in the redo phase? Well, assuming the system is not going to crash again, we can flush all the changes asynchronously in background threads, right? Use separate threads to help flush those changes, so that we put the database back into a correct state as efficiently as possible, right?
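That repeated-crash behavior falls out of the analysis pass: if a transaction-end record made it to the log, as T3's did here, the rebuilt active transaction table simply never contains that transaction. A minimal sketch, again with a made-up record layout:

```python
# Minimal sketch of re-running analysis after a repeated crash; a
# transaction with a transaction-end record in the log (like T3) never
# re-enters the active transaction table. Record layout is hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int
    txn_id: int
    kind: str                      # "update", "clr", "commit", "txn_end", ...
    page_id: Optional[int] = None

def analysis(log_since_checkpoint):
    active_txns, dirty_pages = {}, {}
    for rec in log_since_checkpoint:
        if rec.kind == "txn_end":
            active_txns.pop(rec.txn_id, None)      # finished: nothing left to undo
        else:
            active_txns[rec.txn_id] = rec.lsn      # track each transaction's lastLSN
            if rec.page_id is not None and rec.page_id not in dirty_pages:
                dirty_pages[rec.page_id] = rec.lsn # recLSN: first LSN to dirty the page
    return active_txns, dirty_pages
```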
And lastly, how can the database system improve its performance during its recovery in the undo phase, right? There are a few different possibilities. The first is that you can lazily apply the undo changes, deferring them until a new transaction accesses the affected pages, right? You can stage all the pending changes for a page in a specific place rather than applying them right away, because no transaction has touched that page yet. Then, once the system has resumed normal operation and the next transaction comes along and needs that page, you consolidate the changes staged during crash recovery with the new change and apply them together, so you only need to write the page once, right? That's one option. The other is that you can rewrite your application logic a little so that you avoid long-running transactions. In that case, after you come back from a crash, hopefully none of your active transactions started too long before the last checkpoint, so you only need to look back a minimal distance beyond it. If a transaction runs for too long, then even though the system can still checkpoint and recover correctly, you may need to look very far back beyond the last checkpoint, right? That kind of defeats the purpose of checkpointing; you still end up looking at lots of data, all right? That's pretty much it. So, to conclude: the main idea of this logging and recovery process is that we always use write-ahead logging, as in ARIES, right? Write-ahead logging goes together with the steal and no-force buffer policies, a very, very important concept, right? It will show up in the homework, and it is the fundamental design decision about buffer management with respect to crash recovery. Next, you can use fuzzy checkpoints to accelerate checkpoint processing, especially to allow the system to keep executing normal transactions, including writing transactions, while the checkpoint is being taken, right? Then in the redo phase, you redo everything since the earliest dirtied page, based on the recLSNs of the pages in the dirty page table. And for the undo phase, you look at all the active transactions that were still running at the point when you crashed, right? And lastly, whenever you undo things, whether you are rolling back an aborted transaction during normal processing or undoing during the undo phase of the recovery process, you write these compensation log records, right? Those records accelerate your recovery if you crash during recovery: with repeated recoveries, you don't have to do the same work over and over again. And the log sequence number is a very, very important concept, right? It is the fundamental piece that coordinates all these components. Here I listed the definitions of prevLSN and pageLSN, but definitions like recLSN and the master record LSN are just as important for all these pieces to coordinate with each other, all right?
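To tie the LSN terminology together, here is one illustrative way to lay out that bookkeeping; the dataclass shapes are assumptions, only the field meanings come from the lecture:

```python
# Illustrative layout of the LSN bookkeeping named in the summary; the
# field meanings follow the lecture, the structures themselves are made up.
from dataclasses import dataclass

@dataclass
class LogRecord:
    lsn: int          # position of this record in the log
    prev_lsn: int     # prevLSN: previous record of the same transaction
    txn_id: int

@dataclass
class PageHeader:
    page_lsn: int     # pageLSN: newest change applied to this page

@dataclass
class DirtyPageEntry:
    rec_lsn: int      # recLSN: oldest LSN that may have dirtied the page

# The master record, stored at a fixed location on disk, holds the LSN of
# the most recent successful checkpoint, so recovery knows where to start.
master_record_lsn: int = 0
```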
So that's all for this class today. So far we have covered everything in this course about single-node database systems, so at this point you could go back and build your own single-node database system that processes data and guarantees the ACID properties. Starting next Wednesday, we're going to talk about distributed database systems; we have a few lectures on that. And then that's pretty much the semester, right? It goes fast. We'll have a guest lecture, we'll have a final review, and then the semester is done. All right, cool. Thanks, everyone. See you next Monday.