I have a lot to talk about today. This is a continuation of where we left off in Monday's class on recovery. DJ Drop Table is still in Vegas, so he can't make it today, but he'll be back on Monday. And as you guys said, he's dealing with girlfriend problems, plural. So on Monday, we started talking about recovery algorithms. We talked about how to have the database system restore the database to a consistent state as it existed before the system crashed. We want to do this to make sure that we provide the atomicity, durability, and consistency guarantees for all our transactions. And I said that every recovery algorithm we could possibly design for our database system is going to have two parts. The first is the things we're going to do while we're processing transactions: recording extra information, or flushing out data to disk in a certain way, such that we can recover after a crash. Today we're going to talk about the second half: if there is a crash, how do we come back, pick the pieces of our life back together, and figure out what's going on? So the protocol we're going to look at today is based on a technique developed at IBM called ARIES. I don't think the textbook explicitly refers to the techniques we'll be talking about today as ARIES in its recovery chapters, but register in the back of your mind that what ARIES does is pretty much what everyone does. ARIES was developed at IBM Research in the late 1980s and early 1990s for DB2. And there's this seminal paper that came out that's super long. If you can't fall asleep, it's 70 pages, so go ahead and give it a crack and see whether you can get through the whole thing. This thing goes into excruciating detail on exactly all the steps and failure scenarios you need to handle to guarantee that your recovery algorithm is correct and that you're not going to lose any data. That's not to say that nobody was doing recovery before this paper came out in 1992. It's to say that this paper laid down, or codified, the exact policies you need to be mindful of in order to do recovery correctly. Not every single database out today that's doing checkpoints and write-ahead logging follows to the exact letter what ARIES does. But at a high level, they're essentially doing the same thing: write-ahead logging with fuzzy checkpoints, and then a three-phase recovery protocol. So the three main ideas of ARIES for database recovery are the following. The first is that we're going to use a write-ahead log to record any changes that transactions make to data in our database in log records, and we have to flush out the log records that correspond to the changes to a page before the page itself can be written out. And as we said last class, this is using a steal + no-force policy. So what does steal mean? Steal says the buffer pool is allowed to flush out pages to disk that have been modified by transactions that have not committed yet. And no-force says that we're not required to flush all the dirty pages that a transaction modified in order to say that the transaction has committed correctly. Instead, what we do is flush all the log records that the transaction generated out to disk, and make sure they're durable and safe, before we tell the outside world that our transaction has committed. So this is what we're going to do at runtime.
We'll see a slight variation on how we do this when we talk on Monday; we'll see that some systems store extra stuff to make sure this works correctly. Then, after a restart, to recover the database, we're going to first replay the history in the log to redo all the changes that transactions made. And this includes any changes from aborted transactions; we're going to reapply them as well. Then we'll go back and undo a bunch of stuff, to reverse the changes that should not be persistent. And the other change we're going to make is that when we start undoing changes, both at runtime and during recovery, we're actually going to create log records for those undo operations as well. That's something we haven't talked about so far. In the last class, I just said: here are all the updates I'm doing, here are the corresponding log records, and when I aborted, I didn't do anything special. Now, when we abort a transaction, either at runtime or as part of recovery, we're going to add extra log records to say: here's the change that we are reversing. We need this to ensure that if we crash during recovery, we can recover from the recovery; and you can imagine this repeating indefinitely. This is the extra step we need to handle all scenarios. So for today's agenda, we're going to first talk about how the execution of transactions during regular processing is going to be slightly different. We're going to introduce this idea of log sequence numbers, and the extra steps we do for commits, aborts, and fuzzy checkpoints. Then we'll finish up talking about how we actually do the three-phase recovery algorithm as defined by ARIES. Again, this is super hard; as I said, this is probably the third hardest part of database systems. So stop me as we go along if you have questions; feel free to interrupt. Last class, we talked about write-ahead log records. And we said that, for our purposes here, we're assuming we're doing physical logging or physiological logging, and not the logical logging where you just record the SQL statement. So we're going to have the transaction ID that's making the change, the object that it's changing, and then the redo and undo information. But now we're going to include additional metadata that keeps track of the order in which these log records are generated. We're going to introduce this new concept called a log sequence number, the LSN, which is just a monotonically increasing counter that we assign to every single log record as it gets added. The log sequence numbers don't need to be contiguous for a transaction, because, depending on what concurrency control scheme we're using, we can interleave transactions' operations in any way. So I might give log sequence number 1 to transaction 1, log sequence number 2 to transaction 2, and back and forth; they don't need to be contiguous per transaction. So now, with these log sequence numbers, we're going to modify all the different parts of the system that need to be aware of what log records have modified data and whether or not those records have been written out to disk. So we want to extend our buffer pool even further.
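Just to make this concrete, here's a minimal sketch in Python of what such a log record might look like. The field names and the `append_update` helper are hypothetical stand-ins, not anything from a real system:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int                         # monotonically increasing log sequence number
    txn_id: int                      # transaction making the change
    kind: str                        # "UPDATE", "COMMIT", "ABORT", "CLR", "TXN-END", ...
    page_id: Optional[int] = None    # object/page being modified (updates and CLRs)
    before: Optional[int] = None     # undo information: the old value
    after: Optional[int] = None      # redo information: the new value

log_buffer: list = []                # in-memory tail of the write-ahead log
next_lsn = 1

def append_update(txn_id, page_id, before, after):
    """Append an UPDATE record to the log tail and hand back its LSN."""
    global next_lsn
    rec = LogRecord(next_lsn, txn_id, "UPDATE", page_id, before, after)
    log_buffer.append(rec)
    next_lsn += 1
    return rec.lsn
```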
So now the buffer pool needs to be cognizant of the log sequence numbers that correspond to the changes made to the pages it has in memory. Every log record has a log sequence number, but then, all throughout the system, we'll have these other log sequence numbers. There's a summary table of the log sequence numbers we're going to encounter throughout the rest of the class, but we'll go through them one by one, and you'll see how we end up using them. The flushLSN is an in-memory counter that keeps track of the LSN of the last log record flushed to disk. This tells us, for our write-ahead log buffer, how far back in the log we know the records are actually durable. Then, for every page, you're going to have the pageLSN and the recLSN. The pageLSN is the LSN of the newest log record that modified that page. The recLSN is the LSN of the oldest log record that modified this page since it was last flushed. So when I bring the page into memory and the first transaction modifies it, I record that log sequence number as the recLSN. And no matter how many times I keep modifying the page while it's in memory, the recLSN stays the same, whereas the pageLSN keeps increasing. And I'm going to write this data out to disk as well; it just goes in the page header. This is extra information to make sure we're doing the right things during recovery. Every transaction keeps track of its lastLSN; that's the last log record that it added. And then, globally, we're going to have a master record that we use to keep track of the LSN of the last checkpoint that we successfully took. Remember, I said that when we take these checkpoints, they're actually entries in the log that say: hey, I took a checkpoint at this time. So, reiterating everything I said before: every page has a pageLSN, and it's just the most recent update made to that page. We keep track of the flushLSN to know up to what point in the log we've written things out to disk. And as I said last class... yeah, sorry, question? The pageLSN and recLSN, you write those to disk, did you just say that? Yeah, so the question is: do the pageLSN and the recLSN get written to disk? They don't have to, but in general, you do. It's an extra safety mechanism in case you end up missing something. It's information about which log record last modified the page, right? Information about the log record. Log record, yeah. But if you write the page itself to disk, then that information becomes redundant, right? Because once you have written the page itself... Correct. So his statement is: if you write the page out to disk with the pageLSN and the recLSN, isn't that essentially redundant? Because now the page is durable on disk, it's no longer dirty if it was in memory, isn't that fine? And I'm agreeing with you; I'm just saying it's an extra safety mechanism. We are going to be super conservative here because we don't want to lose any data. And disks, you know, they break all the time.
We just want to make sure that there's enough information around so that if we have to recover, and maybe our log got corrupted a little bit, or the page got corrupted, we have enough information in enough locations that we can figure things out. Yeah. Now, you'll see as we go through the actual recovery process that there are obvious things we could do to speed it up, and we'll talk a little bit about them at the end. We are not going to do them here because we want to be super cautious, right? People get pissed when you lose data. So you don't want to lose data, right? OK. So this is just restating what I said last class. Remember, I said that with the write-ahead log, before we can flush out a dirty page, we have to flush the log records that made that page dirty, right? And this is how we're actually going to figure that out: we use these LSNs as the watermark. Say the last log record that modified this page is LSN 123, but I've flushed the log out through log sequence number 456, and 456 comes after 123. Therefore I know that whatever log record made my page dirty also got flushed out to disk, so it's safe for me to flush out this page. That's how we're going to guarantee this. So let's look at a high-level example of all this. In memory, we have our log buffer, which is always the tail of the log; it always holds the most recent changes that may or may not have been written out to disk yet. And the other thing we're introducing is the log sequence number; again, it's just a counter we increase by one. Then over here in our page, we have the pageLSN; again, that's the log sequence number of the most recent log record that modified this page. It may not be the one that made it dirty in the first place; that's what the recLSN is: since I brought this page into memory, what was the first log record that made it dirty? Because I need to know where my boundary is. Then we have the flushLSN; again, this is just pointing to whatever was the last LSN in the log that we wrote out to disk. The master record is then just pointing to the last successful checkpoint that we took. Yes? Why do we need the pageLSN then? The question is: why do we need the pageLSN here? That'll come up later on. It has to do with, as you're replaying the log, figuring out whether I already wrote out the dirty page, and therefore don't need to worry about anything that comes before that sequence number. Think of this as a boundary: since I brought this thing into memory, here's the range of LSNs that could have modified my page. But you just said that the recLSN is the one which made the page dirty? Yes. And the pageLSN is just the last log sequence number of a record that modified the page. So when I want to decide whether I can flush my page, I need to know the last change made to that page. If that last change comes before the flushLSN, if I know that log record has been written to disk, then I know anything before it has been written to disk too, and therefore it's safe for me to evict that page. Okay, I get it. So we need to check from the pageLSN back to the recLSN. Yeah, during recovery you check in that range. We'll get to that in a second.
At runtime, when it comes time to flush this thing, I check the pageLSN. Okay, and again, the master record is just the location of the last checkpoint. We use this when we recover to figure out where we want to start our analysis of the log, to figure out what was running. Because without the master record, without these checkpoints, we'd have to start from the very beginning, because we don't know anything about what pages have been written out. To be very clear here: as the buffer pool flushes out pages, we're not logging that. So when we recover after a crash, we don't know, potentially, what pages have been written out to disk. We just know what log sequence numbers have been written out. All right, so here's an example of what he was asking about. Let's say that my pageLSN points here, at log sequence number 12, and my flushLSN points to 16. Am I allowed to flush this page out? Yes, because the pageLSN is less than the flushLSN. But say the pageLSN points to 19. Can I write that out? No, because I know that the log record corresponding to the last change made to this page is not on disk, and therefore it's not safe for me to evict it. And again, you would have this logic inside the eviction or replacement policy you implement in your buffer pool manager: be mindful of the flushLSN and the pageLSNs, and keep track of these things when you make a decision about what to evict. All right, so just to recap everything I said: all log records have an LSN. Every single time we make a change, we first add a log record to the buffer in memory, and we get back an LSN. Then we can modify the page, and that's when we update the pageLSN, because we already hold the write latch on the page and therefore we can update it atomically. And every single time we flush the log, we update the flushLSN to record how far in the log we've written things out. And then you can start truncating the log before that point, because you're unlikely to need it in memory. So let's talk now about executing transactions. Again, every transaction is just going to be a sequence of reads and writes; at the lowest level, for recovery, that's all we really care about. We don't know anything about SQL statements when we're doing physical or physiological logging; we just see reads and writes. A transaction finishes when it either commits or aborts. So the assumptions we're going to make to simplify our discussion today are the following. The first is that we're going to assume that every log record fits in a single page. That just means we can do an atomic write of a page that has all the contents of a single log record; we don't worry about a log record spanning multiple pages. It's not hard to handle that: you just add some checks to say this segment of the log record is on this page versus that page, and if you don't see all the segments, the write is considered not to be atomic. But for our purposes, we assume a log record fits in a single page, and that we do four-kilobyte page writes that are atomic. We're also going to assume that we're only doing single-versioning in the database system, and that we use strict 2PL, so we don't have to worry about weird concurrency anomalies. And as I said before, we're also doing steal + no-force, which is sort of implied, because you have to have that in order for all of this to work.
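Before we move on to commits, here's a tiny sketch of that pageLSN-versus-flushLSN eviction check, using the numbers from the example (12, 16, 19). The `Page` fields and the `flush_lsn` variable are hypothetical stand-ins for whatever your buffer pool actually tracks:

```python
from dataclasses import dataclass

@dataclass
class Page:
    page_id: int
    is_dirty: bool
    page_lsn: int    # newest log record that modified this page
    rec_lsn: int     # oldest record that dirtied it since it was loaded

flush_lsn = 16       # the log is durable on disk up through LSN 16

def safe_to_flush(page: Page) -> bool:
    # WAL rule: a dirty page may be written out only if the log record for
    # its latest change is already durable, i.e. pageLSN <= flushLSN.
    return page.page_lsn <= flush_lsn

print(safe_to_flush(Page(1, True, page_lsn=12, rec_lsn=5)))   # True: 12 <= 16
print(safe_to_flush(Page(1, True, page_lsn=19, rec_lsn=5)))   # False: 19 > 16
```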
So when a transaction commits, we do exactly what we did before: we write a commit record to the log. And once that commit record is durable and flushed to disk, we can tell the outside world that the transaction has successfully committed, because all the log records that this transaction produced will have been written to the log with or before this commit record, so we know that everything is durable. Internally, though, the transaction is not actually fully completed yet. Even though we told the outside world it's committed, we may still maintain some internal metadata about what that transaction actually did. We'll see this in a few more slides: we'll maintain a table that says, here are all my active transactions. And when we complete all the metadata and internal bookkeeping we need for that transaction, when all that's done, we add a new special log record called the transaction end. This is just telling the database system, in the log, on recovery, that we will never see anything else about this transaction ever again once we see this transaction end, and therefore we can remove it from all our internal bookkeeping tables. So unlike commit, where we have to flush to disk before we can tell the outside world we've committed, we don't need to do a hard flush or fsync on the transaction end. We just append it to our log buffer and it gets written out as part of the normal process. Now, just to see how this works: here's our simple demo. Here we see we do our transaction commit, then we flush out the log. Now we update the flushLSN to point to the last log record that we generated. And at this point, we can tell the outside world that our transaction is safe, it's durable, it's committed, we're done. Then, at some later point, we'll do some additional processing, and eventually we'll write out a transaction end message. This is just saying: at this point in the log, when we do recovery, we will never see this transaction again. For commit, there's not really anything special we're going to do in between the commit and the end; for abort, we'll see why we're going to need this gap. And likewise, as I said before, since we know at this point there's nothing in here that we still need to keep around, the commit's already been flushed out, we can just trim the log, reclaim this memory, and reuse it. That's the ping-pong effect that we talked about last class when we did group commit: once I know this log buffer is empty because I flushed everything out, I can use it to fill up again while I write out the other buffer.
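Here's a rough sketch of that commit path. Everything in it (`append_record`, `flush_log_up_to`, the tuple layout) is a made-up stand-in; the point is just the ordering: force the log up through the commit record, then acknowledge, and let the TXN-END record go out lazily:

```python
log_buffer, next_lsn, flush_lsn = [], 1, 0

def append_record(txn_id, kind):
    global next_lsn
    log_buffer.append((next_lsn, txn_id, kind))
    next_lsn += 1
    return next_lsn - 1

def flush_log_up_to(lsn):
    # Stand-in for writing the log tail to disk and fsync'ing it; afterwards
    # every record up through `lsn` is durable, so flushLSN advances.
    global flush_lsn
    flush_lsn = max(flush_lsn, lsn)

def commit(txn_id):
    commit_lsn = append_record(txn_id, "COMMIT")
    flush_log_up_to(commit_lsn)        # force: the commit and everything before it
    print(f"txn {txn_id} committed")   # only NOW may we tell the outside world
    append_record(txn_id, "TXN-END")   # later, after bookkeeping; no forced flush

commit(7)
```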
So for aborts, this is going to be different than what we did before. Question? Yes. When we see a transaction end, do we know that the page has been written to disk? No. Transaction end is an internal marker for the recovery algorithm that says you will never see this transaction again; you will never see any additional log records for it. It'll make more sense when we do aborts, because there's stuff you're going to do after the abort that the outside world doesn't see. It's not like a query you executed; it's stuff we have to do to reverse the changes. Give me a few more slides. Yes? When does the write actually go to disk? Do you write immediately after the commit? So, at this point here, when does the page get flushed out? That's the buffer pool replacement policy, and it happens in two ways. One is you need to free up space: you have to evict a page, it's marked dirty, you flush it out. The other could be the background writer, which I think we talked about briefly. In some database systems, there's a separate process or thread in the background that occasionally walks through the buffer pool, finds all the dirty pages, and writes them out ahead of time. It doesn't evict them; it just says, hey, let me write you out, and then flips the bit to say you're not dirty anymore. That way, when the buffer pool replacement policy runs and wants to evict a page: hey, I've got one that's marked clean, I can just drop it and not have to write it out. But that's independent of what we're talking about here. In some ways, that's the beauty of the write-ahead log: other than making sure we don't evict pages whose log records aren't on disk yet, we can separate the eviction policy from the write-ahead logging. Yes, sorry? If the flushLSN is less than the pageLSN, then you can't evict that page? Correct. The statement is: if the flushLSN is less than the pageLSN, then you can't evict that page, because you know the last log record you wrote out is prior to this one, and therefore the log record that modified this page is not out on disk yet. What if that's true for all of your pages and your buffer pool is full? Does that mean you have to start flushing your log? Is that the correct thing to do? So the question is: if your eviction policy is running and, oh, my flushLSN is less than the pageLSNs of all the pages in my buffer pool, and all my pages are dirty, what do I do? You have to stall. You have to write out to disk; there's nothing else you can do. And that's what the background writer is supposed to help with: if I know I have some idle I/O cycles, I can write these things out ahead of time so I don't have to stall on the critical path as I execute transactions. The worst thing you can do is: I'm running a transaction, it acquires a bunch of locks, and then I have to do a bunch of disk I/O; that backs everybody up and everything gets slower. If I do some things in the background, they're not on the critical path while I'm holding locks, and transactions can complete more quickly. How you balance flushing the log against flushing pages in the background, that's the tricky thing database systems have to figure out. We don't let the OS do it for us.
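Here's what one pass of such a background writer might look like; it's a sketch, assuming the same hypothetical `Page` fields and `flush_lsn` as before. Note that it still respects the WAL rule: it only flushes pages whose latest log record is already durable:

```python
from dataclasses import dataclass

@dataclass
class Page:
    page_id: int
    is_dirty: bool
    page_lsn: int

flush_lsn = 100   # the log is durable on disk up through this LSN

def background_writer_pass(buffer_pool):
    """One sweep of a hypothetical background writer: flush dirty pages ahead
    of time so the eviction policy rarely has to stall on I/O."""
    for page in buffer_pool:
        if page.is_dirty and page.page_lsn <= flush_lsn:
            # (stand-in for the actual disk write of the page)
            page.is_dirty = False   # now clean: eviction can drop it for free

pool = [Page(1, True, 90), Page(2, True, 120)]
background_writer_pass(pool)
print([(p.page_id, p.is_dirty) for p in pool])  # page 1 cleaned; page 2 must wait
```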
Okay, so aborts are going to be a special case under ARIES: we're going to reverse changes for transactions, but we're actually going to end up adding log records to keep track of the fact that we have reversed those changes. This is way different from what we've talked about so far. Every time we said, oh, we abort a transaction and roll back its changes, we just assumed that we could undo some stuff in memory. Now we need to keep track of everything that we do as we reverse transactions. So we're going to add an additional field to our log records called the prevLSN. This is not required, but it's going to make your life easier at runtime, because now you know how to jump back to find the next thing you need to reverse for a given transaction, and avoid having to scan everything. Essentially, think of this as a linked list per transaction: for every aborted transaction, here's the linked list of all the changes I need to reverse, and I know how to walk back and reverse them one by one. So we have a simple transaction like this. Now we have the LSN and the previous LSN. For the first log record of this transaction, T4, the begin record, there is no previous LSN, so we just set the prevLSN to null. This transaction runs, it makes some changes, but then it aborts. So now what we need to do is reverse those changes, and I'm going to add log entries for them. And once we know all those log entries are durable out on disk, we can add the transaction end message. Again, the transaction end message is denoting that nothing else will ever come from this transaction later on; we've reversed or done everything we needed to do. So this part here, which we'll talk about next, is how we actually record the changes we're reversing as we go along. These are called compensation log records, or CLRs. A CLR is like an update record, but it's reversing the change of an actual update record from when the transaction actually ran. Yes? Why do we even have the previous LSN? What is it actually buying us? Couldn't we just look at every record from T4? Yeah, so her statement is: for this simple example, couldn't you just scan for all the records that belong to T4 and figure out from those what you need to reverse? Yes, but what if I'm running a high-end system that's doing a million transactions a second, and at any given time when I crash I could have maybe 100,000 transactions in flight? To avoid having to scan everything over and over again to find exactly what I need to reverse, this pointer helps get us there. So it's not required for correctness; we use it for convenience. Yes? Why do we need to reverse anything? We haven't written anything to disk. For this example, no, but how do you know that in real life, in a real system? What did I say? We're not logging which pages we write to disk. We don't know what got written to disk, because the eviction policy is doing its own thing. So that's why we're going to play it safe and just reverse everything. So the CLR, again, is like an update log record: it has before and after values, but it's tied to an update record that actually occurred during the transaction's regular lifetime. And then we're going to have this undoNext pointer, which, again, just for convenience, tells us the next LSN that we need to reverse.
That way, for every given transaction, we know whether we've processed everything we need to process to reverse all its changes. So the CLRs get added to the log just like any other log record, but they're added after the transaction gets aborted. And once we've reversed everything, then we add the transaction end. Unlike commit, where I have to flush all the log records to disk before I tell the outside world I've committed, if a transaction gets aborted, we immediately go back to the application and say: you've aborted. They don't need to wait around for anything to be written to disk, because who cares? If they try to go read their changes, they're not going to be able to see them, as long as we're running the correct concurrency control protocol. So the CLRs just get logged like any other record, and they get flushed out eventually at some point. So let's look at an example of how we're going to use CLRs. Because I'm running out of space on the slide, since we have so many metadata fields, I'm now going to show this in tabular form. Here we have one transaction, T1. It does an update on A, and then it aborts. We don't care how it aborted, whether the application said to abort or the database system's concurrency control protocol said to abort; it's all the same. So now, when we want to start undoing its change, and this is during regular execution, not during recovery, we create a CLR entry that's tied to the update that it did. And it looks exactly like the original update, except that the before and after values are swapped. Before, the value was 30, and after the update it was 40. So now, in our reversal, we're setting the value back to 30. And then we have this undoNext LSN that says what's the next thing we need to undo to completely reverse this transaction. In this particular example, the undoNext points to the begin record, so we know there's nothing else for us to reverse for this transaction, and we can go ahead and add our transaction end record. Yes? You said that when we abort, we don't need to flush all the logs out to disk; we can immediately tell the user that their transaction is aborted, right? So the question is: did I say that when you abort, you can immediately go back to the application and say your transaction aborted, without flushing any of these things out to disk? Correct, yes. But then, say you made some changes, you told the user the transaction aborted, and then you crashed and lost these logs. So the statement is: I said that you immediately tell the outside world you've aborted, then you create these log records and reverse the changes, but then you crash, and those abort log records are lost. You're right, you can lose these. Do we care? He's shaking his head no. Why? Because the transaction aborted anyway. Yes, it's going to abort, but there's something more fundamental: the page that got modified by this transaction is not on disk, so who cares? But can't it happen that you lose half of them and half of them got written to disk? Half of what, sorry? Like, from records one to eleven: one to four get written to disk, and five to ten get lost. Who cares. Then, when you are redoing, you redo one through four, and then what?
Then you undo them, because the transaction didn't commit, right? Again, we'll see this when we do a full example. If I crash here and I don't see the final determination of the transaction, whether it committed or aborted, I treat it as aborted. That's the beauty of the write-ahead log: it's the single source location that tells you the final outcome of every transaction. So say we got here and we aborted. We tell the outside world we aborted, and we crash before we do all this cleanup. Who cares? The transaction got aborted; we don't have to guarantee anything is durable. We come back, and it's as if it was never there, because we'll reverse any changes that could have gotten written out to disk. And if you did write the page to disk, then you'll see that LSN on it. Correct. If the page got written to disk, you'll have to see all these log records anyway, so we know how to put it back in the correct state if necessary. Yes? When do we tell the application that the rollback succeeded? After the abort record goes into the log? Here, yeah. Say they told us to roll back, from the terminal, from the application. We append this abort record into the write-ahead log in memory, go back and immediately tell them that it aborted, and then start cleaning up all its internal changes. We don't have to wait to do any cleanup on the things it modified; we can tell it aborted right away. So this is very different from the commit case. Again, commit says: if the database system tells you you've committed, it has to guarantee that everything is durable; every log record that corresponds to the changes the transaction made has to be written to disk, up to and including the commit record. We don't care about the transaction end; that's internal for us. Whereas for aborts, we immediately come back. Yes? I didn't think about it: who cares that I aborted? Why do I care that you're going to go do a bunch of stuff to clean things up? Why should I wait for that? Won't any new transaction that wants to edit the same page have to wait until I undo the changes I made to that page? So the question is: if a transaction wants to modify the same page that got modified by an aborted transaction, does it need to wait until the log records that modified that page get written out to disk? No. Say I abort here, and say this is on page A. Assuming two-phase locking, I can immediately release the write lock I have on that page. Any other transaction can now come along and start modifying it. And if they start modifying it, what is that going to do? It's going to create new log records and bump up the pageLSN for the page, and therefore that page can't be written out to disk until those new log records have been written out to disk. So who cares if this other transaction modified it? That's fine. But you do need to reverse the value first, right? Because if somebody is reading, they can't be allowed to read the value written by an aborted transaction. Yeah, you're correct. If someone tries to read a change made by an aborted transaction, I have to reverse it first, before they're allowed to read it. Somebody can't read that page unless those compensation log records are there. When you say written, do you mean written to disk or written in memory? Written in memory. Yeah, sure, yes. You'd have to reverse that.
Let me think about whether that's always true. If it's single-versioned, yes; multi-versioned, no. Because that's the beauty of multiversioning, right? Who cares? You just have a version that doesn't exist anymore, and you ignore it. With multiversioning, you can get away without it. But in a single-version system, yes, you'd have to reverse these things first. Yes? Yeah, so the statement is: say you have a transaction with a lot of operations, and half the updates got written to disk, and on the eleventh operation you abort, so eventually you need to reverse things both in memory and on disk, right? Yes: let's say I have a transaction that makes a lot of updates, you apply all the updates, but half of them got written out to disk. How do I reverse those changes, now that they're out on disk? Because under the steal policy, I'm allowed to flush out dirty pages. Yes, you'd have to bring them back in and reverse them. Whether you actually do that immediately on abort, or lazily the next time someone goes to read the page, is up to you. I love these questions. There was some dude on YouTube complaining: Professor Pavlo, I like your class, I like the material, but all those pesky students keep interrupting asking questions. That's a bad take, right? Keep coming and asking questions. All right. So now for our abort algorithm. Again, we've already said this: we're going to write an abort record to the log for the transaction, and then we're going to play back the transaction's updates in reverse order. For everything that we reverse, we add a CLR entry to the log that's tied to the original update that occurred during normal operation, and we restore the original value. And once we've completed all these CLRs, once we've done everything we need to do for this transaction, then we can add the transaction end message. Note that CLRs never need to be undone. We only ever redo them during recovery, because they're sort of a one-way update.
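Here's a small sketch of that rollback path, walking the prevLSN chain and emitting a CLR for each reversed update. The dict-based record layout and the `append`/`rollback` helpers are hypothetical; the values mirror the T4 example from a couple of slides ago:

```python
log = {}          # lsn -> record; an in-memory sketch of the write-ahead log
last_lsn = {}     # per-transaction lastLSN
next_lsn = 1

def append(txn, kind, **fields):
    global next_lsn
    rec = {"lsn": next_lsn, "txn": txn, "kind": kind,
           "prev_lsn": last_lsn.get(txn), **fields}
    log[next_lsn] = rec
    last_lsn[txn] = next_lsn
    next_lsn += 1
    return rec

def rollback(txn, pages):
    """Reverse an aborting transaction's updates, logging a CLR for each."""
    lsn = append(txn, "ABORT")["prev_lsn"]
    while lsn is not None:                       # walk the prevLSN chain back
        rec = log[lsn]
        if rec["kind"] == "UPDATE":
            pages[rec["page"]] = rec["before"]   # undo the change in place
            append(txn, "CLR", page=rec["page"], before=rec["after"],
                   after=rec["before"],          # before/after are swapped
                   undo_next=rec["prev_lsn"])    # next record left to reverse
        lsn = rec["prev_lsn"]
    append(txn, "TXN-END")   # nothing about txn will ever appear again

pages = {"A": 30}
append(4, "BEGIN")
append(4, "UPDATE", page="A", before=30, after=40)
pages["A"] = 40
rollback(4, pages)
print(pages)   # {'A': 30}: reversed, with the CLR and TXN-END in the log
```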
Okay, so what have we covered so far? We have log sequence numbers, we have these CLRs, and we know how to keep track of the LSNs in various locations to figure out what has been written to disk. So now let's talk about how we're going to keep some additional metadata to figure out what was running in the system at a given time, and use a better checkpointing scheme. In our last class, I said that checkpoints are necessary to figure out at what point we can start recovering the database. Without checkpoints, we'd potentially have to look through the entire log; if we have a year's worth of log with no checkpoints, we might have to replay a year of the log, which could take a long time. So let's look at two bad ways to take checkpoints, and we'll see why fuzzy checkpoints are the better way. I discussed one of these bad ways last class, where the technique was to just halt the execution of any new transactions and then take the checkpoint. This guarantees that the pages written out to disk as part of that checkpoint are consistent; there are no torn updates from a transaction that was running halfway through, right? Basically, on the front end, at the network layer, we halt the execution of any new transactions, and then we just wait for all of our workers to finish any transactions that are already running. This is obviously bad, because we have to wait until all the transactions are finished, and we can't start executing anything new. If we have one transaction that's going to take five hours to run, we have to wait those five hours before we can take our checkpoint, and during that time we can't execute any other transactions. Our system looks completely unresponsive. From a recovery standpoint, this is great, because the checkpoint is consistent, as I said: there are no intermediate updates; it contains only changes from committed transactions. A slightly better way would be to pause transactions while they're running while we take the checkpoint. What I mean by this is: any transaction that has a query that's modifying the database, we just pause it. Any time it tries to acquire a write latch on a page, we prevent that and it has to stall. Read-only transactions can still run; they can read whatever they want, because that doesn't interfere with the checkpoint. And again, there's the concurrency control protocol up above to figure out whether they're allowed to read certain things; we can ignore all that. But all the write transactions, we're just going to pause. So it would look something like this. Say that in memory I have three pages, and I have a transaction and a checkpoint that want to occur at the same time. Say this transaction is going to update page three and page one. It starts at the bottom here, applies its change to page three, and then, before it can update page one, the checkpoint starts. So we have to stall our transaction: it's going to try to acquire the write latch on page one, and it can't do that because the checkpoint is occurring, so it just stalls. Now the checkpoint, all it's really doing is a sequential scan of every single page in our buffer pool, flushing them out to disk. So our checkpoint writes out pages one, two, and three, with the modification that the transaction made to page three. Then the checkpoint finishes, our transaction resumes, and we now update page one. The problem is that in the snapshot of our database on disk, for one query this transaction executed, we saw half of its changes. So our checkpoint, the state of the database on disk, is not consistent. In order to handle this, we want to record some additional metadata: what transactions were running at the time we took the checkpoint, and what pages were dirtied in our buffer pool while we took it. Then we can use that information later on to figure out: oh, this guy updated page one and I missed it in my checkpoint, so I need to make sure I replay any log records he generated to put me back in the correct state. In the back, yes? Couldn't the checkpoint release its latch on pages it has already written out, and let the stalled transaction proceed on those? Yes, so the question is: in my example here, I showed a brute-force, coarse-grained write latch on the entire system.
So this guy has to finish his checkpoint before that guy is allowed to go. Or could I just release the write latch on page one and then allow him to update it? Yes, but then you still have the same torn-update problem. I'm using this as a straw man to say this is a bad idea, and we'll see how to do it in a better way; but yes, that's an obvious optimization. When you say we care about which pages are dirty, do we only care about pages dirtied by the paused transactions, or any dirty page? The question is: when I say we care about dirty pages, which is going to become the dirty page table, do we care only about pages dirtied by paused transactions, or about any dirty page? In a real system, it's any dirty page, because in the way we do fuzzy checkpoints, we're not actually going to pause transactions. Yes? So when you take the checkpoint and you write everything out, I'm thinking these two things don't need to be written. Sorry, which two things don't need to be present? The ATT and the DPT. Like, you just look at the log; say everything crashed today, and now you're recovering from a checkpoint: you see which transactions did not commit and you just roll them back. So his statement is: you don't need the dirty page table and the active transaction table, because if you just replay everything and then reverse the losers, wouldn't that put you in the correct state? Yes, I think I agree with you. But that's going to be super slow, because you're going to have to bring back every single page and modify it, whereas with this metadata we can avoid that. But you only have to replay the log since the last checkpoint, right? No, not true. You potentially have to go farther back. Why? A few more slides, we'll get there. It's because you don't know what's been written to disk. All right, so there's the active transaction table, the ATT, and the dirty page table, the DPT. We're going to record this information when the checkpoint starts, we're going to write it out with fuzzy checkpoints when the checkpoint ends, and then we'll see these two concepts come up again when we do recovery: we're going to replay the log and populate this information to figure out what we need to redo or undo. In the active transaction table, for every transaction that's actively running at the time the checkpoint starts, we record its transaction ID, its status, and the last LSN it created. The status is either: it's running; it's committing, meaning it committed but we haven't reached its transaction end yet; or it's a candidate for undo, something we think we may have to undo. We don't know yet, because we don't know what its final outcome is going to be. And again, when we see a transaction end message, we can remove the transaction from the ATT, because we know we're never going to see it again. That's why there's no "completed" or "finished" status here. This table just hangs out in internal memory; we populate it while we take the checkpoint, and it's included in the checkpoint end message, which we'll see in a second. Then the dirty page table keeps track of all the pages in the buffer pool that have been dirtied by uncommitted transactions. And for each of those, we record the recLSN: the LSN of the first log record that dirtied the page since it was brought into memory.
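As a sketch, the two tables can be as simple as this (the names and status strings are stand-ins; a real system tracks more):

```python
from dataclasses import dataclass

@dataclass
class ATTEntry:                  # Active Transaction Table entry
    status: str                  # "RUNNING", "COMMITTING", or "UNDO-CANDIDATE"
    last_lsn: int                # last log record this transaction created

att: dict = {}                   # txn_id -> ATTEntry; removed on TXN-END
dpt: dict = {}                   # page_id -> recLSN (first record that dirtied it)

# e.g. transaction 96 updates page 33 at LSN 20:
att[96] = ATTEntry(status="UNDO-CANDIDATE", last_lsn=20)
dpt.setdefault(33, 20)           # keep the OLDEST dirtying LSN; never overwrite
```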
So let's see a slightly better version of checkpoints that uses this information. Now, in our checkpoint log record, we have the ATT, and at this point we only have one transaction running, T2, so that's the only thing inside it. And then we have the dirty page table with P11 and P22, because P11 was modified here and P22 was modified here. The syntax I'm showing is: here's the object that was modified, pointing to the page it lives on. Notice we don't record anything about transaction T1, because T1 committed before my checkpoint started, so I don't care about it anymore. Then, in the second checkpoint, T3 is still active, and we have two dirty pages; T2 is gone because it committed before this checkpoint started. But this is still not ideal, because we're still stalling all our write transactions in order to take the checkpoint. We're pausing everything at this point; these guys are not allowed to modify anything, and the checkpoint record is just saying: we wrote out a checkpoint, and, oh by the way, here's some stuff that could have been modified around this time, to make sure you find it. So, like the first checkpointing scheme I showed you, nobody actually does this one either. Instead, everyone, or at least the systems that support high-performance checkpoints, does fuzzy checkpoints. A fuzzy checkpoint is where we allow transactions to keep on running, to keep modifying the database, while we're taking the checkpoint. And in order to record the boundaries of when the checkpoint started and when it finished, to know whether something could have been written out that we missed, we add explicit checkpoint-begin and checkpoint-end log records. The begin just tells us when the checkpoint started, and the end tells us when it finished, and the end record includes the ATT and DPT that existed during the execution of the checkpoint. So we go back to our example. Now we have our checkpoint begin, the checkpoint does its thing and starts writing pages out, and in the checkpoint end we include transaction T2, because T2 started before the checkpoint started, and the dirty page table tells us that P11 was modified during the checkpoint as well. We don't need to include T3 here, because T3 started after our checkpoint began. Once we have the checkpoint end written out to disk successfully, which means we've flushed all the pages we wanted to take in the checkpoint, we go ahead and update our master record to point to the checkpoint begin, because that's going to be our anchor point where we start our analysis during recovery. We know that, as of this point, we flushed all the dirty pages, and we kept track of the ones we may have missed because they got modified along the way.
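Here's a sketch of that sequence, with made-up structures. The interesting part is the ordering: begin record, snapshot the tables, flush pages while transactions keep running, end record, then repoint the master record:

```python
log, next_lsn = [], 1
master_record = {"begin_checkpoint_lsn": None}

def append(kind, **fields):
    global next_lsn
    log.append({"lsn": next_lsn, "kind": kind, **fields})
    next_lsn += 1
    return next_lsn - 1

def fuzzy_checkpoint(buffer_pool, att, dpt):
    begin_lsn = append("CHECKPOINT-BEGIN")
    att_snap, dpt_snap = dict(att), dict(dpt)    # state as of the begin record
    for page in buffer_pool.values():            # txns keep running meanwhile
        if page["dirty"]:
            page["dirty"] = False                # stand-in for the disk write
    append("CHECKPOINT-END", att=att_snap, dpt=dpt_snap)
    # Only after the end record is durably flushed (fsync'd) do we repoint the
    # master record at the BEGIN, the anchor where recovery starts analysis.
    master_record["begin_checkpoint_lsn"] = begin_lsn

fuzzy_checkpoint({33: {"dirty": True}}, att={2: "RUNNING"}, dpt={33: 20})
print(master_record)   # {'begin_checkpoint_lsn': 1}
```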
Yes? Where do you put the checkpoint end? When do you decide to write it? Like, the log record? It goes in the log; when the checkpoint ends, you shove it in. And when does it end? When you've scanned through the buffer pool and written out all the dirty pages to disk. And after you flush them out, you fsync, right, to make sure it's durable. Then you add the checkpoint end log entry, and it "commits", and I'm putting committing in quotes here, like a regular transaction: I flush that log record to disk. Yes? While writing the ATT and DPT at the checkpoint end, are we holding any lock on the entire database, since we're not letting anything else into the log at that point? So the question is: am I holding any locks on the entire database while I write this thing out? No. But we are not writing anything to the log about any other transaction during this time, right? Those writes could have happened or not happened; if something else were writing to the log while you are writing the ATT and DPT, it would be wrong. Like, say transaction T2 is running and, while you are writing to the log, it also committed. Yes, I see what you're saying: there is a sort of stop-the-world moment here where you briefly flush this thing out, but that's a blip, it's a minor thing. So there is a stop-the-world moment, but it's only for the log? Correct, yes. Transactions can change whatever else they want in the regular buffer pool pages. So now let's do recovery. After all that, after forty minutes of the minutiae of log sequence numbers and fuzzy checkpointing, let's talk about how we actually recover. Given everything we've set up, all this extra metadata we've been recording, recovery is actually not going to be that bad. The tricky part is just figuring out where you start each of these phases in the log. In the analysis phase, you look at your master record for the database on disk, and that gives you the location in the log where the last checkpoint began. You jump to that location and scan forward through time until you reach the end of the log, and you populate the DPT and ATT to keep track of what was going on in the system at the moment of the crash. You're then going to use that to figure out which transactions you need to abort and which transactions you need to make sure are committed. Then, in the redo phase, you jump to the appropriate location in the log, where you know there could be changes from transactions that did not make it safely to disk, and you start reapplying those changes until you get to the end of the log. And you do this for every transaction you see, even the ones that you know are going to end up aborting, because in the analysis phase, the first pass, you saw everything: you know what's going to commit and what's going to abort. In the redo phase, just for safety reasons, we reapply everything. Then, in the undo phase, you go in reverse order from the end of the log back to some point, and reverse any changes from transactions that did not commit.
And when the undo phase is done, the database is in a consistent state, the state that existed at the moment of the crash, with no partial updates from aborted transactions, and with all changes from committed transactions applied to disk. In the back, yes? Next slide, yeah, okay. So again, three phases: analysis, redo, and undo. At the very beginning, we look at the master record to figure out where the begin-checkpoint record is, and that's where we begin our analysis. So let's say this log record here is the start of the last checkpoint; again, that's in our master record, so we know where it is. Now we scan forward through time, look at the log records, and build out our ATT and DPT. Then we have to figure out, for the redo phase, the smallest recLSN in the dirty page table that we found after doing our analysis. That's telling us the location of the oldest log record that modified a page that may not have been written to disk. So when we redo, we jump to that point and reapply all the changes from there. Then, in the undo phase, we start at the end of the log and go back in time, up to the point of the oldest update of any aborted transaction that was actively running while we took our checkpoint, and we reverse all those changes. The arrows on the slide are showing you the boundaries, how far back in time you have to go in the log for each phase. I'll go through each of these more precisely in the context of LSNs, one by one, but is it clear at a high level what we're doing? Analysis goes forward in time, redo goes forward in time, and undo goes backwards in time. And for undo, I may not be undoing every single log record I see here; it's just the transactions that I identified in my ATT, after analysis, as ones that should not have committed. Okay, so this summarizes more concretely what I just said. In the analysis phase, we scan the log forward from the last successful checkpoint. Any time we find a transaction-end record during analysis, we remove that transaction from our ATT. Otherwise, for any other record, if it's the first time we've seen the transaction, we add it to the ATT with a status of undo candidate, because, going forward in time, we don't know yet whether it's going to abort later on. If we see a commit record, we change its status to committing, and, like I said, when we see its end record, we remove it. And then, for any update record, we look to see whether the page being modified is in our DPT. If not, we add it, and we set the recLSN to be the record's LSN. Again, this tells us: this is the log record that first made this page dirty after it was brought into memory. So at the end of the analysis phase, the ATT tells us all the transactions that were active in the system at the moment of the crash, and the DPT tells us all the dirty pages that could have been in our buffer pool and may not have been written to disk. We have to build these tables because, again, we're not logging every buffer pool flush of a page to disk; the log records are enough to tell us where things could potentially have been modified, and we're trying to reconstruct it.
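Here's a sketch of that forward pass, reusing the hypothetical dict-shaped records from the earlier sketches (so take the field names as assumptions, not gospel):

```python
def analysis(log, begin_checkpoint_lsn):
    """Rebuild the ATT (txn -> status) and DPT (page -> recLSN) by scanning
    forward from the last checkpoint's begin record."""
    att, dpt = {}, {}
    for rec in log:
        if rec["lsn"] < begin_checkpoint_lsn:
            continue
        kind, txn = rec["kind"], rec.get("txn")
        if kind == "CHECKPOINT-END":             # fold in the checkpoint's view
            for t, status in rec["att"].items():
                att.setdefault(t, status)
            for p, rec_lsn in rec["dpt"].items():
                dpt.setdefault(p, rec_lsn)
        elif kind == "TXN-END":
            att.pop(txn, None)                   # we'll never see this txn again
        elif kind == "COMMIT":
            att[txn] = "COMMITTING"
        elif kind in ("UPDATE", "CLR"):
            att.setdefault(txn, "UNDO-CANDIDATE")
            dpt.setdefault(rec["page"], rec["lsn"])  # first dirtier wins
        elif txn is not None:                    # BEGIN, ABORT, ...
            att.setdefault(txn, "UNDO-CANDIDATE")
    return att, dpt
```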
Yes? How do you know for sure whether a page was written to disk? The question is: how do I know for sure whether a page has been written to disk, in the log or in the real world, on hardware? On the basis of the log alone, you can't, because there's no record that tells you a page was written to disk. Well, that's not entirely true; we'll see this in a second when we do the redo phase. But in general, if you know the LSN of the log record you're looking at is less than the recLSN of the page as it exists on disk, then you know your change got written out: the page went out to disk, got dirtied again by someone else, but your change made it out before that. All right, so here's a quick overview of the analysis phase. I'm showing you what the ATT and the DPT look like at these different LSNs. We hit our checkpoint; we don't know anything yet, so the ATT and DPT are empty. Then we see an update at log sequence number 20. It's from transaction T96, and it's the first time we've ever seen it; there's no begin record for it here, because it began before the checkpoint started. So we update our ATT to say: hey, we have a transaction T96, and its status is a candidate for undo, because we don't know whether it's going to commit or not. And we see that it modified page P33, so we add that to our dirty page table with the recLSN of this log record. Then our checkpoint ends, and now we get more information about what's in the ATT and DPT. We see that there was a T97 that we didn't observe in between: some transaction earlier in the log, before the checkpoint's start point, that did some stuff that we may need to go look at as well. And there's also a page P20 that was modified, so we include that in our DPT as well. Then we see T96 commit, so we flip its status to committing; and when we see its transaction end message, we know we can remove it from the ATT. But at the point of the crash, T97 is still hanging out with the undo-candidate status. So we know this transaction made some changes, earlier than our checkpoint, that we didn't see in our scan, and we need to go back and make sure we reverse them, because we don't know whether those pages got written out to disk. Yes? Is it possible, after the analysis phase, to just skip the other phases, because the ATT is empty or something like that? The question is: is it possible that, after the analysis phase, the ATT and DPT are empty, so you know nothing was dirty; could you just say "I'm good"? Yes. And you would just see that, because there would be nothing to do. If they're empty, then you know everything made it out. Yes? So the logic behind all of this is: once you write the checkpoint begin, until you write the checkpoint end, the buffer pool manager cannot write pages out to disk, right? The question is: in between the checkpoint begin and end, is the buffer pool not allowed to write out dirty pages to disk, and only the checkpoint process writes pages? Yeah, actually, I think yes, but I actually don't know the answer to that.
All right, so again, after the analysis phase, we have an ATT and DPT telling us what was going on at the time of the crash. Now in the redo phase, we want to repeat history. We're going to reapply all the changes starting from the point in the log where we know there was a dirty page that was modified and potentially not written out to disk. We reapply all those changes, even for the aborted transactions, and any CLR we see from an aborted transaction, we redo as well. As I said before, this is going to seem very expensive, because we may be reapplying changes we technically don't need to. For example, for a transaction we know is going to abort, who cares about bringing a page in, applying the update, and then reversing it with the CLRs? If that was the only transaction that modified that page, I could just skip it entirely. There are optimizations like that you can do, but we're going to ignore them, because we want to make sure that everything is clean and correct.

So in the redo phase, we start from the log record containing the smallest recLSN in the DPT, because that's the first log record that dirtied a page that may not have made it out to disk. Then as we scan forward, for every update record or CLR, we look at its LSN and we redo the action, unless: the page is not in the dirty page table, in which case we know our modification was flushed out to disk at some prior point; or the page is in our dirty page table but our record's LSN is less than the page's recLSN, which would mean we made a change to the page, the page got written out to disk, and then some other transaction made a later change to the same page that didn't get written out. And as a final check, because we stamp each page with the pageLSN of the last log record that modified it before writing it out, we can compare against the page itself and recognize which changes actually made it to disk. To redo an action, we just reapply the change and set the pageLSN to the log record's LSN, just as we would during regular execution of the transaction, except that during redo we're not generating any new log records and we don't have to worry about flushing anything extra; we can do it all asynchronously. Then when we get to a transaction-end message, we just remove that transaction from the ATT, and if we want to be super careful, we could flush everything at that point.
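Here is the same kind of hedged sketch for the redo phase, building on the earlier toy structures; pages.fetch, page.apply, and the record fields are assumptions. It encodes the two DPT checks plus the on-disk pageLSN check just described.

# Minimal sketch of the ARIES redo phase: repeat history starting from
# the oldest recLSN in the DPT. The page interface is hypothetical.
def redo(log, dpt, pages):
    if not dpt:
        return                                   # nothing was dirty
    for rec in log.scan_forward(min(dpt.values())):
        if rec.kind not in ("UPDATE", "CLR"):
            continue
        if rec.page_id not in dpt:
            continue                             # page flushed before the crash
        if rec.lsn < dpt[rec.page_id]:
            continue                             # flushed, then re-dirtied later
        page = pages.fetch(rec.page_id)          # read the page from disk
        if rec.lsn <= page.page_lsn:
            continue                             # change already made it out
        page.apply(rec.redo_info)                # reapply the change...
        page.page_lsn = rec.lsn                  # ...and stamp the pageLSN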
So now the last phase is undo. This is just undoing all the transactions that were sitting in the ATT after the analysis phase with the undo candidate flag. We go back through the log and reverse their changes in reverse log sequence order. So even though transaction T1 ran before T3, if both aborted and we see the log records for T3 first as we go backwards, we reverse those first before we get to T1's. And every single time we reverse a change, we add a CLR. This is what allows us to recover the database if we crash during recovery, because we know what we actually reversed.

So that's a lot of hand waving and a lot of text; let's walk through a simple example. Here we have a write-ahead log going forward in time, and we see that we took a checkpoint and it finished. Then T1 did a modification on page five, T2 did a modification on page three, and then T1 aborts. During normal execution, what do we do to abort it? We create the CLR that says we want to undo the change from this record here, and once we know that's been applied, we go ahead and add the transaction-end message to say this transaction is fully done. I'm also not showing the prevLSN here, but again, think of that as a linked list that tells you, for a given transaction, how to walk back through its updates and reverse them. So now let's say we do a bunch of changes for T3 and T2, and then we crash.

In the analysis phase, we come back and populate the ATT and the DPT. I'm running out of space, so this is a truncated version of the log; for example, where I had 40 and 45 on two separate lines before, they're now on one line, the CLR followed by the transaction-end message. So when we come back, our ATT tells us we had two active transactions, T2 and T3, and we have the lastLSNs that point to the last modification each of them made. We're going to start undoing from the transaction with the greatest lastLSN. So assuming we've already redone everything, now we undo, and we add new log entries to reverse these changes, because these transactions aren't allowed to commit. We look at this and see T2's lastLSN is 60 and T3's lastLSN is 50, so we want to reverse LSN 60 first. We create the CLR to reverse that change, and we add the undoNext LSN that points to the next LSN we would need to reverse for this transaction. Again, this is a logical pointer for convenience to tell us where to jump to next. Then we add the CLR for transaction T3, and since this is the last thing we actually need to undo for that transaction, we can go ahead and create its transaction-end message right away. At that point, we flush all the dirty pages this transaction modified and the write-ahead log to disk, so we know we never need to undo this transaction ever again; the log already contains everything needed to reverse it. We'll redo it if we crash again, but we won't have to undo anything.

So now let's say we crash here: we're doing recovery, we're in the undo phase, and we crash and restart. All of this state gets blown away, because it was just hanging out in memory. When we come back, our ATT says we only have T2 still sitting around, and we need to make sure we reverse it. The starting point for the reversal is this CLR here; we've already applied it during the previous undo phase, so there's nothing to undo, because you can't undo an undo. You just redo them: you redo undos, but you don't undo undos, okay? So the undoNext tells us that the next thing we need to reverse is up here at 20. We add a new CLR for that, then we add our transaction-end message because it's the last thing we need to do, we flush the log, and we're done. At this point, after the dirty pages have been flushed, we know the database is in the consistent state that existed at the moment of the first crash, with no partial effects from any aborted transactions.
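And here is a matching sketch of the undo phase, with the CLR and undoNext behavior we just walked through. The `losers` argument is assumed to be the undo-candidate entries from the ATT together with each one's lastLSN, and the log-append helpers are invented for illustration.

# Minimal sketch of the ARIES undo phase: reverse loser transactions in
# reverse LSN order, logging a CLR for every change reversed so that a
# crash during undo is itself recoverable.
def undo(log, losers, pages):
    # losers: txn_id -> lastLSN for every undo candidate in the ATT
    while losers:
        # Always work on the largest LSN that still needs attention.
        txn_id, lsn = max(losers.items(), key=lambda kv: kv[1])
        rec = log.fetch(lsn)
        if rec.kind == "UPDATE":
            # Log the CLR first, then apply the reversal to the page.
            clr = log.append_clr(txn_id, rec.page_id, rec.undo_info,
                                 undo_next=rec.prev_lsn)
            page = pages.fetch(rec.page_id)
            page.apply(rec.undo_info)
            page.page_lsn = clr.lsn
            next_lsn = rec.prev_lsn
        elif rec.kind == "CLR":
            # You redo undos, but you never undo undos: jump past it.
            next_lsn = rec.undo_next
        else:
            next_lsn = rec.prev_lsn              # e.g. skip over a BEGIN
        if next_lsn is None:
            log.append_txn_end(txn_id)           # fully rolled back
            del losers[txn_id]
        else:
            losers[txn_id] = next_lsn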
Yes? Is it worth taking a checkpoint during the redo phase? The question is: would it be worth taking a checkpoint during the redo phase? You would do that only if you assume you're going to crash again in the near future; otherwise it's just excessive disk writes, so nobody does that. It's like running a database in 1970s Puerto Rico where you don't have reliable power; then yes, you do that. Okay?

Yes? Are we required to flush the dirty pages every time, after the transaction-end and compensation log records? The question is: are we required to do that flush there? No, that's an optimization; we can remove it. Yes, yeah. Next slide.

All right, so in this example I showed what happens if you crash during undo. If we crash during the analysis phase, what do we have to do? What's that? Nothing, right? There's nothing to do, because since we always read forward from the checkpoint in the log, we just come back and do it all over again. If we crash during redo, what do we have to do? In the back? Nothing, exactly, yes. You just come back and redo it all over again, including the CLRs you generated if you undid things previously. So it's only in the undo phase where you potentially have to go figure out what you were undoing at a given time.

So related to his question: is there a way to avoid all those extra disk flushes every time we have a transaction end? You just assume that you're not going to crash during recovery, and therefore you asynchronously flush the changes from the dirty pages out to disk. When you finally say the database is online and ready to start processing new transactions, I think most systems would take another checkpoint then, but for correctness reasons you don't have to.

So how else can we improve performance during the undo phase? We've already talked about a couple of these. We talked about figuring out that an aborted transaction was the only thing that modified a page, so we don't need to reverse any changes on that page. Another approach is to lazily apply the rollbacks for transactions at runtime. You do the analysis, you do the redo, and you figure out what you need to undo for every single page, but rather than applying those changes, you keep them around somewhere in memory, and any time a new transaction comes along and wants to read one of those pages, you go ahead and apply the rollback then. The idea is that you get almost instant recovery: you say you're back online even though the database is not yet in the correct state, no transaction is allowed to read the pages that haven't been rolled back yet, and only when somebody goes to read them do you actually apply the undos. The point is that if you have a large database and the undo phase only touches a small portion of it, rather than blocking access to the entire database while you undo those pages, you come back right away and let anybody read whatever they want; you only block them when they try to read something you haven't rolled back yet. I don't think anybody actually does this optimization, though; a toy sketch of the idea follows.
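Since nobody is known to ship this, the following is purely a toy illustration of the lazy-rollback idea, with every name invented.

# Toy illustration of lazy rollback: park the pending undo actions per
# page after analysis/redo, and only apply them when the page is first
# touched again. Everything here is hypothetical.
pending_undos = {}   # page_id -> list of undo actions, built during recovery

def fetch_page(pages, page_id):
    page = pages.fetch(page_id)
    # Roll the page back lazily, before any new transaction can read it.
    for undo_info in pending_undos.pop(page_id, []):
        page.apply(undo_info)
    return page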
And then the last optimization would be to rewrite your applications so you don't have long-running transactions. That minimizes how far back in the log past the last checkpoint you have to go and replay; if you can cut that down, the redo phase and undo phase will be much faster. Okay? Everyone ready to kill me, or fall asleep, or what? Or go out and build your own database.

All right, so the main ideas of ARIES that we covered are, again: a write-ahead log using steal and no-force, with fuzzy checkpoints, which are essentially just taking a snapshot of all the dirty page IDs so we know which ones got modified. Then we redo everything since the earliest dirty page recorded in our write-ahead log, and we undo the transactions that did not end up committing before the crash. We add CLRs to make sure we record all the undo operations we apply to the database. And the log sequence numbers are how we figure out whether the log record that modified a page has been successfully written to disk or not.

All right, we have like three minutes; let's do a quick demo to show you that this actually works. This is going to be MySQL, and we're going to have one table. Let me turn this off too. So we have one table with a single row that has ten columns: one, two, three, four, five, six, seven, eight, nine, ten. And then we have a simple Python program that, in a single transaction, takes the first column's value, slides it over to the next one, and increments every column by one. So we let this run, and just to prove it's actually working, we run the SELECT again and see it incrementing one by one; it's running in an infinite loop. And down here we have the log from MySQL telling us that it is actually running. So what we're going to do is a hard kill on MySQL: kill -9. That's going to kill everything. And lo and behold, you see in the log that we crashed. This machine is running Ubuntu, which has a service that automatically restarts MySQL if it crashes, but you can see we got disconnected up here and our Python code got disconnected down here, so this proves that the kill -9 really did kill everything. So let's go look in the log and see what it says. The details don't matter, but up here it says: database was not shut down normally, starting crash recovery, starting to apply a batch of log records; this number is just a progress percentage. So this is doing ARIES. It's saying: let me go look in the log, figure out what was running, and go reapply those changes. So now when we reconnect and look inside our database, to prove that we don't have any torn updates from our transactions, we should be guaranteed that every column is one greater than the previous column. And lo and behold, in this case it is. MySQL works, okay?
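For reference, the demo program might have looked something like this; the connection parameters, the table name shift, and the column names c0 through c9 are all guesses at what was on screen, not the actual demo code.

# A guess at the demo script: one row with ten integer columns where,
# after every committed transaction, each column is one greater than
# the previous one. A kill -9 mid-transaction must never break that.
import mysql.connector  # assumes the MySQL Connector/Python driver

conn = mysql.connector.connect(host="localhost", user="demo",
                               password="demo", database="demo")
cur = conn.cursor()
while True:
    cur.execute("SELECT c0 FROM shift")
    (c0,) = cur.fetchone()
    # Slide everything over and increment, all in a single transaction,
    # so recovery guarantees the invariant c[i+1] == c[i] + 1 holds.
    assignments = ", ".join(f"c{i} = {c0 + 1 + i}" for i in range(10))
    cur.execute(f"UPDATE shift SET {assignments}")
    conn.commit()

If recovery were broken, a crash between the page writes for different columns could leave a row where the invariant fails; the SELECT after restart checks exactly that.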
I don't know what other way to show you a demo of a database crash other than that; it's the best I could come up with. Okay, yes? Why did the database start to recover right away when I killed it? That's an Ubuntu thing. Ubuntu has a service that says if this process crashes, immediately restart it. It's not special to MySQL; it's the operating system doing that. Yeah, okay.

All right, so at this point you could quit CMU; you have enough information in your brain to go out and build a reliable transactional database management system. You can do recovery, you can do transactions, you can do query optimization, you can do database storage. At this point in the semester, these are the core things you need to know about what a database does. So starting next week on Monday, we'll start talking about distributed databases. All of the same concepts we've talked about so far this semester still have the same problems and issues we have to account for in a distributed database; now we just also have to account for the network, so it's going to be slightly more tricky, okay? So again, even if you say, oh, I want to work on distributed databases, you need to understand single-node databases first, and this is the point in the semester where you should have a full understanding of what they're actually doing, okay? All right guys, have a good weekend, enjoy the weather, and see you on Monday.