There are a few recovery schemes described there, and in the sixth edition we decided there were too many; it causes confusion. You would ask: which is the first recovery scheme, or the second, or the third? The first one was without concurrency, then there was a version with concurrency which was slightly different, then one more called advanced recovery with concurrency, and then ARIES. There were just too many things. So we decided to simplify life and give one scheme which is easy to understand. It also happens to be a very widely used scheme, so it makes sense to cover it rather than schemes which are not used that much and just cause more headache for people remembering the alternatives. If you are familiar with the advanced recovery algorithm, it had two parts: one was called repeating history, the other was logical undo. We split those two, so now the recovery algorithm has a first part which uses repeating history but does not worry about logical undo; the second part adds logical undo; and the third part is ARIES. That is actually a nice progression: the second is not a new algorithm but an addition to the first, and the third again is not an entirely new algorithm but a set of modifications to the second. It is cleaner that way; that is the structure.

So now let us get into the actual algorithm. We are all familiar with failures. Why do failures occur? Logical errors, which cause a transaction to roll back; system errors, such as a deadlock, which force the database system to roll back a transaction; power failures; all of these problems can occur. So if there is a failure, what do you need to do? To ensure atomicity, you have to roll back; that is a key part. The other part is durability: ensuring that once a transaction has committed, everything is safe in spite of failures.

All recovery mechanisms are predicated on the fail-stop assumption. What is the fail-stop assumption? If something goes wrong, it will be detected and the system will stop immediately. Now, this is an assumption; it is not actually guaranteed. It is possible for some kind of failure to corrupt the database without you ever knowing about it, and then you are in trouble: you cannot recover from that failure, because you may not even detect it. So what all database system implementations do is include a huge number of internal assertions which detect most such problems. If there is any corruption of data, they will usually detect it, and detect it in time. This is not guaranteed, but because there are so many assertions, most of the time corruption will be caught before it has further repercussions, and then you can hopefully recover from it. The recovery algorithm itself is only proved correct under this assumption: that when a failure occurs, the system stops immediately, or at least stops at the moment the affected data is read, and then recovers.
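To make the fail-stop discussion concrete, here is a minimal sketch of the kind of internal assertion a storage layer might run on every page read; the page layout, the choice of CRC32, and the helper names are illustrative assumptions, not how any particular system actually does it.

```python
import zlib

PAGE_SIZE = 4096  # hypothetical page size; real systems vary

def write_page(f, page_no: int, payload: bytes) -> None:
    """Store a CRC32 checksum alongside the payload so corruption
    can be detected when the page is read back."""
    assert len(payload) == PAGE_SIZE - 4
    checksum = zlib.crc32(payload)
    f.seek(page_no * PAGE_SIZE)
    f.write(checksum.to_bytes(4, "little") + payload)

def read_page(f, page_no: int) -> bytes:
    f.seek(page_no * PAGE_SIZE)
    block = f.read(PAGE_SIZE)
    stored = int.from_bytes(block[:4], "little")
    payload = block[4:]
    # Fail-stop in action: halt immediately rather than let a corrupted
    # page propagate further into the system.
    assert zlib.crc32(payload) == stored, f"page {page_no} is corrupted"
    return payload
```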
Since you all know much of this, I am going to skip the first parts and some of the motivation, and get to the last part here: the actions taken during normal transaction processing, which is primarily logging, and the actions taken during recovery, which use the log to recover. I assume you are familiar with the notions of volatile and non-volatile memory, and also of stable storage. Physical blocks are the data blocks on disk; buffer blocks hold the same data temporarily in memory, where it may have been updated but not yet written out. Input and output are the operations which read a block from disk and write a block to disk. In contrast, we use read and write to denote getting a data item from the buffer into the work area of the transaction, and putting it back into the buffer. So a write does not mean the data is output; it is just in the buffer, and a read gets the item from the buffer, though of course if it is not in the buffer an input will automatically happen first. That is the model; any questions? So when we looked at schedules of transactions, what were the reads and writes? A read gets data from the buffer, and a write writes to the buffer; neither implies anything is written to disk. The output to disk can happen much later. In fact, the output need not even happen at the time the transaction commits; it can happen later, depending on the recovery mechanism used.

We are going to focus on log-based recovery. There are other schemes, such as shadow paging; the earlier edition of the book had a lot of detail on shadow paging, but it is not used much these days, so we have cut down the description in the current edition. You know what the log is: basically a record of things which happened, primarily updates, and also transaction start, commit and so on. We assume that when a transaction Ti starts, it writes a <Ti start> record. (There are some tricks; for example, the ARIES algorithm does not bother, and treats the first record it sees from a transaction as its start, but we will forget those optimizations.) Before Ti performs a write, it writes a log record <Ti, Xj, V1, V2> containing its identifier, the data item Xj being written, the old value V1 of the data item, and the new value V2. And at the end it writes a <Ti commit> record; again, this is standard.

You are all familiar with the two approaches, deferred and immediate modification. In deferred modification, when a transaction does a write, the write operation is not done immediately on the database; it is kept in the transaction's local memory, and the actual writes to the database happen at the end of the transaction. In immediate modification, the moment something is updated it can be written back to the database; it does not have to be, but it can be. So with immediate modification, the buffer may contain updates which are not yet committed, and those in turn may be written to disk. So even on disk you may have an update which is not yet committed, which means the log has to have enough information to allow you to roll it back. Deferred modification simplifies some aspects of recovery, but it has extra overheads, and we are not going to consider it for the moment; some optimizations to the recovery algorithm we describe become possible with it, but we will not bother with them. And when does a transaction commit? Commit has to be atomic, and the atomic commit is said to happen when the commit log record hits the log on stable storage. Since the log is written sequentially, this means all the log records of the transaction are on stable storage, and the commit record is there too.
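As a concrete illustration, here is a minimal sketch in Python of the three log record kinds just described, together with the rule that the update record reaches the log before the buffer is changed; the record layout, and the list standing in for the log on stable storage, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Start:      # <Ti start>
    txn: str

@dataclass
class Update:     # <Ti, Xj, V1, V2>: old and new value, enabling both undo and redo
    txn: str
    item: str
    old: int
    new: int

@dataclass
class Commit:     # <Ti commit>
    txn: str

LogRecord = Union[Start, Update, Commit]

stable_log: List[LogRecord] = []             # stands in for the log on stable storage
buffer = {"A": 1000, "B": 2000, "C": 700}    # in-memory buffer, seeded with the lecture's values

def write(txn: str, item: str, new: int) -> None:
    # The log record is appended *before* the buffer is updated.
    stable_log.append(Update(txn, item, buffer[item], new))
    buffer[item] = new

def commit(txn: str) -> None:
    # The transaction is committed the moment this record reaches stable
    # storage; the updated buffer blocks may be output to disk much later.
    stable_log.append(Commit(txn))

# T0's run from the example that follows:
stable_log.append(Start("T0"))
write("T0", "A", 950)
write("T0", "B", 2050)
commit("T0")
```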
So the moment the commit record is there, we have enough information to redo the transaction even if there is a failure; that is why that point is the atomic commit point. At this point, writes performed by the transaction may still be in the buffer, so the database copy has not yet been updated by an output; that is okay.

Now let us look at an example. T0 starts; it updates A from 1000 to 950 and B from 2000 to 2050, and at some point it writes A and B. It has to write them before it commits; at each point it may have updated the item locally only once the log record was written, because before writing an item it has to write the log record. Then it commits. T1 then starts, updates C, and commits. Note that the writes are done before the commit, but the data is still in the buffer; it is not yet on disk. The outputs may happen later: in this case, blocks B_B and B_C are output to disk here, well after T0 has committed. Conversely, B_C is output here, which means it is output even before T1 has committed. Both of these can happen, and the recovery mechanism has to deal with them. Why do we say this can happen; why not prevent it? Why not ban an updated item from being output until the transaction commits? The problem is that if a transaction does a lot of updates, memory is going to be full, the buffer is going to be full; where do you keep the data then? You write it back to the database; that is the clean way.

The log, however, has to be written before the transaction commits. You can only declare the transaction committed when the commit record, and everything up to it, has been written to the log. So typically systems do one of two things. By default, whenever you do a commit, the log is flushed to disk immediately at that point; until then it may not have been written. Pages of the log may also get flushed periodically. A few systems do what is called group commit: even when you say commit, they make you wait. The system will not immediately flush the page; it will wait for maybe 20 or 30 milliseconds, and in that period maybe 10 or 20 transactions may say commit. All of them will have written their log records to the same page, and one output commits all of them at one go, instead of 10 different outputs of the same page. That is an optimization most databases do.

So that is a simple log. With concurrent transactions, the log records of different transactions can be interspersed; you have to deal with that. A single buffer block can also have data items updated by more than one transaction. But to ensure recoverability, we are going to assume that if a transaction Ti has modified an item, no other transaction can modify the same item until Ti has committed or aborted. If you do not enforce this, how do you roll back? You update an item, another transaction updates it, and then you want to roll back: you would wipe out that other transaction's update as well. So this is a basic assumption. If you want to relax it, you have to use what is called logical undo logging, which we will briefly mention later.

Undo and redo operations are again well known. Undo of a log record <Ti, Xj, V1, V2> writes the old value V1 back to Xj; redo writes the new value V2 to Xj. That is straightforward, and it is for a single log record. What about undoing an entire transaction? An undo of a transaction has to undo all the updates made by the transaction, starting from the last one and going backwards.
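In terms of the record types sketched earlier, per-record undo and redo are one-liners; this continues that sketch, with the buffer standing in for the database state.

```python
def redo_record(rec: Update) -> None:
    # Redo moves forward in time: reapply the new value.
    buffer[rec.item] = rec.new

def undo_record(rec: Update) -> None:
    # Undo moves backward in time: restore the old value.
    buffer[rec.item] = rec.old
```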
Why backwards? Because a transaction may update the same item more than once. Suppose it updated an item from 10 to 20, and then from 20 to 30. If you go forward and undo the first update first, the item will go from 30 to 10; then when you undo the next one, it will go from 10 to 20, whereas the value that should be restored is 10. So undo always has to go backwards in time. Redo, on the other hand, if you need to redo, always goes forward. That is a basic principle.

When you do an undo, though, there is something special which happens in this recovery algorithm and in other similar algorithms based on repeating history. Suppose you are undoing a transaction; in the simplest case, the transaction itself decided to roll back, or it had to be rolled back because of a deadlock. Every time you apply an undo, you write out a log record <Ti, Xj, V>. What is V? It is the old value which is being restored. This log record does not need two values, because it is restoring an old value; we do not care about the value being overwritten, which is anyway being thrown out. So this is a special log record which is written out at the time you execute an undo operation. Moreover, when an entire transaction has been rolled back, a special log record <Ti abort> is written out.

Now, what is the need for this? If you are familiar with the recovery algorithms from the earlier editions, some of those do not do this: you perform the undo, and no logging happens at that point. This approach is a lot cleaner, because it logs every update that is done to the database. Whether the update is done while the transaction is running, or because the transaction is rolling back, is irrelevant: every update is logged, and every update can be replayed. This greatly cleans up the recovery algorithm, and it is easy to show that it is correct; the other algorithms are actually very hard even to prove correct.

What is this information needed for? It is not for handling a transaction which is getting repeatedly aborted; that happens while the system is live, so it does not need the log. The system keeps track of which transactions were rolled back and prevents the same one from being rolled back repeatedly, and that has nothing to do with the logging mechanism. This information is used for recovery. Incidentally, you could write <Ti commit> instead of <Ti abort> if you so wished, and the algorithm would still work; it does not matter. You can think of a rolled-back transaction as one which did some work, then continued by undoing everything it did, and committed. An abort and a commit are actually identical in this sense; we keep them separate only because conflating them may cause confusion. These log records are purely for recovery; they serve no other purpose. When you redo, on the other hand, no logging is done: the log is already there, so you do not have to log it again.

So that is what happens when you roll back a transaction: the undo part. The other part is, if there has been a system crash and you are recovering from it, what actions do you need to carry out? We will first see what needs to be done, and then the sequence of steps at the end. Basically, a transaction needs to be undone if there is a log record <Ti start>, so the transaction started, but there is no <Ti commit> or <Ti abort> record. This is an incomplete run: the transaction was still running when the failure happened. If <Ti commit> or <Ti abort> is there, it means the transaction completed, one way or the other, before the failure, so we do not have to undo it.
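Continuing the running sketch, rolling back a single live transaction then looks roughly as follows: every restoration is itself logged as a redo-only record, and an abort record closes the rollback.

```python
@dataclass
class RedoOnly:   # <Ti, Xj, V>: written while undoing; V is the restored old value
    txn: str
    item: str
    value: int

@dataclass
class Abort:      # <Ti abort>
    txn: str

def rollback(txn: str) -> None:
    # Scan the transaction's updates in reverse order. A snapshot is taken
    # because we append new records to the log while scanning it.
    for rec in reversed(list(stable_log)):
        if isinstance(rec, Update) and rec.txn == txn:
            buffer[rec.item] = rec.old                       # perform the undo
            stable_log.append(RedoOnly(txn, rec.item, rec.old))
        elif isinstance(rec, Start) and rec.txn == txn:
            break                                            # nothing earlier to undo
    stable_log.append(Abort(txn))
```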
A completed transaction only has to be redone: whether <Ti commit> or <Ti abort> is there, it will have to be redone. In fact, as we will see, you will redo everything; we do not even care whether the transaction needs to be undone or not. We redo everything and then do an undo, as we will see in a moment. Note that the redo repeats all the original actions, including the steps that restored old values, and actually including the steps that incomplete transactions took. Everything is redone; this is called repeating history.

So here is a small example of a log at three points in time, and the recovery action in each case. In the first, T0 is not even complete, so we have to undo T0: B is restored to its old value 2000 and A to its old value 1000, and we write out two log records, <T0, B, 2000> and <T0, A, 1000>, and finally <T0 abort>. That is the action during recovery. In the second case, we redo everything: A is set to 950, B to 2050 and C to 600; then we undo T1, so when we roll back C, the log records <T1, C, 700> and <T1 abort> are written. In the third case, both transactions are complete, both are redone, and everything is okay.

There is also the notion of checkpoints, which you may be familiar with, and the checkpoint here is fairly simple. What it does is temporarily halt all updates, write all modified buffer blocks to disk, and then write a record <checkpoint L>, where L is the list of all active transactions. Now, in reality this is fairly intrusive: stopping all updates while doing a checkpoint means that all transaction processing has to stop. So in reality a version of this which does not block updates is used; it is described later in the chapter, and for lack of time we will not get into it here today. The key thing is what we need to do when we recover from a failure. At the time of the checkpoint, we have written all modified blocks out to disk, so there is no need to redo anything prior to the checkpoint record: we only have to find the last checkpoint record, and redo actions go forward from there. Undo actions, however, may have to reach older records, because transactions which were active at the time of the checkpoint may have to be rolled back subsequently; for undo, we may have to go further back than the checkpoint to find the necessary log records. For redo, we never have to go before the checkpoint, only after it. Is that clear? I see people are a little lost; we have some examples, and I have a figure. Here is a checkpoint: at this point, everything that had been written up to here is output to disk, so redo never has to redo anything before this; it only has to start from here. On the other hand, if a crash happened at this later point, then we have to undo T2, which means we have to go further back and use some of the older records of T2 to perform the undo. There may be many transactions active at the checkpoint, and we may have to perform undo going all the way back to the beginning of the oldest one. Any part of the log prior to that is no longer needed, either for redo or for undo, and it can actually be garbage collected.
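A blocking checkpoint of this simple kind might look as follows in the running sketch; output_all_modified_blocks and active_transactions are hypothetical stand-ins for the buffer manager and the transaction table.

```python
@dataclass
class Checkpoint:   # <checkpoint L>: L lists the transactions active at this moment
    active: List[str]

def do_checkpoint() -> None:
    # Simple blocking checkpoint: with updates halted, force every modified
    # buffer block to disk, then record which transactions were still running.
    output_all_modified_blocks()                              # hypothetical buffer-manager call
    stable_log.append(Checkpoint(list(active_transactions)))  # hypothetical transaction table

def last_checkpoint_index() -> int:
    # Redo never needs anything before the last checkpoint record.
    for i in range(len(stable_log) - 1, -1, -1):
        if isinstance(stable_log[i], Checkpoint):
            return i
    return 0   # no checkpoint yet: redo from the beginning of the log
```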
So what have we done so far? We have covered the concepts, but we have not actually presented the recovery algorithm; we are going to do that now. Logging is as we described. Transaction rollback is also as we described for a single transaction; the only thing is that when we are rolling back one transaction, we are only bothered about the records of that transaction. When we are rolling back all active transactions, the steps are going to be similar, but slightly different, because many things are happening, many rollbacks at the same time; I will come to that in a moment. But note that a transaction rollback stops once the <Ti start> record is found: as you go backwards, when you hit <Ti start>, that rollback is done, and the <Ti abort> record is written.

Now, for recovering from failure, there are two phases. In the redo phase, we replay the updates of all transactions; whether they committed, aborted or are incomplete is irrelevant. Everything is redone, as I said. How do we do that redo? We start from the last checkpoint record <checkpoint L>, and we do a little more bookkeeping: we set the variable undo-list to L, the transactions which were active at the checkpoint. Then we scan forward. Whenever we find an update record, whether a normal <Ti, Xj, V1, V2> record or a redo-only <Ti, Xj, V> record, we redo it, because we are redoing everything in this phase; we do not care whether the transaction committed, aborted or is incomplete. Whenever a <Ti start> record is found, we add Ti to the undo-list, because it has started and we do not yet know whether it will need to be undone. On the other hand, if we find <Ti commit> or <Ti abort>, we know Ti is done; whether it completed successfully or aborted is irrelevant, it is over. So we remove it from the undo-list if it is present in that list. At the end of the redo pass, what has happened? Every action done initially has been redone, and the undo-list consists of exactly those transactions which had neither committed nor aborted. These are the only ones which have to be rolled back.

So in the next phase, the undo phase, we start from the end of the log and go backwards. Whenever we find a log record <Ti, Xj, V1, V2> where Ti is in the undo-list, we roll back that log record: we perform the undo by restoring the old value, and then we write a log record <Ti, Xj, V1> with the old value. Whenever a log record <Ti start> is found with Ti in the undo-list, we write <Ti abort>; we are done rolling that transaction back, and we remove Ti from the undo-list. How far do we go? We stop when the undo-list is empty, that is, when <Ti start> has been found for every transaction that was in it. This stop may happen after the checkpoint or before it; we may have to go back past the checkpoint. After this, everything is done, and normal transaction processing can begin, although many systems will perform a checkpoint at this point: after recovery, they do a checkpoint so that if there is a failure after this, you do not again have to redo all of this recovery work. It is not essential, but many systems do it. Any questions? The intuition is the following: the redo phase brings the system to the state it was in just before the crash. That state is predictable; we know exactly what had happened, and at that point we can meaningfully do the undo.
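The two phases fit together as below; this continues the running sketch, reusing the record types and helpers defined earlier, and is an illustration of the scheme rather than a production implementation.

```python
def recover() -> None:
    if not stable_log:
        return

    # --- Redo phase: repeat history, starting from the last checkpoint. ---
    start = last_checkpoint_index()
    first = stable_log[start]
    undo_list = set(first.active) if isinstance(first, Checkpoint) else set()
    for rec in stable_log[start:]:
        if isinstance(rec, Update):
            buffer[rec.item] = rec.new      # redo an original update
        elif isinstance(rec, RedoOnly):
            buffer[rec.item] = rec.value    # redo a logged undo step as well
        elif isinstance(rec, Start):
            undo_list.add(rec.txn)
        elif isinstance(rec, (Commit, Abort)):
            undo_list.discard(rec.txn)

    # --- Undo phase: scan backwards, rolling back incomplete transactions. ---
    for rec in reversed(list(stable_log)):  # snapshot: we append as we scan
        if not undo_list:
            break                           # every incomplete transaction handled
        if isinstance(rec, Update) and rec.txn in undo_list:
            buffer[rec.item] = rec.old
            stable_log.append(RedoOnly(rec.txn, rec.item, rec.old))
        elif isinstance(rec, Start) and rec.txn in undo_list:
            stable_log.append(Abort(rec.txn))
            undo_list.discard(rec.txn)
```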
Now, you might ask: why not optimize? Why are we doing redo for transactions which are anyway going to be undone shortly? There are reasons for it, which become clearer if you look at logical undo, which we are not covering here. But even without logical undo, it is simpler to just redo everything, because otherwise, when we see a record, we do not know whether the transaction completed or not. It is easier to just redo everything without bothering, because the undo will anyway undo the effect of the redo. We spend a little bit of time doing a redo which will then be undone; it does not cause any harm, and it is much more efficient than actually finding out whether each transaction completed, which would require an extra pass over the log. Any other questions?

So, this is the basic recovery algorithm, and we have an example of recovery. This is the log; these are all the records, in this order from top to bottom, and a crash happened exactly here. This shows what would happen. T0 has completed aborting, so it is a complete transaction; we do not have to undo it any more. T1 committed, while T2 started but did not complete, so only T2 has to be rolled back; it is the only one in the undo-list. In the redo pass, we redo everything. Here is a checkpoint with L = {T0, T1}. T1 committed, so it is removed from the undo-list; T0 aborted, so it is removed from the undo-list. At the end of the redo pass, T2 is the only thing in the undo-list, so only T2 has to be rolled back. Going back in the log, the first record for T2 is this one, so we undo it and add a record <T2, A, 500>. Next, we find <T2 start>, so we write <T2 abort> and delete T2 from the undo-list. The undo-list is now empty, so the undo pass stops at this point over here; it does not even go up to the checkpoint record. But if T0 had been incomplete at the end, the undo pass might have had to go all the way back to <T0 start>.

Okay. Now, earlier I mentioned early lock release, that is, not holding locks in a two-phase manner. That causes all kinds of problems for recovery, and how to deal with them is described in the section called recovery with early lock release. For any of you who went through the details of the advanced recovery algorithm in the previous edition: this is basically the same algorithm. It has been slightly cleaned up, the explanations are better, there are more examples and so on, but conceptually it is the same thing. And ARIES was there already in the earlier edition; we have not changed it. It is a recovery algorithm which is very widely used in industry. We went in steps to help you understand what is going on: ARIES does all of this, more or less, but it has a number of other optimizations to speed up recovery, which we omitted to keep things simple. The book describes some of the optimizations that ARIES does; again, this is the same as the previous edition. And then there is some material on remote backup systems, which I hope all of you have seen from the previous edition. If anyone is not aware of it, I will spend one minute on it; otherwise I will skip it. Anyone want me to talk about it? Are there any questions on the presentation?

Is there a way to find out what recovery algorithm a database uses? Your best bet is to look at the system manuals or to search on the net. That said, almost everybody uses ARIES; ARIES is kind of the standard. I think it is old enough that it is out of patent, so everybody uses it, or they have licensed it one way or another. PostgreSQL originally used a slightly different recovery algorithm, but that was found to be a bit slow.
So I think they switched to ARIES at some point; I am not 100% sure of it. Can you experiment with PostgreSQL to find out? It is hard to discover what it does for recovery by experimenting with it; the only way is to look at the source code or to look for documentation on the net. PostgreSQL has decent documentation, so you will find material on the net which explains the details.