So, first of all, why do we need recovery? Basically, recovery is needed to deal with failures. In an ideal world nothing fails and everything runs to completion, but in the real world everything fails at some time or the other. So, we need a classification of failures and then a way to deal with them. At the software level, a transaction might fail because of some internal error condition. For example, an integrity constraint is violated; that is an error, and you have to roll back the transaction. It could be due to a system error; for example, there is a deadlock and a certain transaction has to be rolled back. That is a failure, and to deal with it you need to be able to undo something. Then you can have a system crash, where the power goes off, or the operating system detects an error and shuts down the system; that is another kind of failure. And then you have disk failure, where the disk actually fails and you lose data.

Now, there are a few assumptions one has to make in order to show that a recovery mechanism works. For example, suppose you have a kind of failure which replaces the database state with some earlier database state, including wiping out all the old logs. If you run recovery, it will say everything looks valid and report this as the final result of recovery. That is an error which has caused serious damage: it has completely changed the database, and it would be essentially impossible to recover from it. Errors which do such really evil things are sometimes called Byzantine errors. In general, an error which does really evil stuff can be impossible to recover from. Luckily such errors are rare; they do happen sometimes, but since there is not much you can do about them anyway, all recovery algorithms make a fail-stop assumption. What is the fail-stop assumption? If there is any kind of failure, the non-volatile storage contents are not corrupted. What do we mean by corrupted? Well, the database did certain writes to disk, and those are done; there may be some inconsistent state, which we will use logs to recover from — that is not the issue. But if there is a serious corruption which completely changes the state to some other state, that would cause problems. So, the assumption is that this will not happen. Now, if you write your code haphazardly, it can very well corrupt pages of the database; a bug in your database code can completely mess up the data on a page. So, what do we do about this? The answer is that database systems, like other critical software, have a lot of checks built in: they have a lot of assertions and internal condition checks. Before writing data to disk they will make sure it is consistent; if they detect a problem they will not write it, and the system will effectively just stop. So, the fail-stop assumption is enforced by careful programming. It is not guaranteed, but most of the time when there is an error, the system will just stop rather than write corrupted data to disk.

What about disk failure, which destroys data? Well, first of all, if a disk failure corrupts a few bytes, that is detected using checksums. If a disk failure makes data unreadable, what do you do? You use a RAID system, so that there is another disk with that data. What if the data center burns down? Well, then you have another copy of the data somewhere else.
So, there are other ways to deal with disk failure, and recovery mechanisms also have some support for backing up data from disk to tape and restoring and recovering from the backup; we will not get into that here.

So, why is recovery needed? Again, one of the basic requirements is atomicity. If a transaction was transferring money from one account to another and it fails after updating the first account but before updating the second, you have to undo its updates. Recovery algorithms in general have two parts. One part is the actions taken during normal execution: while the system is running normally, it is doing various things, including writing a log of everything that happened and so forth. Then there is a set of actions which you take when you recover from failure. First of all, when there is a failure, you detect the failure — that is the fail-stop part. With a power failure, fine, the whole thing just goes off; with other kinds of failure the database stops, and then somebody restarts the database, or restarts the machine when the power comes back up. At this point certain extra steps have to be taken to bring the database to a consistent state. Why is that? Remember, while a transaction is running it is updating various things in memory. Some of those are written to disk, some are not. If you look at the contents of the disk at the time the failure happened, there could be two problems. One is that certain updates done by committed transactions are not on disk. The other is that certain updates made by uncommitted transactions may have been written to disk. So, there are two issues in recovery. One is that you have to undo the actions of transactions which did not commit. The other is that you have to redo the actions of transactions which committed but whose updates were not written to disk before the failure. So, undo and redo are two important steps in recovery.

Storage can be classified as volatile or non-volatile. Volatile is main memory, for example; non-volatile is flash or disk. There is one more level in the hierarchy, which is called stable storage. Stable storage is basically a form of storage which survives all failures. Of course, it is mythical; you cannot actually survive all failures. Let us say — those of you who have watched the movie 2012 — the whole world except a part of Africa is destroyed in that movie. They had great fun destroying every city and every country in the world; very funny movie. But if that happens, can you guarantee that you will preserve data in spite of 2012 happening? Probably not, unless you know which part of the world will survive and keep your data there. So, it is not practical. Or if tomorrow a black hole comes and swallows up the earth, everything is gone and who cares. Anyway, you can approximate stable storage so that it survives pretty much all failures. The way it is done is, first of all, you keep multiple copies on different disks. More importantly, you keep copies at a remote site, so that if there is a fire or an earthquake here, a copy is safe somewhere else. There are more details on how to implement it; we will skip that.

Again, some terminology which we will use occasionally: we use the term physical blocks to refer to the blocks that reside on disk, whereas a buffer block is the same block after it has been read into memory (and it may have been updated after being read in). The terms input and output we are going to use primarily to mean reading from disk and writing to disk respectively.
So, sometimes I will say write to disk and read from disk, and sometimes I will say input or output. Input and output happen in units of blocks: a block is read from disk into the buffer, or written from the buffer to disk. This is how all communication with the disk happens. You will never go and read an individual tuple from disk; that is not how databases work. You will read the block containing the tuple into the buffer and then access the data from the buffer.

So, here is a diagram which shows these operations. Here is the buffer with multiple blocks; input brings a block A into the buffer. The buffer manager remembers that this part of memory corresponds to the data read from block A of the disk. The disk identifies the block somehow — here is a block of the disk, or here is a file ID and an offset in the file, or whatever. Now, if a transaction wants to read a particular data item X, the first step is to check whether the block containing X is in the buffer. If it is not, it may have to be input from disk. Then the transaction can get the value of X and copy it into its local space, and it can do whatever it wants there. We do not know what it does; your Java program fetched the data and is doing whatever it wants with it. When it is ready to write something back, it does a write; it writes the data item back. Where does the write happen? The write has to happen to the buffer block. What if that block is not there in the buffer? Well, the buffer manager will first fetch the block, and then the data item can be written on that block. It is now updated in the buffer. Other transactions may be able to see it, but there is no guarantee it has gone to disk yet. To actually write it to disk, the output operation writes the whole buffer block back to disk. So, the system will issue an output operation at some point to copy the data from the buffer onto disk. This is basically how data access happens.

So, we are going to use read(X) to indicate that the value of the data item X is read into some local variable, and write(X) takes the local variable and writes it back to the database. Again, these reads and writes, as for concurrency control, are going to be executed by the SQL engine, but the SQL itself is issued by your Java program or whatever sits above. Now, note that just after a write, an output need not happen. Most databases will not immediately output a block which has been written; they will do it later, often periodically — they will go through all the blocks which have been written recently and output them one at a time. But it may not happen immediately. A read happens at whatever point the data is needed. A write, we are going to assume, can happen at any point while the transaction runs. Now, some concurrency control mechanisms will not perform a write while the transaction is running; they will only work with local memory and do the writes at the end, when the transaction is ready to commit. But the recovery mechanism we use is general purpose; it should work with any of those. So, in our model, a write can happen at any time while the transaction is running. If you know writes will happen only at the end, you can actually simplify the recovery manager.
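To make the read/write/input/output model above concrete, here is a minimal sketch in Python. The disk contents, block names, and item-to-block mapping are all made up for illustration; a real buffer manager is of course far more involved.

```python
disk = {"block_A": {"X": 100}, "block_B": {"Y": 200}}   # physical blocks on disk
buffer_pool = {}                                         # buffer blocks in memory

def block_of(item):
    # assumed mapping from a data item to the block containing it
    return {"X": "block_A", "Y": "block_B"}[item]

def input_block(blk):
    buffer_pool[blk] = dict(disk[blk])        # input: copy a block from disk into the buffer

def output_block(blk):
    disk[blk] = dict(buffer_pool[blk])        # output: copy a buffer block back to disk

def read(item):
    blk = block_of(item)
    if blk not in buffer_pool:                # fetch the block only if it is not already buffered
        input_block(blk)
    return buffer_pool[blk][item]             # copy the value into the transaction's local space

def write(item, value):
    blk = block_of(item)
    if blk not in buffer_pool:                # the write also happens on the buffer block
        input_block(blk)
    buffer_pool[blk][item] = value            # visible in the buffer, but not yet on disk

x = read("X")            # read X into a local variable
write("X", x - 50)       # write it back; still only in the buffer
output_block("block_A")  # only now is the updated block on disk
```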
Now, there are several ways of implementing the recovery manager. We are going to focus on log-based recovery, which is widely used. There is another alternative called shadow paging, which is not used as much — well, not used as much in databases which need high concurrency. But the concept is actually used in some other places. For example, if you edit a file, most editors will keep a backup copy of the file somewhere, and when you save the file, the backup copy is deleted. The idea is that if you do not save the file, the file can be restored from the backup copy. So, there are various things like that which are used in the real world, and shadow paging is widely used in many applications. But for large-scale database systems, there were some attempts to use shadow paging and there were performance issues, so it is not used as much. What is shadow paging? We will not get into details, but the basic idea is that you keep a copy of the data, so that in case the transaction fails, you can restore from the copy.

Now, let us focus on log-based recovery. A log is a sequence of log records. Each log record has some content, and there are several types of log records. We show them in an abstract way with angle brackets: <Ti start>, <Ti, Xj, V1, V2> and so forth. This is a pretty-printed notation which makes it easy for humans to read. In reality, in the log there would probably be a log record format with the first few bytes saying what type of log record it is and what the length of the record is, followed by the fields of the record.

So, what are the log records we are going to use? The first one is a start-transaction log record: when a transaction Ti starts, it outputs <Ti start>. And whenever it issues a write to the database buffer, before it issues the write it outputs a log record <Ti, Xj, V1, V2>, where Xj is the identifier of the data item, V1 is the value before the write and V2 is the value after the write. What is the need for this particular log record? Well, suppose the transaction fails; you have to undo its effects, but the write has already happened to the buffer block. How do you undo the effect? The only way to undo it is if you have the old value of the data item, which is V1. So, the recovery manager will use this log record to restore the value of Xj from V2 back to V1. That is how a log record like this is used. It may also be used similarly to undo data which is already on disk; we will see that later. This keeps happening, and finally, when Ti has finished all the updates it wants to do, it writes a log record <Ti commit>. What about read operations? Do we need to log reads? The answer is no. We only care about updates, because whether we want to undo partially done things or redo things which were not written to disk, all that we need are the updates; we do not care about the reads.

Now, there are two approaches to logging. One uses immediate database modification, which means that at any point before the transaction commits, it can update the database. That is the model we are using. There is another model which I told you about: some concurrency control mechanisms will collect all the updates in local memory and apply them only at the end, when the transaction is ready to commit. That approach is called deferred database modification. If a system does deferred modification, we can actually simplify the logging protocols we discuss, but for lack of time we will not get into it here. We are going to focus only on immediate modification.
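Before going on, here is a small sketch of the three log record types just introduced, represented as plain Python tuples appended to an in-memory list that stands in for the log. The record layout and the names used here are illustrative, not a real on-disk format.

```python
log = []

def log_start(ti):
    log.append(("start", ti))                       # <Ti start>

def log_update(ti, item, old_value, new_value):
    # <Ti, Xj, V1, V2>: appended *before* the write is applied to the buffer,
    # so the old value V1 is available if the write has to be undone later.
    log.append(("update", ti, item, old_value, new_value))

def log_commit(ti):
    log.append(("commit", ti))                      # <Ti commit>

# Example: transferring 50 from A to B in transaction T1
log_start("T1")
log_update("T1", "A", 1000, 950)
log_update("T1", "B", 2000, 2050)
log_commit("T1")
```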
So, there are a few assumptions we will make, some of which will be relaxed later. First of all, as I said, immediate modification allows updates of an uncommitted transaction to be made to the buffer. Moreover, it even allows an uncommitted update to be output to disk if required: if the buffer is full, the buffer manager may output uncommitted writes to disk. So, the recovery mechanism must be able to undo these uncommitted writes later if the transaction fails, and in order to do that, the log record must be available — the log record which we saw before, which contains the old value of the data item. So, what we are going to do initially is assume that when you write a log record, it is output immediately to stable storage, so it will not be lost. In fact, for performance reasons, recovery algorithms do not immediately write each log record to stable storage; they collect log records for some time and then write them out. But there are certain rules on when log records have to be forced to stable storage, and we will see those later. For the moment, we will just assume that as soon as you write a log record, it goes out.

Now, what about the actual buffer blocks which were updated by a transaction? As I said, we are going to assume that a buffer block may be written to disk even before the transaction commits. Why? Well, what if a transaction updates a lot of data? If you force all the updated values to stay in the buffer and never be written out, the buffer will get full, and then where do you put new data? It gets complicated. So, most databases allow uncommitted updates to be written to disk.

The other side of the picture is: when a transaction commits, are you going to make sure its updates are written to disk at that point? One possible protocol is that when a transaction says it wants to commit, you output every single block it updated to disk before you allow it to commit. Certain recovery mechanisms do this, but it turns out there is a fairly high overhead. Think of a transaction which has updated five data items: to allow it to commit, you have to do five different writes to disk. Moreover, until it commits, you cannot release its locks, which means that if it has locked a widely accessed data item, the next transaction cannot run until all these updates are written to disk. System performance becomes quite bad. So, what we are going to allow — and real systems also do this — is that when a transaction commits, whatever it updated need not be written to disk. It can be kept in the buffer for some more time and will be written out sometime later; we do not know exactly when. Then what do we mean by saying the transaction is committed? Its updates are not on disk; they are still in the buffer. If the database system crashes, the updates in the buffer are lost, since they were not on disk. So, how can we say the transaction is committed? The trick is that we write enough log records to the log, containing the values written by the transaction, so that if there is a failure and the system comes back up, we have the log and can redo all the effects of the transaction. That way, all the updates of a committed transaction will get reflected in the database during recovery, before the database continues.
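As a sketch of the ordering just described — the update log record goes to (what we are assuming is) stable storage before the write is applied, and commit forces only the commit record, not the data blocks — consider the following; stable_log, buffer and disk are illustrative stand-ins.

```python
stable_log = []          # log records already forced to stable storage
buffer = {"A": 1000}     # buffer block contents (in memory)
disk = {"A": 1000}       # disk contents

def logged_write(ti, item, new_value):
    old_value = buffer[item]
    stable_log.append(("update", ti, item, old_value, new_value))  # log record first ...
    buffer[item] = new_value                                       # ... then the buffer update

def commit(ti):
    stable_log.append(("commit", ti))   # the transaction is committed at this point, even though
                                        # its updated data blocks may still be only in the buffer

logged_write("T2", "A", 900)
disk["A"] = buffer["A"]   # the buffer manager may output this even before T2 commits;
                          # the update log record above is what lets us undo it if T2 fails
commit("T2")              # no data blocks are forced here; the log is enough to redo T2 later
```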
So, what I have just said is: when a transaction commits, its updates may not be on disk. If there is a failure, then when the database comes up, the first thing it does is redo all those updates to make sure they are all there in the database before proceeding. That is how we can say the transaction has committed.

Some other issues. Since we allow blocks to be output at any later point, they may be output in a different order. A transaction may write block 1 and then block 2; block 2 may be output later and block 1 much later still. You may have transactions T1 and T2 running: T1 ran first and updated something, then T2 ran and updated something else. It may happen that the buffer blocks updated by T2 are written out to disk but those written by T1 are not. So, what this means is that the disk contents can be quite a mess: they may have some updates and not others. It is actually quite a hard problem if you think about it — if the disk contents can be such a mess, how do we recover? Luckily the answer is that we have the log, and even though the disk is a mess at the time of a crash, we can use the log to bring the contents of the disk back to a consistent state. Consistent meaning: we will undo the effects of any incomplete transactions and redo the effects of all transactions that completed. So, all recovery mechanisms allow the disk to be a mess, but they bring it back to a consistent state during recovery.

Now, a transaction is declared to be committed when its commit log record is written to the log on stable storage. Remember, a transaction writes a start record, writes a log record for every update it does, and then <Ti commit>. When <Ti commit> hits stable storage, that is exactly the point at which the transaction commits. Note also that we wanted atomic transactions: this one operation of writing the commit log record to stable storage is the one atomic action which commits the transaction. Suppose a failure happens just before this log record is written: the transaction will be rolled back later. If a failure happens just after it is written, the transaction will be redone. So, even though the transaction updates many data items, its commit is still atomic.

Another issue: why do we allow modified buffer blocks to be written out later? One reason, as I said, is that commit would be delayed otherwise. Another benefit is that if the same page is written by multiple transactions, you can allow all of them to update it, and then one write to disk at the end is enough to propagate all these changes, instead of outputting the page once per transaction. So, it reduces the number of outputs.

So, what have I done so far? What I am doing at this point is telling you some properties that any recovery algorithm we want to use should have, and giving you a few hints about how recovery works. I have not given you a full algorithm yet. I am going to continue for the next few minutes explaining some properties of the algorithms and some details of how the log records are used, and after introducing all these concepts, I will actually give you a recovery algorithm. That is still a few minutes ahead.

Now, let us see an example of recovery using logging. What is the log here? The log is shown as a sequence of log records going from top to bottom. We have <T0 start>, then T0 updated A: the old value of A was 1000 and the new value is 950. Similarly, it updated B from 2000 to 2050.
At this point, the updates have been logged, but they have not yet been written to the database buffer. After this, the transaction also writes the updates to the buffer. Note that the write of A has to come after the log record for A: it could come immediately after the log record, or a little while later, as shown here. Similarly, B is also written to the database buffer, but it is not output — output is when it goes to disk. So, at this point, the updated A and B are purely in the buffer; they are not on disk. Now, T0 commits. Note that these two items have still not been written to disk; they are still in the buffer, even though T0 has committed. But all four of these log records are now on stable storage. Then T1 starts. It updates another item, C, from 700 to 600, and then writes it to the buffer. Now, the database system outputs two blocks to disk. Which two blocks, in this example? The block containing B and the block containing C. You will notice that up to now, none of the updates of T0 had gone to disk; at this point, one of the updates of T0 is on disk, while the other one, of A, is not yet on disk. Furthermore, at this point, one of the updates of T1 is on disk, even though T1 has not committed. Finally, T1 commits, and at some point the database also outputs the block containing A — it has to do it eventually. So, this is the situation. If a crash happens at any point here, the state of the disk is messed up, and the log has to be used to recover it.

Now, any recovery mechanism has to deal with the effect of concurrent updates, so recovery algorithms are tied somewhat to the concurrency control mechanism that is used. We are not going to make any strong assumptions, but there is one very important assumption: whatever concurrency control mechanism we use will not allow an uncommitted update to be read. We can enforce that by using strict two-phase locking. In fact, not only are uncommitted values not allowed to be read; furthermore, while there is an uncommitted update on a particular data item, another transaction is not allowed to update the same data item. Why is that required? Think about this. Suppose T1 updates an item, say from 5 to 10, and then T2 updates it from 10 to 15 and commits. T2 is done, but T1 is not, and now T1 has to abort. T1 updated the item from 5 to 10; suppose it rolls back by restoring the item from 10 to 5. What about T2? T2 took the item from 10 to 15 and committed, but if you restore the value back to 5, you are wiping out the update done by T2. So, in order to allow undo by restoring the old value, we cannot allow the same data item to be updated by two uncommitted transactions. There can only be one uncommitted transaction which has updated a given data item; only after it commits can another one update the same item. In other words, we are not going to deal with cascading aborts and so on — you can extend the recovery mechanism to handle such things, but it is just too complex. So, we are going to make this assumption; it simplifies recovery a lot, and it is quite reasonable, since in any case you want your system to support at a minimum strict two-phase locking or something equivalent.

So now, there are two operations using log records. Both of them use the same log record, which identifies the transaction, a data item X, and the old and new values. The undo operation writes the old value, V1, to the data item X. The redo operation writes the new value, V2, to X.
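Here is a minimal sketch of these two per-record operations, using the same illustrative tuple format as before; db stands in for the database state being recovered.

```python
def undo_record(db, record):
    _kind, _ti, item, v1, _v2 = record
    db[item] = v1                 # undo: restore the old value V1

def redo_record(db, record):
    _kind, _ti, item, _v1, v2 = record
    db[item] = v2                 # redo: reapply the new value V2

db = {"X": 10}
undo_record(db, ("update", "T1", "X", 5, 10))   # db["X"] is now 5 again
redo_record(db, ("update", "T1", "X", 5, 10))   # db["X"] is back to 10
```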
So, with the same log record, we can either do an undo or a redo, or we can skip it during recovery. That is for an individual log record. What about a whole transaction? There are certain transactions which we are going to redo because they completed, and maybe their updates were not output to disk, so we have to redo everything to make sure their updates are reflected. What redo(Ti) does is take the log records of Ti going forward and apply each of those operations: every time it finds an update log record for that transaction, it writes the new value to the data item X. What does it mean to write it? As usual, the block is fetched into the buffer if needed and updated in the buffer. That is what redo does. However, when the redo is executed, we do not do any further logging; there is no need, since the log record is already there.

Undo is slightly different. Undo takes a transaction which has run partially and undoes everything it did. How? There are two parts. The first is that when it gets a log record to undo, obviously it sets the old value back. But an important part of undo is that it should go in reverse: it should start with the last log record of the transaction and move backwards. Why? Let me use the whiteboard to illustrate. Consider a transaction T1 which wrote the following log records: <T1 start>, then <T1, x, 5, 10> — that is the first update — and then it updated x again, giving <T1, x, 10, 15>. At this point, T1 has to be undone. Suppose we go forward through the log records performing the undo. What is the first thing it is going to do? It is going to write the value 5 to x, which is actually the correct original value, so that seems okay. Going forward, the next record will write 10 to x, and then we say we are done. But hey, there is a problem: the value of x before T1 ran was 5, not 10, so the undo has done the wrong thing. In contrast, if you go backwards from the last log record, see what happens: first x is set to 10, and then, when we come to the earlier log record, x is set to 5, which was the original value before T1 ran. So, undo has to go backwards. In contrast, if T1 had committed, the final value should be 15, so redo has to go forward: going forward, redo will set x to 10 and then set x to 15.

So, that is how a transaction is undone, but in fact there is one more step in our recovery algorithm for an undo, which is shown here. Every time a data item X is restored to its old value V, we are going to write a special log record of the form <Ti, X, V>. Why do we need to do this? We are undoing; why do we need to write a log record while undoing? To understand this, think of undo as doing the following. You have a transaction which was running, and it set x from 5 to 10. Now, suppose the undo is done as if it were internal to the transaction: the transaction says, I am going to set x from 10 back to 5, so I issue one more write, and then I complete. So, what has the transaction done? Instead of aborting, it did further updates which restored everything back to the original values, and then it completed. Recovery is essentially going to do this for any incomplete transaction which is being rolled back: it carries out operations which restore all the updated data items back to their original values, and then, as we will see in a moment, it writes a special log record saying the transaction is done.
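Here is a sketch of redo(Ti) and undo(Ti) as just described, under the same illustrative log format: redo scans the transaction's update records forward, while undo scans them backward and also appends a redo-only record <Ti, X, V> for each restore, plus a final abort record.

```python
def redo_txn(ti, log, db):
    for rec in log:                                   # forward scan
        if rec[0] == "update" and rec[1] == ti:
            _, _, item, _v1, v2 = rec
            db[item] = v2                             # reapply the new value

def undo_txn(ti, log, db):
    updates = [r for r in log if r[0] == "update" and r[1] == ti]
    for _, _, item, v1, _v2 in reversed(updates):     # backward scan
        db[item] = v1                                 # restore the old value
        log.append(("redo_only", ti, item, v1))       # <Ti, X, V>: logs the undo itself
    log.append(("abort", ti))                         # <Ti abort>: Ti is now "completed"

# The whiteboard example: x went 5 -> 10 -> 15, and T1 must be undone.
db = {"x": 15}
log = [("start", "T1"),
       ("update", "T1", "x", 5, 10),
       ("update", "T1", "x", 10, 15)]
undo_txn("T1", log, db)   # backward: x set to 10, then to 5 -- the correct original value
```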
So, at the end of the undo, it writes a special log record which says <Ti abort>. For all practical purposes, <Ti abort> and <Ti commit> are almost identical: both of them say that the transaction is done. The only difference is that when you roll back, the database system issues a bunch of updates which physically bring the values back to their original values, and then completes. Why do these log records need to be written out? Well, suppose that after you do the undo, the database crashes. When you come back up, if you did not log all this, how do you guarantee that the undos were actually reflected in the database on disk? You cannot guarantee it. In contrast, if you write log records recording the undo being done, then if the database crashes immediately after the undo, when recovery comes back up it will find the log records of the transaction, and it will find that the transaction finished with a <Ti abort>. Now, instead of figuring out how to undo the transaction at this point, it is actually very simple: it will treat this transaction just like any other one which has to be redone, and it will go forward and redo all its log records. Which, of course, looks a bit silly: there is a log record which sets x from 5 to 10, and then this special log record with no undo information which sets x from 10 back to 5, and it does both. You may say this is stupid — why do both? Part of the answer is that we could skip the first one if we know what we are doing. The second one we cannot skip, because what if the 10 had been written to the database? We have to do the second part, which sets it back to 5, in case that restore had not reached the database at the time of the crash. So, essentially an undo, when it finishes, turns an incomplete transaction into a completed transaction, and recovery treats completed transactions identically, whether they actually committed or whether they aborted. Incomplete transactions are what need more work.

So, let me again use this slide to reiterate what I said. When you recover from failure, a transaction needs to be undone if the log contains <Ti start> but does not contain either <Ti commit> or <Ti abort>. These are the incomplete transactions, which have to be undone. On the other hand, if the log contains <Ti start> and either <Ti commit> or <Ti abort>, it is treated identically: the transaction needs to be redone. In fact, in the recovery algorithm which we are going to see, first of all we redo everything, including incomplete transactions — everything is redone. After everything is redone, we do an undo for the incomplete transactions.
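To make this classification concrete, here is a small sketch over the same illustrative tuple log: transactions with a start record but no commit or abort record go on the undo list, and the rest go on the redo list.

```python
def classify(log):
    started, finished = set(), set()
    for rec in log:
        if rec[0] == "start":
            started.add(rec[1])
        elif rec[0] in ("commit", "abort"):
            finished.add(rec[1])
    redo_list = started & finished      # completed transactions: redo (repeat history)
    undo_list = started - finished      # incomplete transactions: undo after the redo pass
    return redo_list, undo_list

log = [("start", "T0"), ("update", "T0", "A", 1000, 950), ("commit", "T0"),
       ("start", "T1"), ("update", "T1", "C", 700, 600)]
print(classify(log))                    # ({'T0'}, {'T1'})
```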
"Repeating history" is the term used to describe this mode of recovery, which redoes every logged action that happened before the crash, including operations done by transactions that ultimately aborted. It is just repeating whatever happened in the past. They say that if you do not learn history, you are condemned to repeat it — so that is the historian's view, that repeating history is a bad idea, because you will make the same mistakes. Here, it appears that you are making the same mistakes and then also undoing those mistakes, but it is actually very, very useful. Repeating history is useful because, in the end, we are able to show that the recovery algorithm does things correctly, which is actually much harder to show in general if you do not repeat history. This may not be obvious from the presentation here, but believe me, it is a very useful mechanism. Earlier editions of the book had algorithms which do not repeat history; some of those are used in certain specialized systems, but by and large, every database today uses an algorithm called ARIES, and one of the fundamental ideas of ARIES is that you repeat history.

So, here is another example of how undo and redo would happen. Here is a log: T0 starts and updates A and B. Let us say that a crash occurred right at that point. What would recovery need to do? In this case, with repeating history, we first redo everything, which sets A to 950 and B to 2050. After that, T0 has to be undone. What does undo(T0) do? It sets B back to 2000 and A back to 1000 — those are the two log records which have to be undone — and it writes out the corresponding log records: first <T0, B, 2000>, that is, set B to 2000, then <T0, A, 1000>, that is, set A to 1000, and finally <T0 abort>. That is the undo part.

The next case: what if T0 had actually completed and committed, T1 then started, and the crash occurred before T1 committed? In this case, T0 has to be redone and not undone. With repeating history, the initial part of T1 also gets redone: everything which is in the log is redone in the first phase, as you will see. After that, the undo of T1 happens. So, the redo first sets A to 950, B to 2050 and C to 600; then the undo restores C back to 700, writes out the log record <T1, C, 700>, then writes <T1 abort>, and it is done. The last case is where both transactions committed; in this case, both are simply redone. So, that is the basic idea.
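Putting the pieces together, here is a sketch of the recovery pass just illustrated — repeat history by redoing every logged update, then undo the incomplete transactions — run on the second case from the slide (T0 committed, T1 incomplete). It reuses the illustrative tuple format from the earlier sketches.

```python
def recover(log, db):
    # Phase 1: repeat history -- redo every logged update, in log order.
    for rec in log:
        if rec[0] == "update":
            _, _ti, item, _v1, v2 = rec
            db[item] = v2
        elif rec[0] == "redo_only":
            _, _ti, item, v = rec
            db[item] = v
    # Phase 2: undo incomplete transactions (start but no commit/abort), backwards.
    finished = {r[1] for r in log if r[0] in ("commit", "abort")}
    started = {r[1] for r in log if r[0] == "start"}
    for ti in started - finished:
        updates = [r for r in log if r[0] == "update" and r[1] == ti]
        for _, _, item, v1, _v2 in reversed(updates):
            db[item] = v1
            log.append(("redo_only", ti, item, v1))
        log.append(("abort", ti))

db = {"A": 1000, "B": 2000, "C": 700}
log = [("start", "T0"), ("update", "T0", "A", 1000, 950),
       ("update", "T0", "B", 2000, 2050), ("commit", "T0"),
       ("start", "T1"), ("update", "T1", "C", 700, 600)]
recover(log, db)
print(db)   # {'A': 950, 'B': 2050, 'C': 700}: T0 redone, T1 rolled back
```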
There is one more notion, called checkpoints, which is motivated by the following observation. Our recovery algorithms, as we have seen so far, start from the beginning of the log and perform redo and then some undo. Now, if you run the database system for a long time, say ten days, and then there is a crash, you would have to start from the log written ten days ago and redo everything. Recovery would be very slow because the log is very big. Moreover, the disk is going to get full because of the very big log, so you have to free up space from it. In order to deal with this, recovery algorithms include an operation called a checkpoint. A checkpoint basically does the following: it writes out certain data — in particular, the form of checkpointing we use writes out all modified buffer blocks to disk. After the checkpoint is completed, the entire buffer is clean: there is no more data in the buffer which has been updated but not yet written to disk. What is the point of doing this? The point is that you do not have to use any part of the log before the checkpoint for doing redo operations; that part becomes irrelevant. You do need a little bit of the log from before the checkpoint if you need to do an undo: say at the time of the checkpoint three transactions were active, and later one of them committed and two did not complete — those two have to be rolled back. So, you need a little bit of the log from before the checkpoint in order to roll back those transactions. But for redo, all the log records before the checkpoint have had their updates written to disk as part of the checkpoint, so there is no need to redo them. That is basically how you avoid redoing a huge part of the log: you periodically do checkpoints, so the redo operation uses only a small part of the log. The undo may have to use a little bit more, but you can keep track of what is required, and you can actually delete older portions of the log which are no longer required. So, the log size can be kept manageable. Most database systems will automatically manage the log this way: they do checkpointing internally and delete old parts of the log.

Now, some systems have a feature called archive logging; Oracle has this. What archive logging does is keep the log around even though it is no longer required for normal recovery. The reason is this: suppose you took a tape backup of the database five days ago, and then the current disk is totally destroyed — the RAID system is destroyed. If I restore the backup from the five-day-old tape, I need all the log records since five days ago to recover. So, that is one reason database systems can be told to keep old logs; but by default they will delete old log records as soon as they are not needed.

So, what does a checkpoint do? As I said, the checkpoint outputs all modified buffer blocks to disk. It also outputs all log records that are currently buffered in memory to stable storage. And then it writes a checkpoint log record of the form <checkpoint L>, where L is a list of all transactions that are active at the time of the checkpoint. Why is this required? Because we need to know which transactions were active: for anything which was active, its log records are still required; older ones can be deleted. We will see how this information is used in a little while. Note that while we are outputting the modified buffer blocks, no updates are permitted; nothing else can run while the checkpoint is running. This is actually fairly intrusive, since it blocks all access to the database. In reality, checkpoints are not done exactly like this; they are a little more complex, but they allow updates to parts of the database while the checkpoint is running. For lack of time, we will only briefly mention that later on.

So, during recovery — I told you all this verbally, but it is on the slide now — we scan backwards from the end of the log to find the last checkpoint record, which has the list L. Only transactions that are in L, or that started after the checkpoint, are potentially incomplete. If a transaction is not in L but started before the checkpoint, it cannot be incomplete: not being in L means it either committed or aborted before the checkpoint was taken, and all transactions that committed or aborted before the checkpoint already have all their updates output to stable storage, so there is no need to do any redo operations for them. So, what part of the log is actually required? Once you find the <checkpoint L> record, you have the list L; continue scanning backwards until the record <Ti start> has been found for every transaction Ti in L. That is the earliest part of the log which is required; anything older than this in the log can actually be deleted.
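Here is a sketch of the checkpoint idea, again under the illustrative formats used above: a <checkpoint L> record carries the list of active transactions, recovery's redo pass starts at the last such record, and everything before the earliest <Ti start> of a transaction in L can be discarded.

```python
def do_checkpoint(log, buffer_pool, disk, active_txns):
    for blk, contents in buffer_pool.items():       # output all modified buffer blocks
        disk[blk] = dict(contents)
    log.append(("checkpoint", list(active_txns)))   # <checkpoint L>

def redo_scan_start(log):
    # Scan backwards for the last checkpoint; redo only needs the log from there on.
    for i in range(len(log) - 1, -1, -1):
        if log[i][0] == "checkpoint":
            return i
    return 0

def earliest_needed(log):
    # Keep scanning backwards until <Ti start> is found for every Ti in L;
    # anything before that point in the log can be deleted.
    cp = redo_scan_start(log)
    pending = set(log[cp][1]) if log and log[cp][0] == "checkpoint" else set()
    i = cp
    while pending and i > 0:
        i -= 1
        if log[i][0] == "start" and log[i][1] in pending:
            pending.discard(log[i][1])
    return i
```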
Here is an example: these are the transactions T1, T2, T3, T4. Each of these ranges indicates the span of time during which the transaction was executing; in this case it is a serial execution, as you can see. If you take a checkpoint here, the active transaction list is just T2. Now, at this later point the system failed. If you scan the log backwards, the last checkpoint is here; its active transaction list is T2, and going further backwards you will find the start of T2 here. Anything earlier in the log is no longer required, and the redo pass will actually start from the checkpoint.