You are saying there are multiple users concurrently accessing a remote database from mobile devices, and asking how we ensure concurrency control. There is nothing deep about this. We have just looked at concurrency control protocols, and the transactions which run can come from anywhere. They can come from the web, they can come from mobile devices; we do not really care. The protocols can be used regardless.

Now, for specific situations, variants of various protocols are used. For example, many systems use what you can think of as a variant of optimistic concurrency control, which is used by Hibernate and by many other projects. They are essentially doing a form of optimistic concurrency control which is easy to implement using timestamps, or version numbers, stored in tuples. So there are variants used in the real world depending on the application. This particular variant is quite useful when you have a transaction where you return data to a user, and after some time the user comes back and says, okay, now do this update. This could take a long time, so you cannot hold locks, but timestamps can be useful in such a situation: you are basically doing a form of concurrency control outside of the database using timestamps, as sketched at the end of this passage. The Hibernate object-relational mapping system essentially uses this basic protocol, which can be viewed as a weak form of optimistic concurrency control. There are details in the sixth edition of the book if you are interested; I do not have time to explain it in detail.

With deadlocks, is there any possibility of starvation? Yes. If you have a deadlock, you have to roll back somebody, and if you are not careful you might end up rolling back the same transaction over and over again. A transaction comes in, other transactions come in, there is a deadlock, this one is rolled back, it restarts; meanwhile other transactions come in, again there is a deadlock, and it is rolled back again. So theoretically this could happen. Practically speaking, to avoid this, many databases have a preference on which transaction to roll back: given a choice, they will not roll back transactions which were submitted earlier, they will roll back transactions that were submitted more recently. So the oldest transaction will get through; it will never be rolled back. I hope that answered your question.

Let us move to some other centre. We have NPR College. How are concurrent transactions handled in banking technology, sir? Can you elaborate? So the question is how concurrent transactions are handled in banks. First of all, banks have transactions which are for individual accounts, but even a transaction on an individual account may also update, say, a variable which is the total cash available at the branch, and other such things. So there are chances of multiple transactions by different users clashing. And if you are doing a transfer of money between users directly within the bank, that can also involve concurrency. So concurrency issues do arise in banks, and they are handled by the standard concurrency control techniques which we have already seen.

The other issue is transaction processing which happens across bank branches. This is an important category. In the early days we would get a bank draft, and even today it is done: a bank draft is issued, you post it to somebody, and they go and cash it at another bank branch. What is happening here?
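Before going on to the answer, here is a minimal sketch of the version-column style of optimistic concurrency control mentioned above. This is not Hibernate's actual implementation; the table, column, and function names are hypothetical, and it assumes a version counter stored in each tuple that is checked at update time.

```python
import sqlite3

# Hypothetical schema for illustrating version-based optimistic
# concurrency control (not Hibernate's actual code).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account(id INTEGER PRIMARY KEY,"
             " balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 1000, 0)")
conn.commit()

def read_account(acct_id):
    # Step 1: read the tuple along with its version; no lock is held
    # once this statement completes.
    row = conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (acct_id,)
    ).fetchone()
    return {"balance": row[0], "version": row[1]}

def update_balance(acct_id, new_balance, expected_version):
    # Step 2 (possibly much later): update only if nobody changed the
    # tuple in between, by checking the version we originally read.
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, acct_id, expected_version),
    )
    conn.commit()
    if cur.rowcount == 0:
        # Someone updated the tuple concurrently: the optimistic
        # check failed; the application must re-read and retry.
        raise RuntimeError("concurrent update detected; please retry")

snapshot = read_account(1)  # data shown to the user
update_balance(1, snapshot["balance"] - 100, snapshot["version"])
```

The point of the design is that nothing is locked while the user is thinking; the `WHERE ... version = ?` clause detects any intervening update when the change finally comes back.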
Coming back to the bank draft: it is actually a distributed transaction across two banks, across two branches. Today there is a different infrastructure which is increasingly used, which is NEFT, National Electronic Funds Transfer, or a variant of it called RTGS, which gives more immediate transfer of money. These things are all electronic, but the basic principle is the same. You have a transaction running at one site which generates a draft, which could be electronic in effect, and deducts the balance. Then that draft is sent over a messaging system; instead of the post it goes over an electronic network, and the other site receives it and credits the amount. So it is the same principle as bank drafts. This is not exactly concurrency control, but it is part of transaction processing across multiple banks.

This is actually an example of what is also called persistent messaging. Let me use the whiteboard. The electronic bank drafts are essentially persistent messages, persistent in the sense that once the electronic draft is created at my branch, it will be delivered at the other site even if there are failures in between; the database guarantees it. How it is implemented involves some interesting techniques. Essentially, the message is stored in a relation in the database. So there is a database transaction which deducts from my account and creates this message in another relation in the database. Then there is a process which scans that other relation, picks up messages from there, and sends them by email or whatever other network protocol is used to the other bank, or potentially to another branch. Well, another branch is less of an issue these days because, with core banking, all the branches of most banks nowadays run off a single centralized database, so it is mostly for other banks. And there they make sure that duplicate delivery is detected, and the receiver acknowledges each message; until it is acknowledged, the source bank will keep resending it. So if a message is lost, that does not mean money is lost, and eventually the amount will be credited on the other side. Does that answer your question?
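Here is a minimal sketch of the persistent-messaging idea just described, sometimes called a transactional outbox. The table and function names are hypothetical and the network send is a stub; the essential point is that the message is created in the same local transaction that deducts the money, and is resent until acknowledged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account(id INTEGER PRIMARY KEY, balance INTEGER);
CREATE TABLE outbox(msg_id INTEGER PRIMARY KEY AUTOINCREMENT,
                    payload TEXT, acked INTEGER DEFAULT 0);
INSERT INTO account VALUES (1, 1000);
""")

def create_draft(acct_id, amount, payee):
    # One local transaction: deduct the balance AND record the draft
    # message in the outbox relation. Either both happen or neither.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? "
                     "WHERE id = ?", (amount, acct_id))
        conn.execute("INSERT INTO outbox(payload) VALUES (?)",
                     (f"pay {amount} to {payee}",))

def deliver_pending(send):
    # A separate process scans the outbox and (re)sends every message
    # not yet acknowledged; duplicates are possible, so the receiver
    # must detect them using msg_id.
    pending = conn.execute(
        "SELECT msg_id, payload FROM outbox WHERE acked = 0").fetchall()
    for msg_id, payload in pending:
        if send(msg_id, payload):  # True once the ack arrives
            with conn:
                conn.execute("UPDATE outbox SET acked = 1 "
                             "WHERE msg_id = ?", (msg_id,))

create_draft(1, 100, "payee-branch")
deliver_pending(lambda msg_id, payload: True)  # stub network send
```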
Sir, yes sir, another question: DBMS concept usage in telecommunications, sir? So, database concepts used in telecommunications: databases are very much used in telecommunications. Every call that you make is logged in a database. Not only calls, but the network state, what is connected to what, what the capacity is; the whole network state and infrastructure, everything is in various databases. And then there are monitoring systems which track what is going on in real time. There the data may not be stored in a database but is streamed in, so it is a major data management problem; this aspect is called streaming data. You take data streams, and they are used in a monitoring centre to see what is going on; if something fails, you want to rectify it as soon as possible. So a lot of these concepts are very much used in telecom. And then there is the whole issue of billing for calls and discounts and keeping track of how much data you have used. So there is a huge amount of database technology of various forms used in telecom networks.

We have Bharati Vidyapeeth, Navi Mumbai, go ahead. Sir, could you explain why the execution time of a query in Ubuntu was much less than in Windows? The execution time is less in Ubuntu than in Windows? I would be surprised. The query execution time is a function of the database system and the hardware; it should not be majorly dependent on the operating system. Now, if you actually ran a query on Postgres on Windows and on Postgres on Ubuntu, I am not sure why that could have happened. I do not think the OS has a major role, but it could be some impact of what other processes were running, how much memory was available, and so forth. It is hard to tell without looking at the specific instance, but the OS should not have such a major impact.

Sir, another question: could you please explain the various steps of query optimization in detail, sir? For lack of time I gave you only a high-level view of query optimization, but I think I did cover the join ordering algorithm, how to choose the optimal join order; that is core to most query optimizers. There are more complex query optimizers used, for example, in SQL Server, but the optimizer used in most other databases revolves around join order selection, plus a bunch of heuristics for pushing down selections, other heuristics for dealing with group by, aggregation, union and all the rest, and then of course nested subqueries. I also explained how nested subqueries are handled by decorrelation. So I think I have given you the big picture of what goes on in most typical database systems. A query comes in; some rewriting is done to simplify it, for example decorrelation. Then, if you have a query with a nested subquery, hopefully decorrelation would have broken it up and it is no longer nested, but it still might be nested; join order optimization is then typically applied to each query block separately, each block is separately optimized, and then the plan executes the different blocks. SQL Server has a much more clever optimizer which is much more complex; I will not try to cover it here. Does that answer your question?

Sir, what is the difference between immediate and deferred modification, and what is the effect of redo and undo in each? That is a question very relevant to the recovery chapter which we are currently covering, so I will answer it in reasonable detail. In deferred modification, the transaction does not actually perform the write on the database, not even on the database buffer. It does the write locally in its own memory whenever a write is called. When a read is done by the transaction, it first has to check in its local memory whether that same data item had been written earlier; if so, it must take the locally stored value. Now, what is the point of this? The point is that writes are done only when the transaction commits. So in logging with deferred modification, you have to write only the new value for anything you update; you do not have to write the old value, and all the log records are written out with only the new value. Let me use the whiteboard: the log record will now look like this: transaction identifier, which data item, and the new value. There is no need for an old value because updates are never applied until the transaction actually commits. These log records are created whenever an update is done, and when the transaction wants to commit, all these log records are flushed to the log on stable storage. At this point the transaction is committed, with not a single update having been done to the database.
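Here is a minimal sketch of deferred modification as just described: writes are buffered in the transaction's own memory, reads consult that local buffer first, and redo-only log records (new value only) are flushed at commit, after which the updates are applied to the database. The class and attribute names are hypothetical.

```python
class DeferredTxn:
    def __init__(self, tid, db, log):
        self.tid, self.db, self.log = tid, db, log
        self.local_writes = {}  # item -> new value, kept in txn memory

    def write(self, item, value):
        # The database (and even its buffer) is NOT touched here.
        self.local_writes[item] = value

    def read(self, item):
        # Must see our own earlier writes first.
        if item in self.local_writes:
            return self.local_writes[item]
        return self.db[item]

    def commit(self):
        # Redo-only log records: <tid, item, new value>; no old value.
        for item, value in self.local_writes.items():
            self.log.append((self.tid, item, value))
        self.log.append((self.tid, "commit"))
        # The log is now (conceptually) flushed to stable storage; only
        # then are the updates applied to the database. A crash before
        # this point needs no undo: the database was never touched.
        for item, value in self.local_writes.items():
            self.db[item] = value

db, log = {"A": 1000, "B": 2000}, []
t = DeferredTxn("T1", db, log)
t.write("A", t.read("A") - 50)
t.commit()
print(db, log)  # {'A': 950, 'B': 2000} with redo-only log records
```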
As part of the commit processing, then, the log is flushed and all the updates are written to the database, at least to the database buffer, at this point in time. So the old value is not required, and therefore there is no undo processing at all here. Recovery only has to redo transactions that committed; if a transaction failed, neither redo nor undo is required for it. That is the benefit of deferred modification. But there are some drawbacks. One of the big drawbacks is that if a transaction does a lot of updates, it may run out of memory to keep track of all the updates it did. And there are some other drawbacks as well. So practically, most databases that I know do some form of immediate modification.

But then there are things in between immediate and deferred. In particular, PostgreSQL uses snapshots, and Oracle uses snapshots. Snapshots require you to create new versions. So PostgreSQL does not actually update the data item in place; it creates a new version of that data item. Now you might ask, why do we want undo logs at all, if the update is never done in place? Undo logs may still be needed for certain things, but for regular transaction updates you might get away with only redo; there is no need for an undo because the item is not updated in place. So there are various optimizations which are possible like this, and real databases do exploit them as much as possible to improve efficiency.

Good. Now let us get back to the slides; I will have time again at the end of this session for more questions. So we were here when we left off: concurrency control and recovery is the next topic. First of all, does the fact that transactions run concurrently affect the recovery technique? The answer is yes, it depends on your recovery technique. A naively designed recovery technique will get into trouble with concurrent transactions; a properly designed one will not. If you have used earlier editions of the Database System Concepts book, we actually had multiple recovery algorithms. We said, okay, here is one recovery algorithm which works with no concurrent transactions; now here is how we modify it to handle concurrent transactions, and so forth. So we ended up with many different recovery algorithms, and that was a bit confusing. The idea had been that presenting a simpler algorithm first would help you understand the concepts, but we decided there were too many algorithms and they were causing confusion rather than helping. So now we just present one algorithm, which works regardless of whether there is concurrency or not.

The other part is that there are certain things which the system must do to even allow recovery. In particular, we assume here that if a transaction Ti has modified an item, no other transaction can modify the same item until Ti has committed or aborted. That is, updates of uncommitted transactions should not be visible to other transactions, because otherwise we cannot even do an undo. Why does this matter for recovery? Suppose two transactions running concurrently both update a data item: first T1 updates it from 10 to 20, then T2 updates it from 20 to 30. Suppose T1 now aborts. With undo logging, when T1 started the value was 10; T1 made it 20, and T2 took it from 20 to 30. If T1 aborts, what should the final value be? If you restore it to 10, you have not only undone T1, you have also undone T2. So can you undo T1 without undoing T2?
It turns out there are special cases where the operations commute, for example where T1 and T2 are both adding money; then you can subtract the money instead of simply restoring the old value. This is actually an important aspect and it is covered in the book, but in this short session we are not going to look at it. The first algorithm, even in the book, makes the assumption that data items cannot be concurrently modified by multiple transactions. Meaning, if a transaction has updated a data item, it must hold the lock on it till it commits; till then nobody else can modify that data item. That is a basic assumption. We do not actually care about reads here; we only care about modifications.

So now let us look at the undo and redo operations, which are done using log records. If you remember, a log record has a transaction identifier, a data item identifier X, and V1 and V2, which are the old and new values. The undo of such a log record simply writes V1, the old value, to the data item X. The redo of such a log record writes the new value, V2, to the same data item X. That is for individual log records. What about transactions? A transaction has multiple log records. How do you undo a transaction? Undo of a transaction restores the values of data items going backwards from its last log record: for each log record, it does an undo of that log record, restoring the old value of that data item. The order in which you do this turns out to be important, because a transaction may update the same data item more than once. It may update it from 10 to 20, then from 20 to 30. If you do the undo backwards, the item is first taken from 30 to 20 and then from 20 to 10, which is correct. If you do the undo going forward, the value is 30; the record with old value 10 is processed first and the value is set to 10, and then the record with old value 20 is processed, so the final value will be 20, which is wrong. The value should be 10, since that was the value when the transaction started. That is why you have to go backwards: undo traverses the log records of the transaction backwards.

The other important thing is that when an undo is complete, a log record <Ti abort> is written out. This signifies that the undo is complete. But here is the most important step: each time a data item X is restored to its old value V, a special log record <Ti, X, V> is written out. What is this special log record, and why is it written out? Again, if you are familiar with some of the older recovery algorithms which were in earlier editions of our book or in other books, the undo there does not write any log records; it just restores the value. Here the undo is doing something funny: not only is it undoing, it is writing a log record. It turns out that this particular step, that undo also records what it does, is extremely useful. It makes the recovery algorithm much cleaner than otherwise. Yes, there is an overhead: even when you are doing an undo, you write a log record, so there are more log records. But the algorithm is much cleaner, and it is much easier to show that it is correct. With the older algorithms which we described, it is much harder to show that they actually work correctly in all situations. In particular, suppose an undo had been started and then the database system crashed: for a crash in the middle of an undo, you have to show that things will still work correctly regardless. With this approach, it is much easier to show that things will work.
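A minimal sketch of these record-level operations, using a simple tuple representation <txn, item, old, new> for update log records (the representation is assumed for illustration):

```python
db = {"X": 10}
log = []

def apply_update(tid, item, new):
    # Normal operation: log <tid, item, old, new>, then update.
    log.append((tid, item, db[item], new))
    db[item] = new

def undo_transaction(tid):
    # Traverse the transaction's log records BACKWARDS, restoring old
    # values; each restoration writes a redo-only record <tid, item, v>.
    for t, item, old, new in reversed(log[:]):
        if t == tid:
            db[item] = old
            log.append((tid, item, old))  # redo-only compensation record
    log.append((tid, "abort"))            # undo complete

apply_update("T1", "X", 20)  # 10 -> 20
apply_update("T1", "X", 30)  # 20 -> 30
undo_transaction("T1")
print(db["X"])  # 10: backward traversal restores 30->20, then 20->10
```

Running the undo forward instead would leave the value at 20, as the lecture points out; the backward order is what makes repeated updates to the same item come out right.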
So every time you do an undo operation, you are writing this log record. What is this log record? It is a redo-only log record. If a crash occurs in the middle of an undo, these log records are found and they will be skipped by the undo. But if a crash happens after the undo is complete, that is, after the <Ti abort> record was written, essentially the transaction will be redone; it will not be undone. I will tell you what that means in a little bit.

Before that, what is redo of a transaction? It sets the value of all data items updated by Ti to the new value, going forward from the first log record for Ti. Here no logging is done; you are just doing what is already shown in a log record, with no further logging. In fact, as you will see, we rarely redo a transaction by itself; we are going to redo everything together, and we will see that coming up.

So the first question is which transactions need to be redone and which need to be undone. In our algorithm, a transaction needs to be undone if the log contains the record <Ti start>, that is, at least one log record was written for it, but contains neither <Ti commit> nor <Ti abort>. If even one of those is there, it is not going to be undone. The idea is that if a transaction actually committed, of course you will not undo it. But if a transaction completed its abort, there is also no need to undo it; in fact, you will redo the transaction. What is going on? When you undo a transaction, the transaction did a series of updates A, B, C, and the undo writes log records which record how those updates A, B, C were each undone by restoring their old values. Those records are also in the log now. So the full log for that transaction has a series of updates which did something, and then another series of updates which undid that; the log records for the undo operation, those special log records which have no old value, just the redo value, are there as part of the transaction's log. Such transactions, which completed aborting, will be redone: if they completed aborting and then there is a crash, the next time around they will be redone.

So a transaction will be redone if the log contains <Ti start> and contains either commit or abort; either one is okay. And what does the redo do? It first redoes everything the transaction did initially, and then it redoes everything that was done to roll it back. So the net effect is that the transaction does nothing. Why can't you just skip it? Well, if you skipped it, things turn out to be more tricky; it is hard to show that everything works, especially when there are more complex interactions, when there is early lock release and other such things. So usually that is not done: you will redo such transactions, you will not skip them. If there is a failure in between, you do not know what the state of the disk was, so you redo these transactions. This idea that redo redoes all the original actions, including the steps that restored old values, is called repeating history. A small sketch of this undo/redo classification follows.
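The classification rule above, over a list of log records in the tuple format assumed earlier (marker records are pairs like ("T1", "start")):

```python
def classify(log):
    # Undo a transaction if it started but neither committed nor
    # aborted; redo it if it started and either committed or aborted.
    started, finished = set(), set()
    for rec in log:
        if rec[1] == "start":
            started.add(rec[0])
        elif rec[1] in ("commit", "abort"):
            finished.add(rec[0])
    return started - finished, started & finished  # (undo, redo)

log = [("T1", "start"), ("T1", "X", 10, 20), ("T1", "commit"),
       ("T2", "start"), ("T2", "Y", 5, 7)]
undo_set, redo_set = classify(log)
print(undo_set, redo_set)  # {'T2'} {'T1'}
```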
The idea of repeating history was pioneered by IBM. IBM did a lot of the key work on database systems; much of what we read today is thanks to IBM. It started with Codd, but then there were people who built storage systems, people who built concurrency control, recovery, query optimization. Everything came from IBM in that era; it was an amazing time. And for recovery, IBM had several algorithms which even predated relational databases: their old CODASYL and other databases actually included recovery algorithms. When relational database recovery was done, again they took the lead. Later on, a lot of the extensions to the basic algorithms were done at IBM by a person called C. Mohan, who is an alumnus of IIT Madras. He authored a series of papers on recovery, including a paper on the ARIES recovery algorithm, which is considered the state-of-the-art algorithm. It is now pretty old; it was published in 1992 or so. But most databases today implement ARIES or some variant of the ARIES recovery algorithm. It is a very nice algorithm, highly optimized. If you are interested in it, there are details in the book, but again, for lack of time, we are going to skip it here. ARIES was based on repeating history; in fact, it popularized repeating history.

So now, here is a small example of the log at different points in time. T0 started; it updated A from 1000 to 950, and B from 2000 to 2050. If a crash happens at this point, this is all that the log has. What do we have to do? We have to undo T0. How do we undo T0? We go backwards: we restore B to 2000 and A to 1000. As we restore them, we write out <T0, B, 2000> and <T0, A, 1000>. And then, when we hit <T0 start>, we write <T0 abort>. At that point, T0 has been rolled back.

Now suppose instead that T0 continued and committed here, then T1 started and did an update, and then the system crashed. What we have to do is redo T0 and undo T1, in effect. But the algorithm which we will see actually redoes everything going forward, including the partial effects of T1, and then undoes T1. So when T0 is redone, it sets A to 950 and B to 2050. Why should we do it again? The idea is that the first time around, this may have been written to the database buffer, but it was never written to disk; before the system crashed, this update never went to disk. So we have to do it again and make sure it goes to disk. There are actually optimizations here: ARIES can very often figure out that something was actually written to disk and does not have to be done again. That is one of the reasons ARIES is very successful; it optimizes a lot of common cases and is much faster as a result. So anyway, the redo logically redoes everything, and ARIES optimizes it by skipping some of the redos which it knows are not required. After the redo, the algorithm does an undo of T1, restoring C to 700. And in this case, the redo redoes both T0 and T1.

Now, one last concept, and then I will give you the overall algorithm. All this redo and undo: where in the log do I start it? If I do nothing about it, I may have to start at the very beginning of the log. Suppose I start a brand-new database, start processing transactions, and after one week it crashes. Do I have to redo everything that happened in that one week? That is madness; the database would take forever to recover. That is not acceptable. So it is absolutely critical that I know from where I have to do the redo, and one way to ensure this is by doing what is called a checkpoint. What a checkpoint does is ensure that the modified data in the database buffer is written out, and it records where we were in the log when the checkpoint started. The idea is that for any log record which occurs before the checkpoint,
its update should already be in the database; the checkpoint should have flushed it out. That is the key idea. In order to do this, there is some other synchronization which has to be done, meaning the writing of the log record and the updating of the page in the database must be coordinated in some way. Again, I will skip the details for lack of time, but later in the chapter there is a description of the latches which have to be obtained: first get a latch, then write the log record, then write the page, then release the latch. There are certain things like this which have to be done, and they are discussed in the book. Once these things are taken care of and you have written out all the pages that were updated in the buffer, the old log records are no longer needed for redo. They may still be needed for undo, and I will show you a picture of that just coming up.

So let me repeat. Checkpointing first outputs all log records currently in main memory, then outputs all the modified buffer blocks, and then writes a log record <checkpoint L>, where L is the list of all transactions active at the time of the checkpoint; I will come back to why this is required. While doing this, updates are stopped, in this version of checkpointing. In a real database, blocking updates while a lot of output is going on is a bad idea, so later in the chapter there is a technique called fuzzy checkpointing, which is not as intrusive; it allows updates to go on.

Here I have a picture which shows a checkpoint happening. These are transactions T1, T2, T3, T4, T5, and so forth, which are running. The left end of each indicates when it started, the right end when it finished. In the case of T4 and T5, they never finished, so they actually have to be aborted. The picture shows that a checkpoint was done at this point. The key idea is that when the checkpoint is done, all the updates done by T1 would definitely have gone to disk, because the checkpoint outputs all the modified buffer blocks to disk. So I do not care about the log records of T1 at all. What about the log records of T2? For T2, I do not care about the two log records which are before the checkpoint, whatever is in that area; but I do care about the log records that came after the checkpoint. Those have to be redone. The earlier ones do not have to be redone, because those log records were written and the pages were updated before the checkpoint. So the redo can basically start from the checkpoint; it does not have to look at any older log records.

Now, what about undo? It turns out that T4 was active and did not finish; we have to undo T4. T5 is another interesting case. T5 actually started before the checkpoint. Maybe it was sleeping for a long time; nothing happened, and there is perhaps not a single log record for T5 after the checkpoint. It was idle, but it had not committed. Now a system crash happens. What do you do? You have to undo T5; you cannot leave it hanging, otherwise some data will be inconsistent. But if T5 did not write a single log record in this entire period, how do I know that T5 even existed? I would have forgotten all about it. And that is where the list of active transactions in the checkpoint comes in. The checkpoint log record writes a list L of all transactions active at the time of the checkpoint, and that L will contain T5. Even if T5 did nothing in this period, the list L in the checkpoint record will contain T5. So that helps me realize that I have to undo T5.
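Here is a minimal sketch of this version of checkpointing, under the same toy in-memory model as the earlier sketches (names hypothetical; a real implementation coordinates these steps with latches, as mentioned above):

```python
def checkpoint(log_buffer, stable_log, dirty_pages, disk, active_txns):
    # 1. Output all log records currently in main memory.
    stable_log.extend(log_buffer)
    log_buffer.clear()
    # 2. Output all modified buffer blocks to disk.
    for page_id, contents in dirty_pages.items():
        disk[page_id] = contents
    dirty_pages.clear()
    # 3. Write a <checkpoint L> record, where L lists the active
    #    transactions; this is what later lets recovery find idle
    #    transactions like T5 that wrote nothing after the checkpoint.
    stable_log.append(("checkpoint", sorted(active_txns)))

stable_log, disk = [], {}
log_buffer = [("T5", "start"), ("T5", "P1", 10, 20)]
dirty_pages = {"P1": 20}
checkpoint(log_buffer, stable_log, dirty_pages, disk, {"T5"})
print(stable_log[-1])  # ('checkpoint', ['T5'])
```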
Coming back to the example: in order to undo T5, I may have to go and look at the old log records of T5, which are before the checkpoint. So some old log records are still required for undo; but for redo, only log records after the checkpoint are needed. This is the key idea.

So now, during recovery, the first step is to find the most recent checkpoint, and there are a couple of ways of doing this. One way is to scan the log backwards from the end to find the most recent <checkpoint L> log record. Redo will then start from there and go forward. Transactions that committed or aborted before the checkpoint already have all their updates output to stable storage; even for transactions which were still active, all their old updates would have been copied to stable storage. But, as I said before, some earlier part of the log may be needed for undo operations. That is why, during undo, we scan the log backwards till a record <Ti start> is found for every transaction Ti in L. In this picture, L contains two transactions, T2 and T5, which were active at the time of the checkpoint. Now, I will realize that T2 has committed, so I do not have to undo it, but T5 has to be undone. In order to undo T5, I may have to go back into the log records prior to the checkpoint, but I can stop once I hit <T5 start>, because at that point the undo of T5 is complete and nothing more needs to be done.

Okay, so I have almost given you all the details of the recovery algorithm in bits and pieces; I have given you all the concepts. Now I am just going to put it all together to create a single recovery algorithm out of all of this. As I mentioned earlier, the earlier editions of our book had multiple recovery algorithms. In this edition, the sixth edition, we have reduced it to just one to make life simple. Actually, it is not just one. The one I am describing now is the basic algorithm, which is very simple. After that, the book has an extension to handle early lock release: normally we assume strict two-phase locking, with exclusive locks held to the end of the transaction, but for performance, in some cases locks may be released early, and then the recovery algorithm has to deal with that. So that is described. And finally, the ARIES recovery algorithm is also described in the chapter. They go stepwise: the first algorithm is the simpler one which we describe here, the next one is an extension, and the last one shows how to optimize the previous algorithm. We are just going to present the simplest algorithm.

Here is what it does. Any recovery algorithm has two parts. The first part is what is done during normal processing, as transactions run; this is logging. Logging is done as follows. When a transaction starts, you write a <Ti start> record. Every time there is an update done by the transaction, you write <Ti, Xj, V1, V2>, where Xj is whichever item was updated, V1 the old value, and V2 the new value. And when a transaction commits, you write <Ti commit>. What happens if a transaction rolls back? That is coming up now. The system is running fine, and due to a deadlock or an integrity constraint violation, we decide to roll back Ti. What do we do? We scan the log backwards from the end. For each log record of Ti of the form <Ti, Xj, V1, V2>, we perform the undo by writing V1 to Xj, and we write a special log record <Ti, Xj, V1>, containing the old value. Such log records are called compensation log records.
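A minimal sketch of this rollback procedure during normal operation, continuing the tuple-based log representation assumed earlier:

```python
def rollback(tid, log, db):
    # Scan the log backwards; for each update record <tid, Xj, V1, V2>,
    # restore V1 and write a compensation log record <tid, Xj, V1>.
    for rec in reversed(log[:]):
        if rec[0] != tid:
            continue
        if len(rec) == 4:                 # update record
            _, item, old, _new = rec
            db[item] = old
            log.append((tid, item, old))  # compensation log record
        elif rec[1] == "start":           # reached <tid start>: done
            log.append((tid, "abort"))
            return

db = {"B": 2050}
log = [("T0", "start"), ("T0", "B", 2000, 2050)]
rollback("T0", log, db)
print(db["B"], log[-2:])  # 2000 [('T0', 'B', 2000), ('T0', 'abort')]
```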
Each such compensation log record compensates for an earlier log record, which changed Xj to V2; it compensates by restoring the value to V1. We continue the log scan backwards, and once <Ti start> is found, we can stop: we have completely undone everything this transaction did. At that point, we output a log record <Ti abort>, which indicates that the rollback is complete and no more undo is needed for this transaction. If a failure happens after this, during recovery we will find the <Ti abort> log record, and no undo will be done again for this transaction. But a redo will be done, and that redo will perform all these steps which were part of the undo: because they are logged, the redo will execute them. So the redo will redo the undo which happened earlier. I hope that has not confused you too badly, but I am sure it is a little confusing.

Okay, so now let us look at recovery from failure. The system crashed and has come back up. What do you do? The first phase is the redo phase. We replay the updates of all transactions, regardless of whether they committed, aborted, or are incomplete. We do not care; we are not even going to try to find out. We do a complete redo. Remember, I called this repeating history: we redo everything that happened, regardless of whether the transaction actually committed, aborted, or was incomplete. After that, there is an undo phase, which undoes all incomplete transactions. Why incomplete? Something which aborted earlier and finished the abort is complete now; only incomplete transactions, which neither committed nor aborted, have to be undone in the second phase. So those are the two phases.

Let us look at the redo phase. I will show an example with all of this, by the way, so if you do not understand it now, the example will help. The first step in redo is to find the last <checkpoint L> log record, where L is the list of transactions active at the time of the checkpoint. We save that value of L in a variable called undo-list. It is the set of transactions which were active at the time of the checkpoint, and that is the initial value of undo-list. Now we scan forward from that checkpoint. Whenever we find a log record of the form <Ti, Xj, V1, V2>, we redo it by writing V2 to Xj. Why? Because the database might have crashed before V2 was written to the database: the write of V2 to the buffer was lost when the database crashed. So now we write it again; it will be in the buffer, and it will get output eventually. Whenever a log record <Ti start> is found, we add Ti to undo-list. What is this? This is a new transaction, which started after the checkpoint. At this point I do not know whether this transaction will commit or abort; that will be found as I scan forward in the log. So tentatively I add Ti to undo-list: undo-list contains transactions which, as far as I know right now, may need to be undone. On the other hand, whenever I find a log record <Ti commit> or <Ti abort>, what does that mean? Ti is done; we have all the steps for it, and we no longer need to undo it. In that case we remove Ti from undo-list. So during the forward scan: if I find <Ti start>, add Ti to undo-list; if I find <Ti commit> or <Ti abort>, remove it from undo-list. Those are the two basic kinds of bookkeeping we do in redo. By the way, even if you find a log record of the form <Ti, Xj, V>, that is, a compensation log record, it is also redone in the same way, by writing V to Xj.
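A minimal sketch of the redo pass as just described (same assumed log representation; `log` here is the suffix of the log starting at the last checkpoint record):

```python
def redo_pass(log, db):
    # Scan forward from the last <checkpoint L> record, repeating
    # history and maintaining undo-list.
    assert log[0][0] == "checkpoint"
    undo_list = set(log[0][1])   # transactions active at the checkpoint
    for rec in log[1:]:
        if len(rec) == 4:        # <Ti, Xj, V1, V2>: redo the new value
            db[rec[1]] = rec[3]
        elif len(rec) == 3:      # compensation <Ti, Xj, V>: redo it too
            db[rec[1]] = rec[2]
        elif rec[1] == "start":
            undo_list.add(rec[0])
        elif rec[1] in ("commit", "abort"):
            undo_list.discard(rec[0])
    return undo_list             # the incomplete transactions

# The lecture's example log, after the crash:
log = [("checkpoint", ["T0", "T1"]),
       ("T1", "C", 700, 600), ("T1", "commit"),
       ("T2", "start"), ("T2", "A", 500, 400),
       ("T0", "B", 2000), ("T0", "abort")]
db = {"A": 500, "B": 2050, "C": 700}
print(redo_pass(log, db))  # {'T2'}; db now reflects repeated history
```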
So what does redo do? It redoes everything, and the undo phase then has to undo the things which did not finish. In other words, at the end of the redo phase, the variable undo-list contains all transactions that did not finish. Some of them might have been active at the time of the checkpoint; some might have started afterwards; but none of them actually finished, that is, none of them has a commit or abort record. All of these have to be undone.

So how is the undo done? Undo scans the log backwards from the end. Whenever we find a log record of the form <Ti, Xj, V1, V2>, where Ti is in undo-list, we have to perform the undo action. Note that we are not undoing one transaction at a time: in one pass, we are undoing all incomplete transactions, and the order matters in some cases. How do we undo? We write V1 to Xj and write a compensation log record <Ti, Xj, V1>. This is the same as the undo during single-transaction rollback; the difference is that we are not doing one transaction at a time, we are scanning the log backwards and undoing whatever we encounter that needs to be undone. Now, whenever a log record <Ti start> is found, where Ti is in undo-list, what do we know? We know that we have undone all the steps of this transaction. So at that point we write a log record <Ti abort>: we have finished the undo for Ti, and we remove Ti from undo-list. So Ti is done. How long does this keep going? We stop when undo-list is empty. In other words, undo-list held all the transactions which were active and had not completed; when a <Ti start> has been found for every one of them, we have successfully aborted all of them, and at that point the undo phase stops. After this, normal transaction processing can resume.
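And a matching sketch of the undo pass (same assumptions; `log` is the full log, and `undo_list` is what the redo pass returned):

```python
def undo_pass(log, undo_list, db):
    # Scan backwards, undoing every update of every incomplete
    # transaction, until undo-list is empty.
    for rec in reversed(log[:]):
        if not undo_list:
            break                            # everything aborted: stop
        if rec[0] not in undo_list:
            continue
        if len(rec) == 4:                    # <Ti, Xj, V1, V2>
            _, item, old, _new = rec
            db[item] = old
            log.append((rec[0], item, old))  # compensation log record
        elif rec[1] == "start":              # Ti is now fully undone
            log.append((rec[0], "abort"))
            undo_list.discard(rec[0])

log = [("checkpoint", ["T0", "T1"]),
       ("T2", "start"), ("T2", "A", 500, 400)]
db = {"A": 400}
undo_pass(log, {"T2"}, db)
print(db["A"], log[-2:])  # 500 [('T2', 'A', 500), ('T2', 'abort')]
```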
This is shown through an example here; if you did not understand it fully, this example should make things clear. So here is the log. T0 started and updated B from 2000 to 2050. T1 started. Now a checkpoint happens: <checkpoint {T0, T1}>. Why T0, T1? Both T0 and T1 are active at the time of the checkpoint, so this is the active list. T1 now updates C from 700 to 600. T1 commits; note that T0 has not yet committed. Then T2 starts and changes A from 500 to 400. And now, perhaps because of a deadlock or an integrity constraint violation, we have decided to roll back T0. How do we roll it back? What are its log records? The only log records it has are the update of B from 2000 to 2050, and its start. So correspondingly we write the compensation log record <T0, B, 2000>, which restores B to the old value 2000. Then we find the start record and write out <T0 abort>. So T0 has rolled back successfully during normal operation; when <T0 abort> is written, the rollback of T0 is complete.

Suppose now a crash happens at this point. What happens? The redo pass starts at the checkpoint. What is undo-list at this point? It has T0 and T1. Going forward, the redo pass sets C to 600; then, when it sees <T1 commit>, it removes T1 from undo-list, so undo-list is now {T0}. When it sees <T2 start>, it adds T2 to undo-list, so undo-list is now {T2, T0}. Then it finds the next record and sets A to 400; it finds the compensation record and sets B to 2000; and it finds <T0 abort> and removes T0 from undo-list. This is the end of the log at the time of the crash, so the redo pass comes up to here, and redo is done. What is undo-list when redo finishes? Only T2. Because the abort of T0 was complete, T0 is not in undo-list; only T2 is there.

Now what does the undo pass have to do? It goes back in the log, and it has to deal with any log record of T2. Going back, it finds the log record of T2 which set A from 500 to 400. So it resets A to 500 and writes a compensation log record <T2, A, 500>. Then it finds <T2 start>, so it writes <T2 abort>; at that point it removes T2 from undo-list, and undo-list is now empty. So the undo pass stops here: when <T2 start> is hit, the undo pass stops. That is it. In brief, that was the recovery algorithm.

I have a quiz. Again, I am not going to make you do this on Aakash, but look at it and think about it for a couple of minutes; just read these two quiz questions. I hope you have had time to read and understand them. The key point this quiz was meant to reinforce: repeating history performs redo on all transactions. Everything, regardless of whether it committed, aborted, or is incomplete, is redone. And what does redo do? It just sets the value to the new value; that is it. The second quiz question is: repeating history performs undo on which transactions? Not on all transactions, of course. It does not undo transactions that committed, and it does not undo transactions that aborted; whether a transaction committed or aborted does not matter, they are both complete. Aborted transactions have log records for all the steps they did initially, and redo-only log records for the steps that undid those initial steps; the redo pass, which redoes all transactions including those that aborted, will therefore have completed the undo for them. So there is no need to undo them; you should not undo transactions that have aborted. Undo is performed only on incomplete transactions.

This would be a good point to take a few questions. Vivekananda College: Good morning, sir. Can you explain the working of timestamp techniques and locking techniques of concurrency control? So, we saw both of these. Locking we saw in some detail: whenever you read a data item, you have to get a shared lock; whenever you update it, you have to get an exclusive lock; and locking must be at least two-phase, though practically all locks are held until the end of the transaction. Timestamping, on the other hand, is based on assigning each transaction a timestamp, and whenever you find an access that goes out of timestamp order, the transaction is aborted. So it may seem that timestamping is much easier: it does not hold locks, and there is no deadlock. But there is a drawback, and the drawback is that there are transactions which might have done perfectly fine under locking but get rolled back, because timestamps are assigned at the time the transaction starts. Say a transaction starts and has not done anything yet. Meanwhile, some later transaction updates a data item Q, and then our transaction reads that same data item. It is going to be rolled back, because an older transaction is not allowed to read an update done by a newer transaction. But if you did locking, it would not matter that this is an older transaction: there are no cycles in the precedence order, the locking ensures that, and it is not rolled back.
It might be allowed to get the lock and proceed if the other transaction has committed, or it might be made to wait for some time, after which it gets the lock and proceeds. Of course, in some cases there could be deadlocks, but if you design the system reasonably, deadlocks are not too common. So a timestamp protocol may be easier to implement, or at least appear easier to implement, but it has the overhead of repeated, unnecessary rollbacks which locking might have avoided. So the resource consumption could be higher. Can you follow? Thank you, sir.

Okay, we have GRIET, Mr. Patpalli. Yeah, good afternoon, sir. What exactly is the meaning of implicit locks? Where do we use implicit locks in databases? So what is an implicit lock? This basically comes up in the context of multiple granularity locking. When you get a lock on a relation, say, without explicitly getting locks on the individual tuples, you implicitly have locks on the individual tuples. You did not ask for those locks one at a time; you said lock the relation, you did not say lock the individual tuples, but implicitly you have locked the individual tuples also. This is part of multiple granularity locking. I did not get into multiple granularity locking in detail, but I think this example should make it clear. On the other hand, if another transaction wanted to lock individual tuples but not the whole relation, it would get what is called an intention lock on the relation and then get explicit locks on the tuples. Now, obviously these two conflict. The conflict is detected in a different way, but conceptually the conflict exists because the first transaction has implicitly locked all the tuples, even though it did not explicitly lock them, while the other transaction locks some of them explicitly. This conflict also has to be detected, and it is detected by the multiple granularity locking protocol. Does that answer your question? Another question, sir: how do we view the existing locks on the database, currently? Again, most databases do provide a way to view this, but it is non-standard. If you are in Oracle land, I believe the commonly used administration tools can tell you what locks are held at a point in time. For PostgreSQL, I have not seen this; it does not do locking in quite the same way, and I do not know if you can see the internal details like that. But I would not be surprised if there is some way to do it, because people who debug these systems need to know what is going on. So there is probably some way, but I am not familiar with it.

Okay, let us go on to the next few slides, and then I will take some more questions. These are the last few things in the recovery section. The first is log record buffering. Earlier, the assumption in the logging, in the whole algorithm, was that whenever I write a log record, it is immediately output to stable storage. That is a big cost: if every time I write a log record I have to do a disk output, it can be very expensive. So what I want to do is allow many log records to accumulate in the same page, and then, when the page is full, write it out to disk, so that I do not write the same log page to disk many, many times. I would like to postpone the output, but I cannot postpone it indefinitely: when a transaction wants to commit, I have to output its log records; I cannot postpone it much beyond that.
So the idea here is that when I write a log record, I do not necessarily write it immediately to stable storage; I keep it in memory. However, there is a problem: if I postpone this, and meanwhile a block of data in main memory is output to the database without the log records describing the changes to it being output first, we are in trouble. Think of a transaction which has updated a block. It has written a log record; the log record is still in memory, and the transaction has not committed. Meanwhile, that particular buffer block is written to the database. What you now have is an update of an active transaction sitting in the database, while the information required to undo it is not in the log. If you have a crash at this point in time, you cannot undo that update. That is a big problem. The key solution is that before you write that particular buffer block, you must make sure that all the log records pertaining to that buffer block have already been written to disk. That is the minimum set of log records which must be written to disk before you output a particular block. This is called the write-ahead logging rule. The write-ahead logging rule says: before a block of data is output to the database, all log records pertaining to data in that block must have been output to stable storage. Strictly speaking, only the undo information needs to be output, but in our algorithm undo and redo information are stored together in the same log record; some algorithms keep them separate, and then you only need to output the undo information. A small sketch of the rule follows.
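Here is a minimal sketch of the write-ahead logging rule, again under the toy in-memory model (names hypothetical). Each log record gets a sequence number, each page remembers the sequence number of the last record describing a change to it, and a page may be output only after the log has been flushed at least that far:

```python
class WALBuffer:
    def __init__(self):
        self.log_buffer, self.stable_log = [], []
        self.flushed_upto = 0  # how many log records are on disk
        self.page_lsn = {}     # page -> seq. no. of its last update record

    def append_log(self, rec, page_id):
        self.log_buffer.append(rec)
        lsn = self.flushed_upto + len(self.log_buffer)
        self.page_lsn[page_id] = lsn
        return lsn

    def flush_log(self, upto_lsn):
        # Output log records up to upto_lsn to stable storage.
        if upto_lsn <= self.flushed_upto:
            return
        need = upto_lsn - self.flushed_upto
        self.stable_log.extend(self.log_buffer[:need])
        del self.log_buffer[:need]
        self.flushed_upto = upto_lsn

    def output_page(self, page_id, contents, disk):
        # Write-ahead logging rule: flush the log up to this page's
        # last log record BEFORE the page itself goes to disk.
        self.flush_log(self.page_lsn.get(page_id, 0))
        disk[page_id] = contents

wal, disk = WALBuffer(), {}
wal.append_log(("T1", "P1", 10, 20), "P1")
wal.output_page("P1", 20, disk)
print(len(wal.stable_log), disk)  # 1 {'P1': 20}
```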
Then there is another issue. We implicitly assumed that when a block is updated, it does not have to be output to the database immediately; it can be output at any time, while the transaction is active or well after the transaction commits. There are variants here. The force policy says that when a transaction commits, all the blocks it updated must be output to disk; no-force does not require that. Force is more expensive; some algorithms use it to reduce recovery work, but our algorithm does not, and real algorithms do not. The other is the steal policy. No-steal means that if a block has an uncommitted update, you cannot write it to disk; it has to stay in memory while it has uncommitted updates. Again, this is very restrictive, so our algorithm, and any real algorithm, allows the steal policy: blocks may be output even if they have uncommitted updates.

The next small topic is that we assumed strict two-phase locking, with exclusive locks held to the end of the transaction. There are certain situations where you want to allow early lock release, such as for high-concurrency access to B+-trees, and to deal with this you need a notion called logical undo. The recovery algorithm in the book called the advanced recovery algorithm goes into detail on how to deal with this, and, like I said, the ARIES algorithm is described after that in the book.

Now, the last part of recovery is remote backup systems. What is a remote backup system? Here is a primary copy of the database; it has data, and it has log records. And here is a backup system sitting somewhere else, with a network used to send everything that happens at the primary over to the backup. The goal of remote backup systems is to provide high availability, in case something happens to the primary site: the data center burns down; there is an earthquake and the building collapses; or, as with certain buildings in Mumbai and in Bangladesh, they come crashing down for other reasons; you might lose the data here.

You do not want that to block operations; you want high availability. Or suppose there is a power failure and the database has to shut down: the data is not lost, but you do not have availability at that point, and that is also an issue. You do not want State Bank to stop operations all over India because there was a major power failure in Navi Mumbai, where the primary is located. So they have a secondary, say in Chennai, and a problem at the primary does not mean that State Bank all over India comes to a halt. This area in general is called high availability, or HA, systems.

There are several things which need to be done. First of all, the database itself must provide a means to ship data to the backup. What data? Initially, the database contents have to be copied to the backup; subsequently, all the log records the primary generates have to be copied to the backup, so that every operation done at the primary is also reflected at the secondary. In fact, what many implementations do is have a storage system underneath, and every write done to disk is copied over to the backup system. That way there is a continuous file-system-level backup, and there is also a database-system-level backup which copies the log records. Both are done; it is a little redundant, but it gives a little more safety.

Now, if a failure occurs, the backup has to take over. But the first step is: how do you know that a failure occurred? It may be that only the link between the primary and the secondary is down. If the backup says, ah, I cannot talk to the primary, it must be down, I will take over, what does that mean? There is an application server sitting outside which knows about both of these; if it cannot contact the primary, it will go to the backup. What should not happen is this: some of the application servers cannot reach the primary, so they start using the backup; the backup also says, I cannot reach the primary, the king is dead, I am the new king, and it starts ruling. Meanwhile, at the primary, the king did not realize he had died, and he continues to reign over his part of the kingdom. Now the kingdom is split into two parts. This, of course, would have been a real issue long ago, and in fact it is an issue even today; you never know who is the king of the BCCI today, it may change. But in the context of databases, what we need to do is have multiple links between the primary and the backup. If one link fails, hopefully the other links are still alive; if all links fail, then the likelihood is that the primary is actually dead. Not a guarantee, but it improves the odds vastly. And on these links, heartbeat messages are sent. What is a heartbeat message? It is like saying, I am alive, I am alive: every second you send a heartbeat message over all the links. Since all the links are unlikely to die together, as long as the primary is up, the secondary will keep getting heartbeat messages on at least one link. If no heartbeat messages come, the primary is presumed dead, and the secondary takes over. What does it do? It first performs recovery using its copy of the database and all the log records it received from the primary; if the whole file system is replicated, it has a copy of the logs and of the data. It performs recovery, and then it is up. And the primary should not take over as soon as it comes back up; it has to do a handshake with the secondary. So the secondary says, okay, the primary is up;
now I will stop processing transactions, and meanwhile it sends all the updates that it did over to the primary. So the primary catches up, and then the primary can become the king again.

There are some issues. The first issue is: how long does it take to recover? Suppose you have to process a whole day's worth of log records; it may take a very long time, and that is typically not acceptable. So what is done is what are called hot spare configurations. In a hot spare configuration, the backup is continuously processing redo log records as they arrive: every update that comes, every log record that comes, it does the redo. In other words, the backup is continuously doing recovery. It had an old copy of the database, and it is continuously running recovery on it, so it stays up to date. When a primary failure is detected and there are no more log records to come, it just finishes the undo pass. Note that our recovery algorithm is very amenable to this: redo is done for everything, regardless of whether a transaction succeeded or failed, we do not care, and then undo is done only for the incomplete transactions. So this is pretty easy to implement, and it is very widely used. All the banks today have this kind of setup; everybody who needs high availability does this. In earlier days, only the big commercial databases supported high availability; today, PostgreSQL has a high-availability module which lets it ship log records to the backup, and this configuration is widely used. The GATE exam this year, the registration, the processing of applications, the results, everything was done using a PostgreSQL system with high availability turned on: two copies of the PostgreSQL database.

That is it for recovery; I will take a few questions at this point. Live and chat questions are both welcome. We have HCTM, Haryana, please go ahead. Hello, my name is Dr. R.K. Bhatla from HCTM Technical Campus. Sir, I have a question: can we make use of the concepts of access lists and capability lists to implement the locking mechanism in lock-based protocols? You are talking of access control lists, which are a security mechanism, and you want to know if they can be used for locking. Normally, locking is done in a different way. Databases internally do not normally have access control lists on data items; some databases do support them, but this is not how locking is done. Access control lists are not designed to deal with concurrent updates and so on. So no, the answer is that they are two separate things; I do not think you can mix them up with locking. Does that answer your question, or do you have some follow-up? Yes, sir. And the capability list, sir? Same thing: capability lists are also used for access control, and I do not think they have anything to do with locking; that is a different concept. Now, of course, access control to data in databases is important. I briefly mentioned the SQL grant model, and I think I mentioned that it is kind of broken, in the sense that it only allows you to grant access to all of the records in a relation or to none of them. What it does not allow is for a particular person to access only particular records. So there has been some work on how to do fine-grained authorization, and Oracle, SQL Server and some other systems have specific ways of doing this, to allow access to certain rows but not to others. What factors should we consider for the MapReduce programming model? Okay.
So I am not sure what you mean by what factors you should consider. If by that you mean when you should use MapReduce and when you should not: first of all, you would use it only if you have a very large amount of data, so that you want to process it in parallel. And second, given the availability of Hive and other things in open source today, you would use it only if you want to do things which are not easy to do within the SQL/Hive framework. Those are the contexts in which you would go down to raw MapReduce programming; otherwise, you may be better off using Hadoop but with Hive on top. I did not get into Hive, but Hive is available open source; you can download it and run it along with Hadoop. Does that answer your question? Thank you, sir. You also asked about research in this area. There are many, many areas of research on how to effectively use these kinds of massively parallel systems to do things which were earlier done on smaller amounts of data, in data mining in particular. So there are a lot of research areas there; it is an active area, and it is worth looking at.

We have Charu sir, please come in. Hello, sir. My question is: how many versions are generally possible, or is the number capped, when multi-version concurrency control is implemented? And does PostgreSQL have an implementation of the same? That is a good question: how many versions should be kept when you have multi-version concurrency control? As I said, with multi-version concurrency control, every time something is updated, a new version is created. So the question is how many versions you keep around. The first part is: how long do you need a version? It turns out that you need a version only as long as there is some transaction which may access it. A transaction has a timestamp, and it will access a version if its timestamp is at least that version's timestamp but less than the timestamp of the next version. So the idea is that if there are some very old transactions around, they may require certain old versions. But after a period of time they are gone: transactions do not hang around forever, they commit or abort and go away. After that point, an older version may become unnecessary, and it can be garbage collected. I did not talk about this earlier, but it is discussed in the book. So essentially, based on which transactions are alive at a point in time, you can decide that certain versions are not needed and delete them. After some time, you need just one version of the item; if it has not been updated recently, one version is sufficient. Another way to interpret the question is whether there is a limit on how many versions can be kept. I do not think PostgreSQL, at least, has such a limit. In Oracle there is another issue: the way they reconstruct old versions is through the use of undo log records, and what can happen is that, because of space limitations, undo log records may be thrown away, and then you can no longer access the old versions. In that case, some transactions which needed those versions may be rolled back; they can then be restarted with a new timestamp, and hopefully this time they will complete. So what about PostgreSQL's implementation? PostgreSQL does implement snapshot isolation with multi-versioning, of course; that is part of snapshots. So it keeps multiple versions of tuples around.
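A minimal sketch of the visibility and garbage-collection rules just described, that is, a transaction sees the version whose timestamp interval contains the transaction's timestamp (a simplified model, not PostgreSQL's actual tuple format):

```python
def visible_version(versions, txn_ts):
    # versions: list of (version_ts, value), sorted by version_ts.
    # A transaction with timestamp txn_ts sees the latest version
    # whose timestamp is <= txn_ts.
    chosen = None
    for ts, value in versions:
        if ts <= txn_ts:
            chosen = value
        else:
            break
    return chosen

def garbage_collect(versions, oldest_active_ts):
    # Keep the newest version whose ts <= oldest_active_ts (some live
    # transaction may still read it) and everything after it; anything
    # older can no longer be accessed by any transaction.
    keep_from = 0
    for i, (ts, _) in enumerate(versions):
        if ts <= oldest_active_ts:
            keep_from = i
    return versions[keep_from:]

versions = [(5, "v1"), (8, "v2"), (12, "v3")]
print(visible_version(versions, 10))  # 'v2'
print(garbage_collect(versions, 9))   # [(8, 'v2'), (12, 'v3')]
```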
If I have an index, a primary key index, what is going to happen? I have a new version and an old version, so I have two records with the same value for the primary key. That doesn't mean there is a violation, because they are separate versions of the same thing. Both might have to be indexed, because one transaction ran with timestamp x and another ran with timestamp y, so both have to be in the index, and then you have to filter them out depending on the transaction. Similarly for foreign keys and so forth: you have to deal with the fact that multiple versions are there and ensure you are using the right version at any point in time. Does that answer your question?

Another question that I have is, in snapshot isolation, does each transaction use a separate snapshot, or a single common snapshot? That's a good question. In snapshot isolation, each transaction gets its own logical snapshot as of the time it started. If two transactions start at almost exactly the same time, they may get the same snapshot. In particular, if one transaction starts and another starts just after it, and we knew they were read-only, we could give them the same timestamp; if they are not read-only, they would have slightly different timestamps. The point is that their snapshots would be the same if no update were committed in between. So between these two transactions starting, if there was no transaction that completed and wrote its updates, they would actually have the same values in their snapshots. That can happen. But otherwise, each one has its own snapshot. In any case, each one has its own snapshot in the sense that if it does an update, the update is written in its own copy but not in the other transaction's copy. So logically, each one has its own snapshot, and its own updates are reflected in its snapshot, not in other transactions' snapshots.

Yeah, go ahead. This is from Team College. Go ahead, sir. We have various transaction control statements, such as commit and rollback. Can you please elaborate on savepoints? So I said that when a transaction is rolled back, I assumed it is rolled back all the way to the beginning. Now there are some cases where you don't necessarily want to roll it back all the way to the beginning. It is possible to roll it back partially and then restart from there. This is more practical if the transaction is a stored procedure. If the transaction was issued from an external Java program, there is no way for the database to say, come back up to here and then restart from here. However, if the Java program itself decides that it wants to roll back, but not fully, it can create a savepoint. What is a savepoint? It is just a marker, in some sense, that is set, and you can then say, roll back to this savepoint. So the transaction doesn't roll back fully; it rolls back partially, up to that point. And then the code which requested the partial rollback might say, OK, now we will continue from here and do something else. Maybe it tried something that couldn't complete because of some violation; it rolls back partially to undo that part of the work and then tries an alternative way of achieving what it wanted. That is what savepoints are used for. Can this savepoint be permanently stored in the database, or only for some time? The savepoint exists only for the duration of the transaction. Once the transaction commits or aborts, that's it, the savepoints are gone.
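A minimal sketch of savepoints in standard SQL, again using the hypothetical accounts table; the savepoint name is arbitrary:

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;

    SAVEPOINT transfer_attempt;             -- set a marker inside the transaction
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    -- Suppose this second update fails some integrity check; undo only
    -- the work done since the marker, not the whole transaction.
    ROLLBACK TO SAVEPOINT transfer_attempt;

    -- Try an alternative action here, then make the surviving work permanent.
    COMMIT;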
Professor, just continuing with the same thing, you said external Java program. What do you mean by that? Meaning, you can distinguish the case where the transaction is a stored procedure. There, the database system knows exactly what the next step is and what happened, and it could do some special-casing if it wanted; I'm not sure that anybody actually does this. But the point I wanted to make is that if the database decides to roll back to a savepoint without the knowledge of the program which is issuing the SQL commands, that is not practical. It would have to tell the program that it has been rolled back to the savepoint, and the program would have to figure out what to do, and those kinds of things are not really supported today. So what happens is that if the database decides to roll back due to a deadlock or something, it will probably just roll back the transaction completely. However, a program which is running these queries can ask for a savepoint to be created and can request a rollback to that savepoint, and that is the practical use of savepoints. That is the main point I wanted to make. Is that clear? Thank you, Professor.