which is recovery. So, the overview is: we are going to first look at what failures are and how they affect recovery, then a little bit on how storage affects recovery, and then we will focus on atomicity and logging, and we will wrap up with something called a remote backup system.

So, first of all, what are the failures that we need to deal with? Recovery has two issues, but the primary issue is how you deal with failures of various kinds. One kind of failure is transaction failure, meaning there is a logical error, for example an integrity constraint violation, and therefore you have to roll back the transaction. If you have to roll it back, how do you undo the partial effects of the transaction? That is part of recovery. Then there could be a system error, for example a deadlock, which forces the database to roll back some transaction. The recovery issues are similar here: how do you undo the partial effects?

Then you might also have a system crash. What is a system crash? One kind is when power fails and the system just shuts down, but there are many other kinds. You may have had the operating system hang, the blue screen on Windows. What kinds of things can go wrong here? In the worst case something very bad can happen: there is a bug which corrupts the database. That kind of failure is very hard to recover from if there is serious corruption; you essentially have to take an old backup and recover from that, and it is not easy. The techniques we look at make what is called the fail-stop assumption: non-volatile storage contents are assumed not to be corrupted by a system crash. This is an assumption; you cannot prove that it holds, but you can do a lot to make it hold as far as possible. It is not a 100 percent guarantee, but all database systems, and for that matter operating systems and so forth, do a lot of things to check internally for corruption. They have checksums on various data structures, checksums on pages and so forth. If they find a checksum is violated, that means some corruption has happened, and at that point they just stop and run a recovery algorithm, which we will see. The basic idea is that before something actually corrupts the disk copy, the database system will usually block it. This is not a 100 percent guarantee, rare bugs can cause corruption, but it is a tribute to how carefully database systems have been written that so many applications depend critically on databases, yet you very rarely hear of somebody losing data due to a database crash. They are very well written pieces of software; most of the systems we use have had some occasional instance of corruption due to a bug, but it is extremely rare. All of them do, however, face system crashes, power failures and so on, and these are the kinds of things we need to deal with.

You can also have a disk failure which actually leads to data loss. We know how to use RAID to minimize that chance, but if data loss happens in spite of that, what do you do? Then you have to have a backup and recover from the backup.

Now, storage structures are divided into three levels; we already saw volatile and non-volatile storage.
On top of these we define another level called stable storage, which is in some sense a mythical form of storage that is assumed to survive all failures. Practically that is not possible, since you can have coordinated failures that destroy it, but you can get an approximation of it by using two things. One is RAID, so that if a disk fails you do not lose data. But RAID by itself is not enough: you may have a fire in your data center which destroys it, and both copies on your disks may be gone. So you need non-local, remote storage as well; stable storage needs remote backup and so forth. We will come back to remote backup at the end of this chapter. We are going to assume that there is stable storage, and when we say we write data to stable storage we mean that once it is written it will not be lost. That is critical.

Now, let us move on to data access. Physical blocks are the data blocks residing on disk. In contrast, a buffer block is a block that is temporarily in main memory. This is based on the idea that there is a disk buffer. When you want to read a block, the first step is to input the block from disk; by disk I am including solid state disks, it need not be a hard disk. So you input the block from its physical location into main memory. Now it is in the database buffer in memory, and you can proceed to read from the in-memory block and write to the in-memory block, but the in-memory block is copied out to disk only when an output(B) operation happens. The output operation transfers block B to the disk and overwrites whatever was there on disk earlier. So input and output are the two operations that affect the disk contents; reads and writes are assumed to happen in memory.

This is shown schematically here. This is the buffer, and the buffer has several blocks, two of them shown, A and B. input(A) copies a block from disk into the buffer; output(B) copies a block from the buffer to disk. Now, when a transaction wants to read or write a data item X or Y, the first step in order to read X is to find which block it resides in, input that block if it is not already resident, and then return the value. Similarly, when a transaction wants to write Y, it is written in the buffer; it is not directly written to disk at this point. The write to disk happens only when the output operation happens. So, essentially, writes could be done in the local memory of the transaction, or they could be done on the buffer, but until the output happens the disk copy is not updated. I hope that is clear: input and output affect the disk, read and write are assumed to be in-memory operations.
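To make this concrete, here is a minimal Python sketch of the model just described. The block names, the dictionaries standing in for disk and buffer, and the function names are illustrative assumptions only, not how a real buffer manager is implemented.

```python
# Data-access model: input/output move whole blocks between disk and buffer;
# read/write touch only the in-memory (buffer) copy.
disk = {"BA": {"A": 1000}, "BB": {"B": 2000}}   # physical blocks on disk
buffer = {}                                     # blocks currently in main memory

def input_block(block_id):
    """input(B): bring a block from disk into the buffer."""
    buffer[block_id] = dict(disk[block_id])

def output_block(block_id):
    """output(B): copy a buffer block to disk, overwriting the old disk copy."""
    disk[block_id] = dict(buffer[block_id])

def read(item, block_id):
    """read(X): input the block holding X if it is not resident, then return X."""
    if block_id not in buffer:
        input_block(block_id)
    return buffer[block_id][item]

def write(item, block_id, value):
    """write(X): update X in the buffer only; the disk copy stays unchanged
    until output_block() is called on that block."""
    if block_id not in buffer:
        input_block(block_id)
    buffer[block_id][item] = value

print(read("A", "BA"))    # inputs block BA if needed, then returns A's value: 1000
write("A", "BA", 950)     # only the buffer copy changes; disk still has A = 1000
output_block("BA")        # only now is the disk copy of block BA overwritten
```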
Next, we need to ensure atomicity in spite of various kinds of failure. The key to ensuring atomicity is to first output information describing the modifications to stable storage. What do we mean by stable storage? As I already told you, it is assumed to survive crashes of all kinds: once something is written to stable storage it is safe there, and even if the system crashes we can get that data back. That is the idea. So we are going to output information about the modification to a log on stable storage before doing the modification on the actual disk copy of the database. The focus is on what are called log-based recovery mechanisms. There are alternatives; there is something called shadow paging, which we are not going to cover here. The book briefly talks about it. It is used in non-database applications: your editor typically makes a backup of your file while you are editing it, so that if you quit the editing session in the middle, the original file is not affected. That is called shadow copying, or shadow paging.

So, sticking to logging: log-based recovery is based on log records, which are stored in a file called a log, on disk. The log can actually be spread across multiple files, but logically a log is a sequential structure, a sequence of log records, and all updates are logged there. When a transaction Ti starts, it registers itself by writing a <Ti start> log record. Before it does a write to a data item X, it writes a log record <Ti, X, V1, V2>, where X identifies the particular data item, V1 is the old value of X and V2 is the new value. When I say X here, it could be a whole tuple, or it could even be an attribute of a tuple; a write does not necessarily have to update the whole tuple. So this log record says what the old value and the new value of that part of the tuple are, and it is written to the log on stable storage before Ti executes the write. That is the key idea here. When Ti finishes, the last statement of the transaction writes a <Ti commit> record, which is also written to the log. The basic idea is that once the commit record is written to the log, the transaction is safe, in the sense that all the updates it did are now on the log in stable storage. If a failure happens from here on, we have enough information to replay whatever the transaction did, and we can tell the user that the transaction is committed. So if a cashier wanted to do a transaction deducting 100 rupees from a bank account, then after the commit log record is written the cashier can be told to go ahead: it is committed, and the cashier hands out 100 rupees to the customer.

Now, there are actually two approaches to using logs. One is called deferred database modification, the other immediate database modification. In deferred database modification, updates are not done immediately, not even to the buffer: the updates are kept local to the transaction until the transaction commits, and only when the transaction requests a commit are the updates written to the buffer. On the other hand, in immediate modification, an update can be written to the buffer at any time during the transaction's execution; it does not have to wait for the commit. Those are the two approaches, deferred and immediate. Immediate modification allows updates to be made to the buffer, or even to the disk itself as we will see later, even before the transaction has committed. The log record for an update must be written before the database item is written, and at this point we will assume the log record is output directly to stable storage; later we will see how to postpone that for efficiency. The other part is that the order in which blocks are output to disk can be different from the order in which they are written, and in fact the output of updated blocks to stable storage can take place at any time, before or after the transaction commits. This is important: with logging, there is no need to write an update immediately to the database.
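Here is a minimal sketch of that rule, with a dictionary standing in for the buffer copies and a Python list standing in for the stable-storage log; all the names and values are illustrative only.

```python
# Immediate modification with write-ahead logging: the <Ti, X, V1, V2> log record
# is appended (to what stands in for stable storage) before the in-memory copy of
# X is changed; output of the data block to disk can happen any time later.
log = []                              # stands in for the log on stable storage
buffer_vals = {"A": 1000, "B": 2000}  # in-memory (buffer) copies of the data items

def tx_write(txn, item, new_value):
    old_value = buffer_vals[item]                   # current value of X
    log.append((txn, item, old_value, new_value))   # log record written first ...
    buffer_vals[item] = new_value                   # ... then the buffer copy is updated;
                                                    # the disk output can be deferred

log.append(("T0", "start"))
tx_write("T0", "A", 950)
tx_write("T0", "B", 2050)
log.append(("T0", "commit"))   # once this record is on stable storage, T0 is committed
```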
Why would you postpone it? You might postpone it for efficiency: you want to collect multiple updates and write them out together. What if there is a crash and an update has not yet been written? Well, all the writes have been logged, so the log has enough information to redo whatever is required so that the database state is recovered after a crash. So you can postpone outputting to the database; that is a key idea. The deferred modification scheme, as I said, defers updates even to the buffer until the transaction commits. This simplifies some aspects of recovery, but it has the overhead of storing local copies. We will not go further into deferred modification; we will focus on immediate update.

The next point is something I already said: a transaction is committed when its commit record is output to stable storage. All the log records are written in order, so when the commit record is written, everything else the transaction has logged has already gone out to the log. As I said, the writes performed by a transaction may still be in the buffer when the transaction commits and may be output much later. Why do this? It reduces the number of disk writes needed to commit a transaction, so commits are faster. In particular, if a page is updated many times, then because it is not immediately output to disk, the updates of this transaction and the next and the next may all be collected on that page before it is output. So instead of three or five different outputs, it is output only once; the number of outputs can decrease, and efficiency improves.

Here is a small example of immediate database modification to illustrate what goes on. T0 starts and its start log record is written. T0 updates A from 1000 to 950 and updates B from 2000 to 2050, so it is transferring 50 from A to B. The log records are written, and then the writes happen: A is assigned 950 and B is assigned 2050. These writes are in the buffer; they have not yet been output. Then T0 commits and T1 starts. T1 updates C from 700 to 600, and this write also goes to the buffer. At this point T1 has not yet committed, but it is possible that the buffer blocks containing B and C are output. So you are seeing two things: first, T0 has committed, but its updates have not all been output yet; second, T1 has not yet committed, but the buffer block containing its update to C has already been output, even before T1 commits. After this, T1 commits, and maybe only then is the buffer block containing A output, much after T0 commits. This is what immediate database modification allows, but the important thing is that the log records already contain the writes which T0 did and which T1 did, so even if something has not been output it is fine. Conversely, suppose T1 does not commit, but a crash happens at this point, after the buffer block has been written out. What has happened is that the disk copy reflects a transaction which did not commit, and whatever update it did has to be undone. The good news is that the log record <T1, C, 700, 600> has enough information to undo it. What is that information? The old value of C. We can use the old value to undo the change on the disk copy, and that is part of recovery.
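Written out as data, the log for this example and the undo step look roughly like this; it is only a sketch, with tuples standing in for the actual log record format.

```python
# Log contents for the example, up to the point of the crash.
example_log = [
    ("T0", "start"),
    ("T0", "A", 1000, 950),
    ("T0", "B", 2000, 2050),
    ("T0", "commit"),
    ("T1", "start"),
    ("T1", "C", 700, 600),
    # crash here: T1 has no commit record, but its update to C may already
    # have been output to disk
]

def undo(txn, log):
    """Scan the log backwards and restore the old value V1 for every update of txn."""
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] == txn:
            _, item, old, new = rec
            print(f"restore {item} from {new} back to {old}")

undo("T1", example_log)   # prints: restore C from 600 back to 700
```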
I will stop here, but I will take a couple of questions. Sankhara Charya here; I have a doubt about how the shadow copy concept is maintained in distributed systems. First of all, as I said, shadow copy is not widely used. Initially, when recovery in databases was first being worked out, people did implement shadow copy schemes. The problem with the shadow copy scheme is that it does not work well with concurrent updates to the same data. People then extended it to a variant, shadow paging, but even that runs into trouble if two transactions update data on the same page. So the shadow-based schemes have been abandoned for settings with higher concurrency. They are still used when you edit a file: a shadow copy is made, but then two people are not editing the file at the same time, and furthermore the file is relatively small, so you can afford to make a copy of it. You cannot afford to make a copy of large parts of a database. So it is not widely used, and I do not know of anybody who has even worried about how to use it in a distributed setting.

Hello sir, is there any graphical or simulation tool to visualize transactions and locking? Is there a tool to visualize transactions and locking? I do not know of any tools specifically for this, but in today's lab you are going to be doing exactly this by running transactions concurrently from multiple windows. You will do an update in one window, then do something in another window, and these all have to be part of the same transaction. In PostgreSQL you do this with a begin statement, which ensures that the following statements, until a rollback or commit, are all part of the same transaction; other databases have variants of this. So you are actually going to see this in action. It is not a tool; it is just manually typing things into multiple windows, typing a command in one and executing it, typing a command in another and executing it, and seeing what happens.

Sir, one other question: which type of lock is maintained by PostgreSQL in its actual implementation? So, how does PostgreSQL do concurrency control, let me rephrase the question that way. As I said, PostgreSQL implements a form of snapshot isolation. Up to PostgreSQL 9.1, this was the snapshot isolation which I briefly talked about. The basic idea, that when a transaction starts it gets a logical snapshot, is supported by PostgreSQL, and even today PostgreSQL does snapshot isolation. The only thing is that the checks which are done when a transaction does an update or decides to commit are a little different, and those checks ensure serializability; this protocol is called serializable snapshot isolation. The core of snapshot isolation, which is that reads are done from a snapshot, remains unchanged. You will actually be seeing this in action today: you will see that although one transaction has committed, another transaction which started before it and is still running is seeing data from its snapshot. It is not seeing the latest committed value; it is reading an old value because of the snapshot. So the snapshot becomes visible through the lab exercises which you will do today.
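If you would rather drive the same experiment from a script instead of two psql windows, a sketch along these lines shows the snapshot behaviour. The database name, the instructor table and the ID value are placeholders in the style of the textbook's university schema; adjust them to your lab schema. REPEATABLE READ is PostgreSQL's plain snapshot isolation level.

```python
import psycopg2

conn1 = psycopg2.connect("dbname=university")    # plays the role of window 1
conn2 = psycopg2.connect("dbname=university")    # plays the role of window 2
conn1.set_session(isolation_level="REPEATABLE READ")  # snapshot isolation in PostgreSQL
cur1, cur2 = conn1.cursor(), conn2.cursor()

cur1.execute("SELECT salary FROM instructor WHERE ID = '10101'")
print("T1 first read :", cur1.fetchone())   # T1's snapshot is now fixed

cur2.execute("UPDATE instructor SET salary = salary + 1000 WHERE ID = '10101'")
conn2.commit()                               # T2 commits its update

cur1.execute("SELECT salary FROM instructor WHERE ID = '10101'")
print("T1 second read:", cur1.fetchone())    # still the old value: read from the snapshot
conn1.commit()

cur1.execute("SELECT salary FROM instructor WHERE ID = '10101'")
print("new transaction:", cur1.fetchone())   # a fresh snapshot now sees T2's update
```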
Hello, good morning sir. I have one question: how is GFS different from HDFS? GFS is the Google File System, while HDFS is the Hadoop Distributed File System. GFS was the first of the modern generation of distributed file systems. Now, distributed file systems have a long history: back in the early to mid 1980s there was a project at CMU called Coda which built a distributed file system, and there were others even before, so they have a long history. But those were research tools, and some of them were used in practice, but not widely used. The Google File System was the first very widely used distributed file system, and the Hadoop file system is essentially an open source clone of the Google File System. It is not an exact clone, it is not exactly the same, but the basic ideas are the same.

Good morning sir. I am from NLA Institute of Information Science and Technology, Gopal. My question is from the servlet session: why is restarting the web server necessary every time after changing a servlet? So, the question is why you have to restart the web server every time you change a servlet. If you remember, the servlet basically gets compiled into a class, and that class is dynamically loaded into the web server. When you make a change, the web server has to replace the current version of the class code with the new version. How is this done? It has to reload the class, and one way to force a reload is to restart the server. But many web servers, when you use JSP in particular, have a mechanism that automatically detects when something has changed and reloads it. If that is available, then you do not have to explicitly restart the web server; it will check timestamps and reload whenever anything underneath changes. Even in Tomcat you do not actually have to restart Tomcat: there is a Tomcat manager, a web interface which we did not get into, which allows you to reload just the servlets that you want. So even if automatic detection is not there, it is possible to reload just the servlets which you changed. For simplicity we did not get into it, but if you want to run a production system, yes, these are useful: you do not want to shut down the whole system because one servlet changed. Thank you sir.

The next question is: what mechanisms can we implement in a servlet to reduce transaction time in the web server and the database, and would it be possible with stored procedures? So, the question is what you can do to reduce transaction time and whether stored procedures are useful for this, and the answer is absolutely yes. What are the issues here? If the servlet is doing a long transaction with multiple accesses to the database, then every time you access the database a message is sent across the network, something happens, and a message comes back, and all of this adds some delay. The delay may not be much, it may be a millisecond or even less, but it adds up: each round-trip delay of a millisecond adds up over multiple round trips. So one way of optimizing this is to code all your logic into a stored procedure and then just execute the stored procedure. Now there is no repeated round-trip delay, and the database can finish the whole transaction in maybe one millisecond: one round trip and the whole thing is done. That might be preferable if your servlet is doing multiple steps. So that is one of the ways to optimize the database interaction, and correspondingly the user also sees a faster response.
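As a rough illustration of that trade-off, here is a sketch using psycopg2 against PostgreSQL. The account and transfer_log tables and the transfer_funds procedure are hypothetical names; the procedure would have to be created on the server first (PostgreSQL 11 or later supports CREATE PROCEDURE and CALL).

```python
import psycopg2

conn = psycopg2.connect("dbname=bank")
cur = conn.cursor()

# Version 1: the application issues each statement itself, paying one network
# round trip per execute().
cur.execute("UPDATE account SET balance = balance - 100 WHERE account_no = %s", ("A-101",))
cur.execute("UPDATE account SET balance = balance + 100 WHERE account_no = %s", ("A-215",))
cur.execute("INSERT INTO transfer_log VALUES (%s, %s, 100)", ("A-101", "A-215"))
conn.commit()

# Version 2: the same logic lives in a (hypothetical) stored procedure on the
# server, so the whole transaction costs a single round trip.
cur.execute("CALL transfer_funds(%s, %s, %s)", ("A-101", "A-215", 100))
conn.commit()
```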
Good morning sir. My question is from the servlet session. Cookies are stored in the client web browser, and with cross-site scripting an attacker can access the client's cookie. So is there any safe way to store a cookie so that it can be used only by the client and is not accessible to attackers? There is already a security mechanism built into cookies: a particular cookie can only be requested by a page from the same website, the domain which created the cookie in the first place. So if google.com stores a cookie and you then do an HTTP interaction with iitb.ac.in, iitb cannot access that cookie. That is the basic safeguard built into every web browser. The only way to get around it would be for a domain to fake itself as google.com and then ask for the cookie; otherwise your browser is not going to give the cookie to somebody else. Thank you sir.

My question is: can you please define the phantom problem with respect to Oracle or any other commercially available database? You said that Oracle has defined it in a different way and claims that it does not exist; IBM did it a long time back, and you have Microsoft also. So is there a paper, a definition, something of that sort? Yes. The phantom phenomenon has been described loosely, and different papers mean slightly different things by it. I did not give you a definition; I gave you an example of a phantom where, with tuple locking, you could have a non-serializable execution, and that is not exactly a definition either. The way the phantom phenomenon was described in the SQL standard was unfortunately loose, and they defined it partly in terms of repeatable reads. The way they defined it, they assumed you use locking, and the idea was that if you use locking and you run a query which says "show me all students who have taken CS 101", and after some time you issue the same query again, and if you did tuple-level locking and nothing more, then the set of results could change. In the exact example I gave you, with tuple-level locking I showed you a non-serializable execution, but the same example also shows that if T1 says "find me all takes tuples for 101" it gets one result, and if it re-issues the same query it sees a different set of results. This is what happens with locking. So what the SQL standard unfortunately did is to define the phantom in terms of such predicate reads, where a predicate read is one which says "find me all tuples that satisfy a predicate". If a predicate read is repeatable, meaning it gives the same result every time you run it, they said there is no phantom phenomenon; the phantom phenomenon occurs if a predicate read is not repeatable. Now, what Oracle did is to say: look, I am running off a snapshot; in that snapshot I do not even see the updates of other transactions, so I will get repeatable predicate reads. So the trick was the following: the SQL standard unfortunately defined phantoms in terms of repeatable predicate reads, and that is not enough, because with snapshots you get repeatable predicate reads but the core problem which I showed you, non-serializable execution, will continue. The same problem will occur in Oracle, although not quite with the particular example I showed you, so let me go back to that example. If you can see this on your screen, in this particular example T2 did an update which T1 saw. It turns out that with snapshot isolation T1 cannot see that update of T2, because it is reading from its own snapshot. So this particular example may not cause a violation with snapshot isolation, but you can construct other examples which show the problem even with snapshot isolation; the problem certainly can occur there with a slightly different example. The core issue is that there is a conflict between a read here and an insert there, or an update for that matter, which is not detected, in the case of locking, by tuple-level locking. In the case of snapshot isolation, writes are only checked against writes; reads are not cross-checked against writes, and so there can be many causes of non-serializability, with a variant of the phantom being one of them.
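A standard illustration of this point, and not the exact example from the lecture, is two transactions that each perform a predicate read, here the maximum receipt number, and then insert a row based on what they read. Under PostgreSQL's REPEATABLE READ (plain snapshot isolation) both predicate reads are repeatable and both commits succeed, producing a duplicate number: a non-serializable outcome caused by a read-versus-insert conflict that write-write checking alone does not catch. Under SERIALIZABLE (serializable snapshot isolation) one of the two is rolled back instead. The database and table names are placeholders.

```python
import psycopg2

def next_receipt(conn):
    # predicate read followed by an insert that depends on it
    cur = conn.cursor()
    cur.execute("SELECT COALESCE(MAX(receipt_no), 0) FROM receipts")
    n = cur.fetchone()[0]
    cur.execute("INSERT INTO receipts VALUES (%s)", (n + 1,))

conn1 = psycopg2.connect("dbname=university")
conn2 = psycopg2.connect("dbname=university")
for c in (conn1, conn2):
    c.set_session(isolation_level="REPEATABLE READ")   # try "SERIALIZABLE" as well

next_receipt(conn1)   # both transactions read the same MAX ...
next_receipt(conn2)
conn1.commit()
conn2.commit()        # ... and under REPEATABLE READ both commit: duplicate receipt_no.
                      # Under SERIALIZABLE one of the two fails with a serialization error.
```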
That was probably not a very clear explanation, but I hope it gave you some idea of what the phantom problem is. Thank you professor, that was nice. I think we will stop here for the tea break.