 So, today what we are going to do is not just talk about how to do updates in SQL, but also bunch of stuff about transactions. I think the concept of a transaction was introduced on day one and we are going to go deeper into that today. So, the first one-third of today is stuff which will reflect on what you do in the lab and the other two-thirds is probably not what you will do in the lab. You may do a little bit, but these are concepts which are very important. You will need to use them when you build actual systems. So, that is an overview of today's three hours, advanced queries, updates, views which we covered yesterday and then transactions. So, the first topic of today is this business of join types. You saw natural join, you used it and we discussed what is natural join using relations and then in SQL you just wrote let us say account join depositor or account natural join depositor. What exactly happens and what if you need to do more than one join? That is one issue. The other is this issue of outer joints. Somebody had asked a question that if you join account and customer and a particular customer does not have an account, what happens to the customer record? It does not come out in the join. So, now what if you wanted to appear in the join? You want that to see the customer in the result, but with some value which would be what? For the account information, what value would you store? Unknown. Unknown. So, you wanted to appear in the result, but with null for the account information. How do you do such types of joints? These are called outer joints. So, here is a summary of the join types and there is another issue of what are called join conditions in SQL. So, I will come back to that slide to show the summary, but let us first of all see what are outer joints. So, here is a relation loan and a relation borrower. Now, look carefully at this and you will see that there is a loan number L 260 for which there is no borrower information here and correspondingly there is a loan 155 for which there is no loan information here. This is probably an inconsistent state of the database, but if I want to join these two tables and get that partial borrower information and the partial loan information, what do I do? So, there are several types of outer joints. Let us start with the simplest one. Before we get into outer joint, the normal joint is also called inner joint and you can use these syntax we saw before, select star from loan comma borrower where loan dot loan number equal to borrower dot loan number that would give this table here. So, what happened? L 160 which did not have a matching borrower vanished and then that borrower haze whose loan number was missing here, these two rows have vanished from the result. So, only the other tables are present. So, that is a normal joint and this shows another syntax in SQL for the same joint operation. So, you can say loan inner joint borrower on and you give a condition. So, the where clause condition can appear after an on and you can stick this whole thing in the from clause. So, you can say select star from loan inner joint borrower on loan dot loan number equal to borrower dot loan number, this is just syntax so far so good. Now, what we want let us say is we want to find loan information even if there is no matching borrower. So, I want to find out what are such loans and what is the amount in the branch, the only thing which will be missing is the customer name. So, how do I do this and the way to do it is as follows, I will say loan left outer joint borrower and now I have to give this condition here on loan dot loan number equal to borrower dot loan number. Note that the condition for an outer joint like this cannot be in the where clause, I will come back to why that is the case, but for an outer joint like this the condition has to be right here, you have to say relation left outer joint another relation on condition. So, the condition has to be there. So, now what has happened when I say loan left outer joint, this left part of outer joint says that the left input which happens to be loan here is preserved note apple from it is lost in the joint. If it matches, if a tuple in loan matches something in borrower it appears as in the normal joint. If it matches two things in the borrower relation it will appear twice. However, if it matches note apple in the right input by matching what do I mean matching on this condition, the on condition here. If it does not match anything there then you are going to output it in the result, what are the columns from the right input that is borrower here said to null. So, note that loan number actually appears twice here, this one is from where loan dot loan number, this one is borrower dot loan number and here borrower dot loan number and borrower dot customer name are both null. So, this is very useful in many situations where you want to get complete information about customers whether or not they have policies. These two tables the foreign key is not there just two primary keys are there in two tables and we are joining on that particular field. In this case if you had a foreign key from borrower to loan one of these situations that is the data here, this one where there is a haze with loan number L 155 which does not exist that cannot occur. But we have not defined a foreign key in the other way, we have not insisted in our database that a loan must have a borrower, a loan can exist without any borrower which is probably an error also. So, you could have a different kind of integrity constraint which says that if there is a loan here there must be at least one tuple in the borrower relation. This is not a foreign key dependency, but it is very similar to foreign key dependency, it is called an inclusion dependency unfortunately there is no standard syntax in SQL for it. There is a way to specify it using a check clause, we briefly saw a check clause. So, you can add in the check clause that this value is present in the other table and then you just hope that the database does it efficiently. There are many databases which do not handle check clauses very cleverly. So, the point is that foreign key is one type of integrity constraints, but there are others which are not directly modeled in SQL which may be required to prevent this sort of thing happening. However, in this case if it does happen then you do outer joints to retrieve partial information. There may be reasons you know in a bank with loans this is probably not going to happen. It is a bad situation, but there are other situations where you may have partial information. So, maybe when you have customers you have an extra table which is some external information about the customer. Let us say some information from the income tax department about that customer. So, that may be present for some of your customers, but you do not have it for all the customers. So, now if you do a left out join of your customer with that income tax information from an external source it makes sense. You cannot assume there is a foreign key. So, there may be information missing then you do an outer join, but here we are using a simple example just to keep continue using the schema which you are familiar with now. So, that one was left outer join. As you can imagine there are other variants of it. So, now here are another variant of the outer join, but this time we are using a natural join. So, what is the difference between let us take this top table. First let us start with natural inner join which is the same as the natural join. The inner is implicit. So, in your queries yesterday you wrote some queries which said account natural join customer. That is completely equivalent to saying natural inner join and this is the same result which was there on the previous slide almost. What is the difference? This result and look at this result. In fact, the result is the same, but when you do a natural inner join duplicate columns should not appear. I do not know how this got in here it should look like that. The natural join removes duplicate columns. Similarly, on this one if you notice this time it is a natural right outer join. So, first let us look in the right outer join part. In the right outer join the right input is preserved. The left input may not be preserved. So, here what we are guaranteeing is that every borrower tuple is going to appear in the result. So, we had borrower had what? A loan number and a customer name. So, what were the tuples that were there in borrower? L170 zones, L230 Smith, L155 Hayes. So, all those are present. However, the attributes from loan might be null if there is no matching loan. So, in this case L155 did not have a matching loan. So, branch name and amount are null. Not one other difference from the previous one. This was left outer join, but in terms of the attributes in the result what is the difference? This was just a plain left outer join on. This guy is a natural right outer join. What is the difference in the? Loan number appears only one. So, once you say natural repeated columns are equated. So, if the same column name appears in the left and right input implicitly the condition is that they are equal. And once they are equal there is no point outputting them twice and so, the definition of natural join says replace them with one copy that is it. I want to know what will happen if the attributes are different in the tables and only the key is same like you got my point. The common name is there in the right table, but it is not there the attribute is not available in the left and the same vice versa like what will happen in. If there is no common attribute at all between the two tables what happens? Only the key is the common thing other things are totally different other attributes. It does not matter whether the common attribute is a key or not. In any case you will do a join equating them. Now, if the common attribute is a key that means one tuple from here can match at most one tuple from there that is all. If it is a key of both relations that is simply a special case nothing really different about it is just the same as this so, it does not matter whether the attribute is a key or not. This query illustrates actually two different things. First let me describe the outer join part of it. This query says loan full outer join borrower using loan number. So, first of all full outer join means that tuples from both the relations are preserved using list a set of columns which should be equated not preserved bigger pardon using list a set of columns which should be equated. So, in this case we are saying equate only loan number you do not have to equate even if there is another column which is the same age between these two tables they are not equated. So, that will avoid the kind of problem which you are talking about where you have some extraneous attribute which just happened to have the same name and it ends up causing a mismatch. So, here with the using you ensure that only loan number is taken into account and then you get this result. In the other table if loan number is not there that particular entry is not appearing those will not come. That would be an error it will say that you are using this attribute loan number in the using clause, but it is there in only one of the relations. So, that is an error the SQL compiler will catch it at that point right. So, now from this table you can do something like if you want to find those who have an account or a loan, but not both what would you do. So, in this table what you have for those who have both you have tuples which have both a loan number and account number. And for others you will have null for account number or null for loan number. On this result if I want to extract those who have only one of an account or a loan, but not both I can add a where clause condition which is what? One of them should be null. So, you can say in the where clause that loan number is null or account number is null you cannot say equals null you have to say is null. And notice that that condition is in the where clause not in the on clause or the using clause that has to come later. So, now let me come back to the question of why should these conditions be in the on or using as opposed to in the where clause. And the reason is as follows when you have regular join you have a set of relations and then a set of conditions in the where clause. And the semantics is you take a cross product first then apply the conditions. For outer join that is not quite right, that is not how outer joins work. You have to take for each tuple see if there is a matching tuple in the other one, but matching on what condition that is the issue. And that condition is specified in the on clause and that is why you have to provide it right there. And outer join is defined for two relations only. You can't it does not make sense to say outer join of three relations. Whereas you can say cross product of three relations is well defined. But an outer join will preserve the left input or the right input of both. Now, what is a three way outer join? It is not defined. You can do an outer join with three relations if you do it in two steps. If you have relations a, b and c, you can take first of all a outer join, say left outer join b and then on that you can do outer join c. Or you can do maybe b outer join c and then outer join a. But the results are not the same. Even if you keep the same order, you can say a outer join b, outer join c. Let me write it here. Can I get this input? So, this symbol over here is the left outer join symbol. So, let's say a, b and c are relations. A left outer join b, left outer join c. So, that is one possible query. Now, another possible query is a, left outer join b, left outer join c. It turns out these two are not the same. The results are not the same. I won't go into why this is the case. But just believe me that the results are not the same. And it should be even more clear that the results of that are quite different from b left outer join a, left outer join. Here it is very clear that the results are different. Because here the tuples in b are preserved, whereas here the tuples in a are preserved. And if you compare these two, it turns out that tuples of b are not preserved here. And over here, tuple of c which joins with a will be preserved, even if it does not join with b. Whereas here, a tuple of c which joins with a may in fact vanish from here. So, let's, I am trying to give you some intuition for why this and this are not the same. So, for outer joins, you have to do two at a time. And that is why the syntax used is slightly different. So, here is a actual query which uses an outer join in the from clause. Outer joins always appear in the from clause. So, what has happened here? We have done depositor natural full outer join borrower. And that is in the from clause. And then from that we can select whatever attributes we want. So, here we have selected customer name. And here, we did the query which I talked about a few minutes back, where this is null or that is null. So, this whole syntax is restricted to appear in the from clause. And you should put brackets around it, so that it is an ambiguous. So, after this you can do one more. You can say natural full outer join, it should be properly bracketed, so that it is clear that you first outer join depositor borrower and then do the outer join with customer. Any questions? The syntax for full outer join, if we omit the word natural. So, that is still valid? Not really. You have to use one of natural on or using. If you say on, the condition is explicitly specified. If you say using, it is on the specified columns. If you say natural, it is on all. So, coming back here, you can see that the join condition is either natural on or using. And the join type is either inner join, left outer join, right outer join or full outer join. So, any combination of this is allowed. You can say natural inner join, natural left outer join and so forth. Or inner join on predicate, inner join using, right outer join on, right outer join using. Now, this is another kind of query, which is top K query. If I want to find the top few results in some order, without looking at all the results in PostgreSQL as well as in Oracle and SQL server, I can do this. Although the SQL standard does not allow this, most databases support this syntax. So, here I am saying select star from RS, where r dot s equal to, I mean r dot b equal to s dot b. That is anything. It is just a regular query. I have an order by clause and then I say limit 10. So, what will this give me? It will give me the top 10 or bottom 10? Ascending. Ascending. Therefore, bottom 10. It will give me the ascending to the lowest one. It will give me the bottom 10. If you want the top 10, you say descending. If you say nothing, if you do not give an order by clause, what will happen? Ascending. No, no. If you do not give an order by clause. It can be anything. It can be anything. Correct. So, if you do not give an order by clause, if you just add a limit, it can be anything. So, sometimes you just want to look at a relation, but you do not want to see the entire set of rows. You can do this. So, let us run this here. Let us take the same query and add a limit 2. You see just 2 rows up here. So, if I believe that the answer is very big, I do not want to see all the answer, but I want to see a few rows from the answer. I can add that limit. If I want to find the top few rows, I will add the order by. So, in this case, let us say I want to order by name. So, I get Amitabh and Charles. If I had increased the limit, I would get more names. Now, we will move on to the next part of today's presentation, which is on modifying the database. And how do you do modifications using SQL? Then for any modification, you have to specify a condition, and then you specify the update or delete, insert, whatever has to be done, or you can provide values to be inserted. There is a completely different way of doing modifications, which you can do from a programming language. You can have a query, open a cursor on that query for update, and then step through that cursor, and then for each record, you can decide whether to update that record or not. This mode corresponds to what you might have done in COBOL. You must be doing it all the time. So, you take a file, you step through it to the record that you need, either with an index or sequentially, and then update that record in the file. So, SQL, when it is embedded in COBOL or other languages, allows you that mode of operation, and people do use it. It is quite useful to do things that way. I am not saying that is a wrong way of doing things. But if you want to do it directly in SQL, then this is how you do it. And this is what is done more often now. People are more comfortable with this. So, here are several things which you can do. Let us start with deletion. So, this says delete all account tuples at the peri-reach branch. So, delete from account where branch name equal to peri-reach. All the account tuples vanish. Now, what will happen to depositor tuples when this happens? If the account vanishes, what will happen to the depositor tuple? The foreign key constraint. So, if there is a foreign key constraint, this will not be allowed. However, if you specify the foreign key constraint, you can actually add a clause to the foreign key constraint, which says that depositor account number, foreign key references account. You can add a clause in there which says on delete cascade. In that case, what will happen is when you delete the account, the depositor tuple will also get deleted. It is called a cascading delete. I have not covered it in the talk here, but you can look it up from any manual. Yes, which we discussed now. If customer table is the primary one, having the customer ID as primary key, our account number as the primary key, which one if you delete it will cascade? If you delete a record, whatever record you delete, if somebody references it, they will get deleted. They will get deleted. But if you reference something, that will not be deleted. So, in this case, let's say that I have a depositor record. The depositor record references account and customer, both. So, if you delete the depositor record, the account is not going to get deleted, the customer is not going to get deleted. Even if you have specified on delete cascade, because the direction of the primary key is from, let's say that this is the reference relation, this is the referencing relation, so it's pointing this way. If you delete this fellow, it doesn't affect this guy. If you delete this guy, it affects this guy. So, one of two things happens. If you have a foreign key, if you delete this guy, the system says, sorry, somebody is referencing you, you are not allowed to go away. But if you have specified on delete cascade, then the system will delete the referencing guy also. But in this case, we have not specified on delete cascade. So, what do we do? What if I do want to delete accounts from the Perirage branch? If I just run this, it's not going to succeed. What do I do? How can I delete accounts at Perirage branch? Given that I have a foreign key constraint without a non-delete cascade. I have to first remove it from the depositor. That's exactly the solution. So, first I will run a query which finds the depositors which should be deleted, delete them, and then come back and delete from account. So, here is another query which says, delete all accounts at every branch located in the city need have. So, what this shows is the where clause condition here can be local to this or it can be a sub-query. So, in this case, delete from account where branch name in, select branch name from branch where branch city equal to need have. So, you can have nested sub-query. You can't do a join here, of course, because delete from a single relation. But the where clause can have nested queries. This is quite useful. Here is another query. Delete the records of all accounts with balance below the average at that bank. Now, this is obviously a silly query to update. You wouldn't actually want to do this, but this shows you what you can do. How do you do this? So, let's say we write this query. Delete from account where balance less than select average balance from account. Something funny could happen depending on how it is executed. So, let's say we start with one tuple, with the first tuple in the relation. We find the average balance. See if the amount is less than the average balance. We delete it. Now, we come to the second tuple. If I compute the average balance again, it's going to change. Now, I have a problem. Depending on the order in which I delete tuples, a different set of tuples can be deleted. If I first chose A, then the average may change by some amount. Then B may not get deleted. But if I chose B first, then the average may change in such a way that A still gets deleted. So, depending on which I delete first, the final result can change. That's very bad. So, what is the solution? SQL will not recompute this subquery every time. It computes it once in effect. In fact, the way to look at it is, it doesn't actually do any deletion. It first finds out which all tuples should be deleted. It runs this subquery. It doesn't delete itself. It just marks it for deletion. And then after it finds out what all tuples have to be deleted, then it goes once and deletes all of them without recomputing this condition. So, as I just told you, first compute this part, compute the where plus condition, find out which all tuples to be deleted, and then delete all the tuples without doing any further testing. So, the opposite of deletion is insertion. And for insert, you just do the following. Insert into account values, and then you can give the values that you want. You can also write it like this, insert it into account, and then list the attributes which you are going to provide, and then provide values for those attributes. What if you omit an attribute from here? What value will it take? It will be null. Unless you have said not null, or it's a primary key, in which case this insert will fail, saying you are trying to store null in a value which cannot be null. In fact, SQL also has a, when you declare an attribute, you can even give a default value for it. Maybe the default should be 0. When you create a new account, maybe the value should be 0 by default. So, if you don't specify the balance, it will be 0 by default. If you have specified that default in the table declaration. If you want to explicitly set balance to null, you can do that also. Insert into account values, and here you give null. Note that there are no quotes. Strings are enclosed in quotes. Integers and numbers in general are not in quotes. They are just typed as numbers. Null is not in quotes. It's a special name which is not in quotes. So, now here is another query which does the following. This illustrates that you can insert a large number of tuples at a time by running a query and inserting the result of that query into another relation. So, this is illustrated here. First look at the query, and then we will see what it means. Insert into account, and now here is a select query. Select something from something where something. So, this is a general form of an insert which can insert multiple things from a query result. You can also have a variant of this values one after another, many different values. Those have to be explicitly listed. Whereas here, you use a query to create something. So, in this case, for whatever reason, we have decided to provide as a gift for all known customers of Peri-Ridge branch, a $200 savings account. Now, if you make an account, you have to give it an account number. This is a problem. So, the solution used here is a shady cheating solution to keep the query simple. In reality, you would have a procedure which creates accounts. It will have a sequence number for accounts and create a new account number on the fly. But here, to keep it simple, we have just assumed that let the loan number serve as the account number. This is to illustrate the query. So, the loan number and account number are both in our schema which we have used. So, that is fine. So, now you say insert into account loan number branch name $200 from loan where branch name is Peri-Ridge. This matches the first attribute of account which happens to be account number. This matches the branch name attribute which is the second attribute. And this corresponds to the third attribute which is the balance. So, that adds into account. And now we also need a corresponding thing in the depositor. So, customer name loan number is the same thing. In this case, the loan number becomes the account number which we have also used here. Except here, we had to do one more step because to get the customer name, we had to join with loan number. So, if you this query actually does a join. Whereas, this one did not need to do a join. All the attributes were available in loan itself. And again, the select class here is evaluated fully before anything is inserted into the relation. Why does that matter? Because you are allowed to do something like this. Insert into table 1, select star from table 1. What does this do? If you kept running the select on the fly, then insert it up and it appears and there gets inserted again. You get into an infinite loop. So, here again you compute this query completely get the result and only after that you start the insertion process. Now, here is another thing which is increase all accounts with balances over dollar 10,000 by 6 percent and all others by 5 percent. So, now how would you do this? Well, you could write two update statements. So, if the value is greater than 10,000, set balance star 1.06. So, this is the syntax of an update statement. Update table name, set and then you can have any number of these column name equal to expression. You can have multiple of these, you can update multiple columns and then where some condition. So, in this case we just updated the balance to this and the condition was balance greater than 10,000. Now, this fellow does what? For those which are less than or equal to 10,000 you set balance to 1.05 times this. Now, this is a very tricky query. This query works, but supposing you flipped these two what will happen? So, the balance was just under 10,000. So, first you give 5 percent then you give 6 percent. So, the customer may be happy, the bank is very unhappy. Okay. So, when you do an update like this you should not do it really using two statements it is dangerous. There is a way to do it which is shown here. There is a conditional update. So, you can express that by using a case statement. Update account set balance equal to and then the case statement when some condition then something else something else. So, this value is returned by the case statement depending on whether balance is less than 10,000 one of these two is done. So, this is the right way of doing that it is not a good idea to write two updates do it in one. Now, in these examples the value to be updated is available locally in the tuple. So, I am taking the value from this tuple and multiplying it by a fixed value. What if the update has to be done taking into account a value from some other tuple in maybe in some other relation what would you do? Take this I am updating the value using the old value and multiplying it by a fixed amount just take this query. Supposing instead I have another table which says for each account how much interest to give and I want to update account by using information from that other table. What do I do? What you will have to do is over here instead of saying balance times 1.06, I will have a nested sub query there which returns a single value and that nested sub query can use you have to match it of course. So, you have to look for the corresponding account number in that other table. So, that nested sub query can use the account number from this account that is a common mode where you have a nested sub query. We saw nested sub query in the var clause, but nested sub query in the set clause is also supported. In this example we have only two things one is over 10,000 dollar to 6 percent and then 5 percent. If you have more than 2 6 percent, 5 percent, 3 percent. Here use the case statement and have multiple var when something something and this is the else case where none of the conditions is true then this is the last one. So, use the case statement. The case statement is available with only update or with insert select also. It is available with anything it is a general construct in its queue. Again you may find some databases do not support it, but it is reasonably widely supported. It seems to be powerful means it can make life easy. Yeah, absolutely. What all can we do with a view? We saw that we can use a view in a query. The next question is can you update a view? Can you insert something into a view? Can you delete from a view? Can you update a view? And it turns out this is a very tricky question. So, take this view. It is actually a very simple view. It just takes loan and projects out only loan number and branch name. So, what is not shown? What is not there is the select clause here. The amount for loan is amount. So, loan amount is not available. So, this is a view you might make available to somebody. How do you make views available? We will see later. There is a grand thing. But now, if you try to do the following. Insert into branch loan, values L 37 peri-rich. What will the system do? The view is not an actual relation. So, it cannot actually put a tuple in that relation. So, the best thing it can do is go and update the underlying relation which is loan in a way such that this will appear in the view branch loan. And I can do that actually in this case by inserting L 37 peri-rich NULL into the loan relation. Provided of course that the loan relation allows the amount to be NULL which in a real bank cannot be the case. One loan to be NULL. But if you allowed it, then a database might allow you to do this. So, this is a relatively easy case. Here is another one which says create view B as select loan number branch time amount from loan where branch time equal to peri-rich. So, what is this view doing? It is giving me a subset of the rows of that table. If I insert this view this particular table L 99 down-to-23 before we get into down-to-1. Supposing I insert L 99 peri-rich 23 can that be implemented? This update of the view how will it be implemented? It is not difficult. Go insert that into the loan and that will appear in the result. Instead of peri-rich, I have put down-to-1 here. Now, there is a problem. If I insert loan it will not appear in the view because the view only contains peri-rich. So, it is not actually possible to insert this into the view. So, the database will actually reject this. It will say you cannot do this. The simplest solution might have been to say that you are not allowed to update any view at all. No updates are allowed. Then life is easy. But then there are some situations where people need to be able to update the view. So, for example, if you give this view to this branch you want to allow the branch to insert tuples into their view. But insert their own tuples. They should not be inserting tuples for some other branch. And therefore, database will do support these kinds of things which will work as long as the condition here is matched. If it is not, it will be rejected. Sir, I am still not able to understand why do we need to update a view. Insert a value into the So, the simplest solution is say no updates on views. Forget it. But there are situations like I said where you may want to allow a bank branch. But it will not show the true picture of the relations which we are using. That is right. This particular view is allowing you to see only records at your branch which may be reasonable. So, if you have a bank which does not want a branch to go and see all sorts of other records or if it is not a bank, maybe it is a company where you allow a particular office to see only sales from their office, not from other offices. That may be secret. You do not want them to know about it. So, then you might give a view like this and still allow them to update it. So, this was one case where you could still do certain updates and others are rejected. Now, there are other cases where you cannot do anything. So, this view, all customers let us go back up and see this. All customers view does the following. It is a view consisting of branch and their customers. That customer might have a loan or an account or both. We do not know. So, what the view does is it takes all the depositor account pairs and selects branch name, customer name. Similarly, borrow a loan pair and selects branch name, customer name. So, the result of the view has branch name, customer name. But you do not know whether that particular customer had an account, a loan or both. And now, if you try to insert into all customer values, period is drawn. This database has no clue what to do. Should it insert an account? Should it insert a loan? In either case, it would have to insert a null value. So, this update cannot be handled at all. So, database systems, if they see a view containing union, will simply reject it. In fact, unfortunately, some database systems do not even support views with unions. Forget updating them. But even if they did support view with union, they will not allow it. Similarly, a view with a join, most database systems will not allow you to update them. Although there are some special conditions under which it is still possible, many database systems will not allow you joins in the view. Only select views are supported typically. And aggregates also mess up the whole thing. So, no aggregates, typically no joins, although some do support joins. So, that was updates and views. So, with this we are going to move to a different phase of this talk, where we are not going to look at queries anymore for the rest of today. Rather, we are going to look at a whole bunch of concepts related to transactions. Using cursor for selection or update, like we are doing it also from the cobalt side. But what I have observed is that resources taken is so huge. So, during your lecture, can you throw light on that? Like it takes a lot of time to open a cursor, then select it and then update it. Means what I felt was that for practical point of view, it will not be feasible like. No, I think the problem which you had may be related to the question he had. If you have open a cursor with a where clause condition and there is no index available, then it will take a long time. It is going to search through the whole relation to find matching rows. So, that is probably the reason for your problem. We can discuss it in more detail afterwards. So, before the tea break, let me introduce you to this notion of transactions. You probably saw a little bit of this earlier, acid properties, but let us go over it again in any case. So, what is the transaction? It is basically a unit of program execution that may access and update various things in the database. So, a small example of a transaction transfers 50 dollars from account a to account b. So, here it reads a modifies a, writes it back and this guy reads b modifies it and writes it. It is the same transaction which does both. So, obviously, this is a simple transaction. You guys in LIC will have a lot of transactions going on. When you issue a policy, you have to create a policy record. You have to create a record for the payment which is received. You may have to create a record for something which tracks when to send reminders to this person. So, there may be many relations which are updated when a policy is created. All of these together is conceptually one transaction because when you interact with the customer, all of those steps should happen. It should not be that you take money from the customer and do not give the policy or give the policy and do not take the money or taking the money in the physical act. From the view point of the database, correspondingly you update something in the database saying money received. So, you cannot issue a policy without putting money received. You cannot put money received without issuing the policy. They are a unit. They are a transaction. So, as long as the system is running properly, no crashes and so on, generally there is no major issue here. However, there are two major issues to deal with when you have transaction processing. The first is failures of various kinds. Power may go off in the middle of a transaction. Your computer system may have a problem in the middle of a transaction. It may die for some reason. What to do in the presence of such failures? The second kind of problem is because multiple transactions may execute concurrently. Why should they even execute concurrently? I will describe that a bit later. So, coming back to this simple transaction, what are some of the requirements of this transaction? The first step is atomicity, which I just described. So, what you want is that either all the updates of this transaction are done or it should appear that none of those was done. That is atomic. Atomic in the sense not divisible in the old meaning of atomic. Of course, we know atoms are divisible, but here the old meaning of atom was something which cannot be divided. So, a transaction should not logically be divided. Physically when it runs, of course, it has to first execute one update and then the other. You can't prevent that. But what should happen is that if a transaction has partially executed then its updates are not reflected in the database or equivalently they are removed from the database. So, that after cleaning up, it appears as if the transaction never ran. So, you cannot help a failure in the middle, but if a failure happens in the middle, you should undo whatever was done to restore the state. That is the idea of atomicity. So, if you don't do that money will either be lost or generated here. So, that is atomicity. The next requirement is durability. Once this transfer has taken place the customer has been told your money is transferred if the database somehow loses this update. Let's say that both of these updates were done in memory and now power fails. So, the disk is not updated and power comes back on you forgot all about the transaction. Now, the customer is going to be very unhappy if this happens. B may issue a check which bounces and land up in jail. So, you can't afford that. So, it has to be durable. So, the updates must persist even if there are software hardware failures and then there are one more requirement which is consistency which is actually a slightly different requirement. This is not the job of the system although it comes under the asset property set. This is usually something which is the job of the programmer rather than the system. So, in this case when you transfer money from an account to another the total amount of money should not change. You can't, the bank cannot just generate money from out of the blue. Money has to come from somewhere. You can move money between accounts but it cannot just appear from or you can accept money from outside but it cannot suddenly pop up in the middle. So, that is a consistency requirement and your transaction code must maintain the consistency requirements. Each domain has a different set of consistency requirements and the code should make sure that they are satisfied. So, there are many kinds of consistency requirements. Some are primary key or foreign key. These are explicitly specified to the database and the database ensures that they are satisfied. If your transaction violates that, the transaction is aborted. We will see what that means. However, many others are implicit. For example, the banks let's say this is a very simple bank. All it does is get in cash as deposit and then hand out cash as loan. That's all. Then the sum of balances of all accounts minus sum of the loans which have been handed out must be equal to the cash in hand. This is a real life constraint which must be respected by the data and the database. Cash in hand is also a variable in the database which reflects physically the cash in hand and these must be consistent. How do you ensure this? Again, your transaction code has the responsibility for ensuring this. You cannot specify an integrity constraint in the database which enforces it but your code must take care. We will assume that that is the case. What happens is that if a transaction when it starts must see a consistent database while the transaction is executing the database may be inconsistent. After this step, money has vanished from A. It's not yet gone into B. Money has temporarily vanished. The database is temporarily inconsistent. But after the next step and I mean the end of the transaction, the database must again be consistent. This is the guarantee that the transaction gives the transaction code. If the database was consistent, when it started, when it finishes successfully the database will be consistent again. If it doesn't finish successfully, what should happen? Everything that it does should be undone. That is atomicity. But consistency is when it succeeds the database is again consistent. Regardless of the order in which you run A, then B, then C or then B, C, A it doesn't matter. Each transaction starts from a consistent database takes it to a consistent database. The next one of course sees a consistent database so it will also leave it in a consistent state. If individually transactions guarantee consistency regardless of the order in which you run them, overall you will have a consistent database. This is an important property. The last property we have seen atomicity, durability then we have seen consistency. Now we will come to isolation. In fact we are going to spend a lot of time on concurrent access. So take a simple example. This transaction is the same one it is transferring money. Let's say there is another transaction which reads A and B and then prints the total. In this case it is A and B. Maybe it looks at all the accounts and loans bank and adds up all the things and prints that. Adds or subtracts as appropriate and prints the value which should be the cash on hand. Now what is going to happen here? You have removed from A and you have not yet added to B. So this guy is seeing an inconsistent state of the database. So isolation is the idea that a transaction should not see an inconsistent state due to other transactions. It should either look as if the other transaction never has not yet started or that it has finished. So this transaction T2 cannot see an intermediate state of T1. Either it should see the state before T1 executes or it can see the state after T2 executes. Either is fine but it cannot see this intermediate state. Now isolation can be ensured very easily by just running transactions one after the other. First run T1 then run T2 after T1 finishes. This is probably acceptable for a small branch. The transaction takes a fraction of a second. So even if you have 10 counters if it takes one second for a transaction and there are 10 counters you run them one after another it is still acceptable. However when you have a large centralized system with multiple things coming in this will waste resources tremendously and we will see this later. These are basically these things that we saw. Atomicity, consistency, isolation and durability and these are formally defined here. Atomicity either all operations are reflected in the database or none are. Consistency is execution of a transaction in isolation. At this point we do not worry about other transactions coming in and messing around. If you execute it with nothing else happening then it preserves consistency. Isolation this is actually a little hard to define. There are different notions of what is reasonable isolation. So this is one kind of notion although multiple transactions may execute concurrently. Each transaction must be unaware of other concurrently executing transactions. They should not be aware that something else is running and therefore intermediate transaction states must be hidden from other concurrently executing transactions or equivalently. For every pair of transactions T i, T j it appears to T i that either T j finished execution before T i started or T j started execution after T i finished. So T i cannot see an intermediate state of any transaction T j. And durability after a transaction complete successfully the changes it has made persist even if there are system failures. Now of course this depends on what is a failure. An example which is there in the book is what if a black hole swallows the earth. All data is lost. We are all lost. And of course the update of the transaction is also lost. So there are limits to what you can do for durability. Of course you could maybe replicate it on another star or something completely invisible but it is just stretching the example to a silly point to a silly level where obviously you cannot guarantee durability. However you should not lose data on a small common error. Common errors are dys failures, a common error. Another common error is a network failure. We keep having that every 5 minutes here. You cannot afford to lose data on these standard errors. Another common error is a fire in a room. It happens not that common but it happens. You cannot afford to lose data so we will see what you can do to replicate data remotely so that you can guarantee durability even if this place burns up it is still there. But if both are blown up well tough luck. So we will stop here.