So, that was the short question break; let us progress on with more topics from this chapter. The next topic is transactions. There is a lot on transactions coming up later in this course, so at this point I am not going to spend a lot of time on it, but just briefly mention what a transaction is. A transaction is basically a unit of work. If you recall, I told you on the first day that if you go to a bank and withdraw cash, for you that is a transaction. If you go to a bank and make a payment for your mobile bill from your ATM, what is it doing? It is deducting from your bank account and crediting it to your mobile bill. So that is a transaction which involves two updates. Now what you need is that both the updates should succeed: it should withdraw from your account, and it should credit your mobile bill account. Now a transaction which spans two entities, your bank and your mobile phone service provider, is more complicated. So let us stick to simpler transactions for now which happen within a single database. If I want to transfer money from my account to your account, then I need to tell the bank: please deduct 50 rupees from my account and credit 50 rupees to this person's account. Both these steps should happen; if not, both should fail. If only one happens, say the bank deducts 50 rupees from my account and does not credit it to the other account, then I will be unhappy: the bank has eaten up my money. On the other hand, if the bank credits your account for 50 rupees and does not debit my account, then the bank is going to be unhappy because it just lost 50 rupees. So what we want is that the two updates should appear to be atomic. Now in reality it is not possible in any computer system to do two updates exactly at the same time. Therefore, it is the job of the recovery system of the database to ensure that if anything goes wrong in the middle of a sequence of updates which form a transaction, then all the steps up to that point are rolled back. They are undone.
So that is the job of a database system. An atomic transaction is a sequence of steps which are either fully executed or rolled back as if none of the steps were executed. So you compensate: if I deducted 50 rupees from my account and could not credit it to the other account for whatever reason, you will credit 50 rupees back to my account and settle that. So that is one aspect of a transaction. Another aspect is to prevent concurrent transactions from stepping on each other's toes. If two people are concurrently trying to update my balance, there could be trouble. We will see this in more detail later. The programmer who is using SQL should not have to worry about concurrency, but the programmer needs to tell the database what a transaction is, that is, what set of steps together constitutes a transaction. So in the SQL standard, by default a transaction begins and all the steps which you execute consecutively are part of the same transaction until you either say commit work or rollback work, or just say commit or rollback; the keyword work can be dropped. This is the standard. Nobody implements the standard. What do real databases do? They treat each SQL statement which you submit as a transaction on its own. So if your SQL statement updates 100 rows, it will run atomically: either all 100 rows will be updated, or if something goes wrong in the middle, it will roll back and no row would have been updated at the end. So all database systems will treat a single SQL statement as a transaction. The question is, how do you force the database to treat a set of SQL statements as a transaction? Now here the syntax varies; the SQL standard, in fact, also says that if you say begin atomic, then have a set of statements, and then say commit or rollback, all of those should be one transaction. Again, this is not supported directly in most database systems.
So for example in PostgreSQL, which you will be using later on, you can say begin and then have a set of statements which are all part of the same transaction. Unlike the begin atomic syntax here, there is no atomic keyword: in PostgreSQL you drop the atomic, just say begin, execute statements, and then say commit or rollback, and all of the statements in between will be treated as a single transaction. Another way of doing it, if you are submitting the queries from an API such as JDBC: the JDBC designers realized that each database has its own syntax, therefore they provide an API call which basically says, on this connection, turn off auto commit. What does that mean? It says, do not immediately commit each SQL statement by itself on this connection from now on. Then the connection will receive a series of SQL statements, and then you say connection dot commit or connection dot rollback. We will see this later today. So the point is that using the API, you can define what a transaction is. Whichever way you do it, transactions are an important thing for any programmer who is building an application. Many times people forget to create transactions and can leave their application in an inconsistent state. For example, when an employee joins, maybe the employee record should have been put in two places, but because of a failure it goes in one place and does not go in the other. This is a little dangerous because you normally will not notice it happening. 99.9 percent of the time everything will go fine, and then in that one case when there is a power failure at some critical point, things go haywire. And then a very important central database system has an inconsistency and all kinds of problems can arise. So you have to be careful about transactions. So coming back, we have a little bit of time before the break, and what I want to do is quickly cover integrity constraints. We have already discussed some of these. We have seen the not null constraint.
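The turn-off-auto-commit pattern just described can be sketched in code. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in for JDBC and PostgreSQL; the account table and the amounts are invented for the example, but the shape is the same: disable per-statement commits, issue the two updates, then commit or roll back as one unit.

```python
import sqlite3

# Open in autocommit mode so we control transaction boundaries ourselves,
# analogous to JDBC's connection.setAutoCommit(false).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 100)")

try:
    conn.execute("BEGIN")  # start an explicit transaction
    conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 50 WHERE id = 2")
    conn.execute("COMMIT")  # both updates become visible together
except sqlite3.Error:
    conn.execute("ROLLBACK")  # undo every step of the transaction

balances = [r[0] for r in conn.execute("SELECT balance FROM account ORDER BY id")]
print(balances)  # [50, 150]
```

If an error had occurred between the two updates, the rollback branch would restore both balances to 100: neither account ends up changed on its own.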
We have seen the primary key constraint. There is also a unique constraint which declares something to be a super key, and then there is a check constraint. So let us just go over those. Not null we have already seen; I won't repeat it. Unique: you can just say unique and list one or more attributes, which declares that those attributes form a super key; that is, no two tuples should be exactly the same on those attributes. Actually we say it forms a candidate key, but SQL does not have a way of checking if it is minimal, so it is probably better to call it a super key. Now note that when you say unique, the system allows you to put null values for those attributes. In contrast, if you say primary key, null values are not allowed. So it is possible for two tuples to have all the attributes a1 through am being null; that is not a problem, the system will allow both of those tuples to exist. Unique is occasionally useful when you want two things which are unique. Say I want the id to be unique for an employee, and I also want the email id to be unique for an employee. So I may declare id to be the primary key and I may declare email to be unique, to make sure that there is no mistake where two employees get the same email id. Moving on to the check clause: check can be thrown in as part of an integrity constraint. Let's look at the bottom of this table, create table section. There are a number of attributes; I am going to skip all of those and go to the very bottom, where it says check semester in fall, winter, spring, summer. What does this check clause do? Whenever a tuple is inserted or updated, it will check if the value for the semester attribute is one of these listed values. If it is not, the check fails, the transaction is rolled back, and that update fails. So you can see that this is important; otherwise people can store any old garbage in the semester field.
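The semester check just described can be seen in action. Here is a minimal sketch, again using sqlite3 as a stand-in database, with the section table pared down to the attribute being checked; the course id column is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE section (
        course_id TEXT,
        semester  TEXT,
        CHECK (semester IN ('Fall', 'Winter', 'Spring', 'Summer'))
    )""")

conn.execute("INSERT INTO section VALUES ('CS101', 'Fall')")   # passes the check
try:
    conn.execute("INSERT INTO section VALUES ('CS101', 'Autumn')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the check fails and the update is rejected

rows = conn.execute("SELECT COUNT(*) FROM section").fetchone()[0]
print(rows)  # 1: only the valid tuple was stored
```

The second insert never makes it into the table, which is exactly the garbage-prevention the check clause exists for.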
So maybe somebody will say fall, somebody will say autumn, somebody else will have a typing error and say f a l, and so forth. Then your data is really messed up. So check constraints can be quite useful. In this particular case, semester in fall, winter, spring, summer, some of you will no doubt say, why do you need a check constraint here? Why not create a semester table? This is often called a master table, a semester master table which lists all the possible semesters. So semester is the primary key in that semester master table. If you had such a semester master table, what can you do here? You can just declare semester as a foreign key referencing that table. So immediately, only whatever values are there can be here. In fact, that is probably a better solution for this case, because tomorrow if I want to add a new semester, I can add it. This happened in IIT Bombay: initially there was no winter semester, we had only autumn, spring and summer. Then we had to add a new semester, and later we had to model certain projects which span a summer and an autumn. So they actually created a semester corresponding to this longer term, because each course was supposed to map to a semester. So by creating a master table, we get this flexibility of adding new semesters without going in and changing the schema. On the other hand, suppose I want to make sure that the credits for a course are not garbage. I know from my domain knowledge that credits cannot be less than zero; that does not make sense. Zero may make sense, since occasionally you have a non-credit course, but less than zero certainly does not. I also know that credits are supposed to reflect how many hours you spend on something in a week. Now what is the maximum number of credits possible? I assumed 144 was the number of hours in a week, but it is actually 24 into 7, which is 168. So that is the number of hours per week, and credits cannot go more than this.
Deducting time for sleeping and so on, the real limit is obviously going to be much smaller. So you could put a constraint that credits cannot go beyond, say, 50 or 60. So the check clause can check that credits are greater than or equal to zero and less than or equal to, let us say, 70. That is a pretty useful check clause for a case where I cannot have a foreign key with a master table. In fact, if you look at our university schema, there are quite a few occurrences of check clauses to do such validity checks, including this one which we have shown here. Referential integrity is just another name for foreign keys. We have already seen foreign keys; what I am going to do is show you a couple of extra options that SQL allows with foreign keys. The first one here is what we have seen before: course has an attribute called department name which references department. So that is fine. Now this one is a modification of the previous one which, in addition to saying department name references department, says on delete cascade, on update cascade. What does that mean? Cascading deletes mean that supposing I delete a particular department, and I have declared the foreign key in course with on delete cascade, then if I delete the department, all courses in that department will also get deleted. If I did not specify on delete cascade, what would happen? The department deletion would simply be blocked. It would say, sorry, I cannot delete this department because there is still a course referencing it. Now it probably does not make sense to go around deleting departments, but in this domain we had to create an example like that. So let us take a different one. I have a purchase order, a bill which has to be created, and a bill has many lines in it listing the things you bought. If you store it in a normalized schema, you have a bill master relation and then you have the items in the bill as another relation, so there are two relations with a foreign key reference.
Now maybe I decide to cancel a bill and delete the whole thing. So if I delete a bill, I should also delete all the items that reference the bill, which are all the lines in that particular bill. So there, on delete cascade would make a lot of sense. Similarly, on update cascade: why is that useful? Maybe a department decides to change its name. We have had that happen in IIT: the Metallurgy department at some point decided that people there are not just doing stuff with metals but are also dealing with nonmetals, and in general they became Metallurgical Engineering and Materials Science; the name changed. Now it is probably a bad idea to have a name as a primary key because a name is long. In our university schema we used it to keep the schema simple; we did not want to add more and more attributes. In reality you would probably have a department code or a department ID, which is what would be referenced. So you probably will not go around updating it. But if there was a need to do it for whatever reason, then if you specify on update cascade and you update the ID of a department from 5 to 11, every tuple which references it will also be updated from 5 to 11. So that is what on update cascade does. There are alternatives: for example, you can say on delete set null or on delete set default, and similarly on update set null or set default. So those are cascading actions, and they can be useful. I believe our schema has a few other examples of cascading actions; you can go see the DDL in the lab today. Now, while a transaction runs, an integrity constraint may be violated. For example, a transaction which is debiting one account and crediting another definitely has an inconsistent state in the middle: when it has debited 50 rupees from your account and has not yet updated the other account, the sum of the account balances has actually changed, which should not happen.
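The bill example just described can be sketched in code. This is a minimal illustration with sqlite3 as a stand-in; the bill and bill_item tables and their contents are invented for the example. Note that SQLite only enforces foreign keys after a pragma is set, which is one of those per-database quirks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.execute("CREATE TABLE bill (bill_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE bill_item (
        bill_id INTEGER REFERENCES bill ON DELETE CASCADE,
        item    TEXT
    )""")
conn.execute("INSERT INTO bill VALUES (1)")
conn.execute("INSERT INTO bill_item VALUES (1, 'pen'), (1, 'paper')")

conn.execute("DELETE FROM bill WHERE bill_id = 1")  # cascades to the items
items = conn.execute("SELECT COUNT(*) FROM bill_item").fetchone()[0]
print(items)  # 0: deleting the bill deleted all its lines too
```

Without the ON DELETE CASCADE clause, the delete of the bill would instead be blocked with a foreign key violation, since its items still reference it.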
So in the middle of a transaction, certain conceptual integrity constraints are violated. But what about foreign key constraints: should you allow those to be violated in the middle of a transaction? The answer for all database systems by default is that they will not allow these to be violated at any point. The moment you do an insert, it is checked whether it violates the foreign key constraint; if it does, the insert is rejected and the transaction is rolled back immediately. This is perfectly good for most situations, but here is a situation where I have a table person which has mother and father as attributes, both of which are foreign keys referencing person. Now the question is, how do I insert a person? I have to set the mother and father values for that person. Suppose I bulk load data for a lot of people in some arbitrary order. It is possible that when I insert a particular person, the father and mother of that person have not yet been inserted. So I am storing those ID values in this record, but right now the other records have not been inserted; they are going to come later in this series of insertions. Now ideally, the foreign key constraint should be checked at the end of this transaction, not during the transaction. But most implementations by default will check it immediately and will roll back. So what do you do? There are several options. One option is to order these inserts so that you insert the parents first and only then the children. So you go in history order. Then you can be sure that when you insert a person, all ancestors of that person have already been inserted, in particular the father and mother, and therefore this insertion will not cause a foreign key violation. So the father and mother records are already there when you insert a particular person's record. But this requires more work: I may have to sort the relation somehow, which is actually a fair amount of programming work.
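The insert-the-parents-first idea can be demonstrated concretely. Here is a minimal sketch with sqlite3 as a stand-in, using a simplified self-referencing person table with invented IDs: the child-first insert fails the immediate foreign key check, while history order succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE person (
        id     INTEGER PRIMARY KEY,
        mother INTEGER REFERENCES person,
        father INTEGER REFERENCES person
    )""")

# Inserting a child before its parents violates the foreign key,
# and the check is made immediately, at insert time...
try:
    conn.execute("INSERT INTO person VALUES (3, 1, 2)")
except sqlite3.IntegrityError:
    print("child first: rejected")

# ...but inserting parents first, in history order, succeeds.
conn.execute("INSERT INTO person VALUES (1, NULL, NULL)")  # mother
conn.execute("INSERT INTO person VALUES (2, NULL, NULL)")  # father
conn.execute("INSERT INTO person VALUES (3, 1, 2)")        # child
count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(count)  # 3: all three rows went in
```

Producing that history order for an arbitrary bulk load is exactly the sorting work mentioned above, which is why the two-pass and deferred-checking alternatives that follow are attractive.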
So another option is to do two passes. In the first pass you set father and mother both to null: even though you know what they are, you set them to null for the moment and insert all persons. Then you do a second pass, going back and updating all the records, setting the father and mother appropriately. This is safe because in the first pass all people have been inserted, so in the second pass those IDs will be there in the table and it won't cause a problem. So that is the second solution, which is actually used quite frequently. In a few cases the second solution may fail and may require what is called deferring of constraint checking. In the SQL standard, you can tell the database: please don't check the constraint now; hold on, let me do a bunch of updates, and at the end go back and check the constraints. Now why is this needed? Here is a case. If mother and father are declared as not null, I cannot set them to null; I would have to sort, and that may be a lot of work. And there is an even more complicated case with cyclic constraints. Here is a case where I have a spouse attribute for person. Now, not all persons have spouses, so presumably you cannot declare it to be not null. But suppose I have a table married person: for a person to be in married person, there has to be a spouse. So it makes sense to declare spouse as not null for married person. Now let's say I have a foreign key constraint, and I am giving a name to that constraint, spouse ref; it is a foreign key constraint which says that the spouse attribute references married person. So we are adding this attribute and this constraint to the table called married person; this is a reference back to the married person table. Now we have an interesting situation. I have two people who are married to each other. Supposing I declared spouse to be not null: if I insert the first person, then the spouse reference has to be provided because it cannot be null, but that person has not been inserted yet. So that will fail.
On the other hand, if I insert the other one first, the same problem will arise, because their spouse is the first person, who has not yet been inserted. So whichever order I do these two insertions in, they will definitely fail. Let me repeat this in case you didn't get it. I have two people, let's say Ram and Sita, married to each other; there is an ID for Ram and an ID for Sita, and I am loading both of them into the database. When I load Ram, the spouse attribute is an ID which has not yet been inserted in the database. So the insertion of Ram will fail if I do it first. If I insert Sita first, then similarly the ID of Ram is not yet there and it will fail. The previous solution was to set both IDs to null, but the constraint says not null. So the only solution here is to tell the system: please defer checking the integrity constraints till the end of the transaction. Then I will insert both Ram and Sita; constraints are not checked yet. When I commit, it will check all the constraints, and at that point everything is fine. So these are the features which SQL supports. Again, it varies by database; not all databases may support all of this. This is the last one I will cover before our break: the check clauses in SQL can potentially be more complex. I showed you a simple check clause. The syntax of SQL allows you to write anything in a check clause, but most databases will say, sorry, your check clause is too complicated, I cannot implement it efficiently. In particular, here is an example. If you have the schema diagram with you, you will observe that each course section has a time slot associated with it. So this course runs in time slot 1, time slot 2 and so forth. Now in the time slot table, time slot ID is not a primary key, because we may have multiple rows for the same slot: one row says time slot 1 runs at 830 on Monday, another says time slot 1 runs at 930 on Tuesday. So in that table, time slot ID is not a primary key.
As a result, I cannot declare time slot ID to be a foreign key referencing the time slot table. So now what do I do? How do I make sure that the time slot information in section is correct? I cannot create a foreign key. So here is an attempt: I say check time slot ID in (select time slot ID from time slot). If this were supported by a database system, if it could check this when it inserts a new tuple or updates a time slot ID, I could make sure that it is a valid time slot ID, that it does exist. Unfortunately, most database systems do not support this; in fact, no database system that I am aware of supports such complex conditions in a check clause. The check clause conditions today can only be very simple conditions which just refer to the local attributes. In particular, it cannot have a subquery, so this is ruled out. There is another feature called assertions, which is part of the SQL standard, which nobody supports. So that wraps up the constraints at this point. Let me take just a question or two; let's see if anybody has any questions coming up right now. I see a question from Amrita Colom. Let me select you. Amrita Colom, I can see you; please go ahead and ask if you have a question. So the question was about the materialized view, and it was already answered. The other question which we have is: can we index the entries in a view? The view has the values, it has certain attributes; so can we index those attributes? That's a good question: can you index a view? Now the answer to that is, if you have a view which is materialized, so the tuples are actually stored, then yes, certainly you can build an index on a materialized view, and most databases which support materialized views allow you to have indices on them. But supposing the view is not materialized: does it make sense to create an index on something which is not actually stored? The answer which pretty much any database system today gives is no, you can't do it.
SQL Server says that if you try to index a regular view, it will automatically treat it as a materialized view; so their answer is slightly different. A third possible answer is to say, let me see if I can support it by creating an index on the underlying relation. As far as I know, no database supports that today. So what you would have to do is find out what the underlying relations are, and you go create the indices directly; the system is not going to translate an index creation into indices on the underlying relations. I hope that answered your question. And let's move to Jaipur Engineering College; they have a question. Jaipur? Please go ahead. Suppose when we create two views, can we apply the join operation on both the views at a time? Over to you, sir. OK. So as I understood the question, it asks: if I have two views, can I write a query which joins those views? And the answer is yes. You can do anything with those views just like a regular relation in terms of querying it. In terms of updating it, there are some restrictions, but for queries it's just like any other relation. You can join, you can aggregate, you can put it inside a nested subquery. You can do anything at all with it; it doesn't matter, it will work. OK, last question, from PSG College, Coimbatore. PSG, you have been selected; please go ahead and ask your question. For commit transactions, if we commit five times or six times, where is that times value stored? Is it in a separate application table or inside the engine? I'm not sure I understood your question. If you commit a transaction, where is the commit time stored, is that the question? Yeah. How many times we committed already, to roll back to the correct point. OK, so the question is, can you commit a transaction multiple times? The answer is no. When you commit a transaction, that's it, it's committed. It's like when you get married, you're married; there is no divorce. So that's it.
Now a few databases allow you to define what are called save points, and then say that I can roll back to a save point. But that does not mean anything is actually committed; that is just some marked point, and it is still uncommitted. It lets you roll back to some intermediate point, instead of rolling back all the way to the beginning. So save points are really markers which allow you to do a partial rollback. But once you commit, you cannot roll back anymore. Now you can have a compensation, in the sense that you can have a new transaction which can undo the effects of the first one. So supposing you committed to a marriage, you can have an undo, which is a divorce. Similarly, if you commit to removing funds from an account, then at this point, if somebody viewed your account balance, they would see that money has been debited; but you can undo it subsequently by going and crediting money back in. So once committed, you can only run a fresh transaction to compensate for what happened earlier; you can no longer roll back the transaction which was committed. I hope that answered your question. Back to you. Thank you, sir. So coming back, in this last SQL session I'm going to wrap up chapter 4 with a few miscellaneous topics, and I'm going to go a little bit fast on this. The first topic is a few more types in SQL. These include the date, time, timestamp, and interval types. Note that SQL does not have a specific type for year or for day and so on, but we do have date and time. However, again, many databases do their own thing. In particular, Oracle does not have a time type; it has only a timestamp. So what is the difference between these? A date is just a date. A time is the time of day, say 9 o'clock, without specifying which date. A timestamp combines date and time, and it actually has a controllable precision. An interval is a period of time. For example, your fixed deposit may start on a particular date, and its validity period may be, let's say, one year.
So now, if you take the starting date and add one year, you get the ending date. Now, is adding one year the same as adding 365 days? The answer is not quite, because of leap years and so on. But most business rules, for example, ignore this issue of varying year lengths; they say this is for one year, and therefore it will end on the same date next year, regardless of whether it's a leap year or not. So interval one year has a meaning which is different from interval 365 days. If you add interval 365 days, it will be exactly 365 days afterwards. So those are the basic date and time related types. In our book schema, in fact, the fact that Oracle does not have a type called time has an impact. We wanted a schema that could work on all databases, so we ended up with two schemas, actually. In the book, we use start time and end time for the time slots; that is the simpler type, the time type, which is part of the SQL standard. However, if you see the actual sample data which we have provided, since Oracle does not have a time type, we actually broke up time into start hour and start minute, both of which are integers with constraints on them: start hour is an integer between 0 and 23, while start minute is an integer between 0 and 59. So that's how we worked around the Oracle limitation. You should note this: if you ever deal with the time slot relation, the book and the schema which we have loaded differ in this subtle way. So to get things rolling again in terms of the quiz, please press the ST button now to be prepared for the quiz. And here is the quiz question; don't answer it yet till we tell you. The interval type, as I said, is basically a period of time. The question is about the following expression: date 2010-12-14, which is the 14th day of the 12th month, December, of 2010, plus the difference of two dates; we are subtracting two different dates there. What would it return?
If you want to answer this question, you can read the bullets about the interval type, which are listed just above, and then read and answer the question about that particular expression. Is it valid and returns a date? Is it valid and returns an interval? Is it invalid? Or none of the above? The bullets above tell you what happens when you subtract a date from another date and what happens when you add intervals. So let us now start the question. The timer has started. Please check that the red LEDs on your clicker are lit up, and then choose one of the answers A through D. You have a minute to answer it, or rather 45 seconds as of now. Time is up. Let us see what people have chosen; just give it a few seconds. While we wait to see what people have chosen, let us see what the correct answer is. As the bullets above say, if you subtract a date or time value from another, what you get is an interval. So the difference between two dates is an interval. If you see that expression, the subtraction there gives an interval, and that is added onto a date. If you add an interval and a date, that is legal: you get a new date. So the first option is the correct one: it's valid and returns a date. It does not return an interval, so two is not a correct answer. Three is also wrong because the expression is valid, and so is four. So now let us look at the bar chart. This time quite a few have managed to connect; just three or four remote centers have not made it this time. But very, very few responses have come in. Maybe I didn't give you enough time to read the question? I purposely didn't read the whole slide out to you; I gave a question which required you to read the slide. I thought I had given enough time for you to read it. Maybe I didn't, because clearly people have been confused and have more or less randomly chosen one of A through C.
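The rule behind the quiz can be checked with Python's datetime module, where timedelta plays the role of SQL's interval type. The two dates being subtracted here are invented for illustration, since the slide's exact operands are not reproduced above; the point is only the types involved.

```python
from datetime import date, timedelta

# Subtracting two dates gives an interval (a timedelta in Python)...
d = date(2010, 12, 14)
interval = date(2011, 1, 28) - date(2011, 1, 1)  # a 27-day interval

# ...and adding that interval back onto a date gives a new date.
result = d + interval

print(type(interval).__name__)  # timedelta: an interval, not a date
print(result)                   # 2011-01-10: a date, not an interval
```

So the expression is valid and returns a date, matching option one in the quiz.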
As I said, expressions which subtract a date from another date give an interval, and if you add an interval to a date, you get a date. So the expression is perfectly valid. Now moving on, the next topic is creating indices. What is an index? We are going to see this in a lot of detail a few days from now, but an index is basically a structure which lets us find the tuples we are looking for very quickly, without going through every single record. For example, I have a student relation. If I want to find a student with a given identifier, one way is to go through every record of that relation sequentially till I find the student I'm looking for. That can obviously be very slow if you have a lot of students. So is there a way to find the relevant student very quickly? Indeed there are data structures based on trees; a data structure called the B plus tree is very widely used for this, which can find the student record given the ID very quickly. What do I mean by very quickly? We'll see this later, but typically with a few disk accesses, maybe 20 to 30 milliseconds worst case, we can find a student with a given ID, whereas scanning the whole student relation may take seconds or minutes on a really large system. SQL as a language did not address how to create indices, although indices are very important. If students log on to the system and you need to display their name given their ID, that's a lookup, and if it happens frequently, you need an index on the student ID. So how do you create this? Most database systems support a syntax which looks like this: you say create index, give a name for that index, then on, the relation name, and then in parentheses the attributes on which you want to create the index. In this case, create index studentID_index on student (ID). So that index is created. I can also create an index on multiple attributes, which are treated as concatenated together to form a single key. We will see this in more detail later on.
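The create index syntax just shown can be tried out directly. Here is a minimal sketch with sqlite3 as a stand-in; the student rows are generated dummy data, and the EXPLAIN QUERY PLAN statement is a SQLite-specific way to peek at whether the system chose the index on its own, which is the point made in the next paragraph.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ID INTEGER, name TEXT)")
conn.execute("CREATE INDEX studentID_index ON student (ID)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(1000)])

# We never mention the index in the query; the optimizer finds it itself.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE ID = 12345").fetchall()
print(plan)  # the plan text mentions studentID_index rather than a full scan
```

Dropping the CREATE INDEX line and rerunning would show the plan fall back to a sequential scan of the whole table.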
So now if I give a query, select star from student where ID equal to 12345, I'm not saying use the index, and I'm not saying don't use the index; I'm not saying anything. It's the job of the database system to figure out that, hey, there is an index on ID, and therefore maybe I should use that index to answer this query. That is the job of the query processing system, and every database has a fairly intelligent system which will do this: it will use the index if it is available. So we will see more on this later, but this can be viewed as part of the data definition language of SQL, the create index construct. The next topic is a new set of types called large object types. In pretty much every implementation of SQL, there is a limit on how big a single record can be. A single record is usually limited to maybe 16 kilobytes, 32 kilobytes, 64 kilobytes; there's some limit typically. And there is a reason for this limit, to do with the implementation of recovery and atomicity and various other things, which is greatly simplified if you do not have very large records. And most records are small. If you allow one or two records to grow very large without knowing about it, it complicates life a lot for the database. So pretty much every database puts limits on the sizes in a record: the total record size and the size of individual record fields. So if I want a string which is stored in a tuple, maybe that string can be two kilobytes, eight kilobytes, 16 kilobytes; some small limit is there. If I want to store a very large string in there, I cannot. But obviously many applications need to store large strings, maybe a whole webpage, maybe an entire book, or maybe an image, a photograph, which can be quite large. So how do you store all such things in a database? The answer is to create a new type called a large object type. In fact, there are several large object types, two of which are widely used.
There is the character large object, or CLOB, and there is the binary large object, or BLOB. So there are these two widely used types, and if you want to store anything very big, more than a few kilobytes in length, you should use attributes of one of these types. Now, there's another problem with large objects. If I run an SQL query saying select image from some table, and the client tries to print an image on my screen in ASCII form, I'm in trouble. So usually what happens is you would have an application which fetches the image and then displays it. Now if the image is 10 megabytes, this is not a problem. But say we have a movie which is two gigabytes; if it's fetched onto your computer, and maybe your computer has just two gigabytes of memory, it's going to fill all memory. So usually these large objects cannot be directly fetched using an SQL query. Instead, an SQL query will give you a kind of handle, a file pointer of some sort, which will then allow you, from the API, to fetch bytes from that large object, or conversely to store bytes into that large object. So essentially the database ends up treating these large objects as separate files, much as your file system stores files. These are in effect separate files in the database system, and the interface to them consists of reading byte after byte from the file or writing byte after byte into the file. The specifics depend upon the API; I'll skip that here. The last topic for chapter four is authorization on relations. Now, a database can have multiple users, and if you create one account per student, you probably don't want one student to go and see the data of another student; by default the contents of a particular student's data are not visible to anybody else. But then two students may collaborate on a project, or you as a teacher may want to create a table which is available to all students. How do you do that?
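A small sketch of BLOB storage, again using SQLite through Python's sqlite3 module; the table name and the fake image bytes are illustrative stand-ins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# A BLOB column holds raw binary data such as an image.
cur.execute("CREATE TABLE photos (ID TEXT PRIMARY KEY, image BLOB)")

image_bytes = bytes(range(256)) * 40  # stand-in for real image data
cur.execute("INSERT INTO photos VALUES (?, ?)", ("12345", image_bytes))

# A small object can simply be fetched whole through a query.
stored = cur.execute(
    "SELECT image FROM photos WHERE ID = ?", ("12345",)
).fetchone()[0]
print(len(stored))  # 10240 bytes round-tripped intact

# For truly large objects, SQLite also offers incremental, file-like
# streaming via conn.blobopen(...) (available in Python 3.11+), which
# is analogous to the locator/handle idea described above.
```

The streaming interface mentioned in the final comment is the practical realization of the "file pointer" idea: you read or write a range of bytes at a time rather than materializing the whole object in memory.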
The way to do that is to grant privileges to specific users; and then there's one more concept called a role, which I will come to in a bit. So the grant statement grants privileges on some object, typically a relation or a view, to a list of users or roles. That's the basic structure. The user list can contain actual user IDs, or a role, as I said. It can also be public, which means the privilege is granted to everybody. You may find this useful: if you create a table and want it available to all students of your course, you say grant select on this table to public. I'll come back to this slide in a moment after seeing some examples. So here's an example which says grant select on instructor to U1, U2, U3. So those three users alone have select access to instructor, meaning they can read from the table; they cannot write to it unless that privilege is granted. Now who grants the privileges? The privileges are normally granted by whoever owns that particular table. If you created the table, you own it, so you would then be able to give authorization. The database administrator role can also grant authorization for any table created by anybody else. Another privilege is insert, the ability to insert a tuple into a table. So you may want to grant someone select but not insert. Or you may want to grant someone select and insert but not update, meaning they can keep adding new records; but if you allow people to go back and modify old records, maybe they can do shady stuff, they can go and cook the account books afterwards. So maybe you will give an insert privilege but not an update or delete privilege, so they can only keep adding records. This is a common business need. Or you may give the ability to update existing records, to delete records, or all privileges, which means do anything to that object; typically it's a relation, but it could be a view also.
Now coming back: if I grant a privilege on a view, if I grant select on a view, that does not imply I'm giving any privileges on the underlying relation. Why is this important? Think back to the faculty view which hid the salary attribute. The idea was that a staff member should be able to see the department name, address, whatever, of instructors, but not their salary. So if I grant select on that view to somebody, the view is defined in terms of instructor, but that person does not have select on instructor; they only have select on the view, not on the underlying table. Now, once you've granted authorization, there may be a need to revoke it. If that person moves to a different department, or leaves the company, whatever, you may wish to revoke it. So the revoke command says: revoke whatever privileges you want to revoke, on the relation or view name, from whichever users you want to revoke them from. The revokee list could be public, which means public loses the privilege. Now what is interesting is the following situation. Let's say I explicitly grant select on a table to a particular user U1. Then I also grant select on the table to public. Then I revoke select on the table from public. So first I gave it to U1, then I gave it to public, which is everybody, then I revoked it from public. What should the system do? U1 is also a member of public; should the privilege be completely revoked from U1? The answer is no. What happens is that a revoke revokes a grant which was made earlier. So the grant to public is revoked, but the other grant, made to U1, is still there; it has not gone away. So when I revoke from public, U1 can still see the table, because there was a grant to U1 which is still in effect. If I revoke that also, then U1 can no longer view the contents of that relation. In the revoke, you can explicitly list privileges such as select, insert, delete, or say all to revoke all the privileges that were granted earlier.
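Since SQLite has no GRANT statement, here is a toy Python model (not a real database mechanism) of the semantics just described: each grant is a separate recorded fact, and REVOKE removes only the matching grant, so revoking from public leaves a direct grant intact:

```python
# Toy model of grant/revoke semantics: every grant is stored as a
# separate (grantee, privilege, table) fact.
grants = set()

def grant(grantee, privilege, table):
    grants.add((grantee, privilege, table))

def revoke(grantee, privilege, table):
    # A revoke undoes only the grant that matches exactly.
    grants.discard((grantee, privilege, table))

def has_privilege(user, privilege, table):
    # A user holds a privilege via a direct grant or via public.
    return ((user, privilege, table) in grants
            or ("public", privilege, table) in grants)

grant("U1", "select", "instructor")       # direct grant to U1
grant("public", "select", "instructor")   # grant to everybody
revoke("public", "select", "instructor")  # revoke the public grant only

print(has_privilege("U1", "select", "instructor"))  # True: direct grant survives
print(has_privilege("U2", "select", "instructor"))  # False: only had it via public
```

The design point the model captures is that authorization is a multiset of individual grants, not a single on/off flag per user.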
There's another kind of privilege called the references privilege. This privilege is required to create a foreign key referencing somebody else's relation. So let's say I grant references (dept_name) on department to a user called Mariano. This allows Mariano to create a table which has a foreign key referencing department. If I have not granted this privilege, and have not granted all privileges, Mariano cannot create a foreign key referencing department. Why? Because if Mariano creates a foreign key referencing department and creates a tuple with a particular value, say Computer Science, I can no longer delete the Computer Science department at will. So what I have allowed is for Mariano to do something which prevents me from deleting a tuple. If Mariano got this ability by default, he could cause me trouble. Therefore, by default he will not get it, unless you explicitly grant the references privilege, or you say grant all privileges, which includes the references privilege. So that's the references authorization. And finally, you can grant an authorization to somebody with the permission to grant it further to others. So when I say grant select on department to Amit with grant option, that means Amit can in turn grant it further to somebody else, perhaps with grant option again. So you can have a hierarchy of administrators: the top person grants it to somebody, and this lower person can grant it to other people as required. Then I can say revoke select on department from Amit cascade, which means that whoever Amit has granted it to will also have the privilege revoked. On the other hand, if I say revoke select on department from Amit restrict, what it means is that if Amit has granted it to somebody, then the revoke will fail. Anyway, there are a lot more details which I'm not going to go into. I'm going to stop here with respect to the authorization features. Maybe I'll take a few minutes for questions from chapter four, so if anybody has questions, please indicate it.
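The reason the references privilege exists can be demonstrated concretely: a referencing tuple really does block deletion of the referenced tuple. A small sketch in SQLite via Python (SQLite needs an explicit pragma to enforce foreign keys; the table contents are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only with this on
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY)")
conn.execute("""CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    dept_name TEXT REFERENCES department(dept_name))""")

conn.execute("INSERT INTO department VALUES ('Comp. Sci.')")
conn.execute("INSERT INTO course VALUES ('CS-101', 'Comp. Sci.')")

# The referencing row in course now prevents deleting the department:
try:
    conn.execute("DELETE FROM department WHERE dept_name = 'Comp. Sci.'")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
print(blocked)  # True: the foreign key blocked the delete
```

This is exactly the "Mariano" situation: whoever owns course has constrained what the owner of department can do, which is why creating such a constraint requires an explicit privilege in standard SQL.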
So the first question says: where is the commit point stored? For example, say after 5,000 records are stored. This is actually a function of what you define as a transaction, and there are performance issues here. Suppose you want to load a lot of data, say a million records. If I try to load all of them as a single transaction, many databases will run into trouble, saying that the transaction is way too big, and they will force you to break it up into smaller pieces. At the opposite end, a user may say every record which I want to insert is a separate transaction: insert this record, commit, insert, commit. That will be very slow on many database systems. So what people do in practice is insert some number of records, let's say 1,000 or 5,000; the programmer chooses this. After doing 5,000 inserts they commit, start a new transaction, insert 5,000 more, commit, and so on. This is under the programmer's control. Now, what the optimum value is depends on the database system, and on how big the records are and how big the pages are, but as a rule of thumb, a thousand records per commit is not a bad choice on many database systems. The next question is: can we create a view for a role? This question was asked in anticipation of roles, which I didn't actually cover much in the slides, but since it has been asked, let me say a little bit about roles. I can create a role just like I create a user. For example, I can create a role called instructor, or a role called student. These are widely used: if you have started using the Moodle system, as I hope all of you have, you'd have seen that people have roles, as a student or a teacher or a teaching assistant or a centre coordinator or whatever else. There are many roles. So the idea is that you can create a role, and you can grant privileges to a role just like you grant them to a user. So the role teacher can have certain privileges.
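The batched-commit pattern just described can be sketched concretely; this is a minimal illustration in SQLite via Python, where the batch size and table are of course just examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE takes (ID INTEGER, course_id TEXT)")

BATCH = 1000  # commit every 1000 inserts: the rule-of-thumb batch size
rows = ((i, "CS-101") for i in range(5500))  # stand-in for a bulk load

count = 0
for row in rows:
    conn.execute("INSERT INTO takes VALUES (?, ?)", row)
    count += 1
    if count % BATCH == 0:
        conn.commit()   # end the current transaction, start the next
conn.commit()           # commit the final partial batch

total = conn.execute("SELECT count(*) FROM takes").fetchone()[0]
print(total)  # 5500
```

Committing per batch rather than per row avoids paying the commit overhead 5,500 times, while keeping each transaction small enough for the system to handle comfortably.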
The role student can have certain privileges. Now, when I create a new student, I can grant the student role to that particular user. So I've broken it up into two parts: grant privileges to the role, grant the role to the user. This is actually a much more sensible way of granting permissions than granting each permission individually to each user; that's not required, since all the students have basically similar privileges. There are some differences, like a student can see their own grades and not other people's grades, which is called fine-grained authorization. Unfortunately SQL does not support that kind of thing, although any application which you build implements it in application logic. So coming back: it makes sense to have roles, grant privileges to roles, and grant roles to users. So the question was, can we create a view for a role? When you create a view, the view is created by a user, or there is a way to treat it as belonging to a role rather than to a user. You can then grant access to that view to other roles, and grant the role to users. So that gives you whatever flexibility you need. I hope that answered the question. The next question: any guidelines for defining views considering performance and security? There is no clean set of guidelines. In particular for performance, it's actually a fairly difficult task for a human to choose which views to create to support the set of queries in an application, unless you understand what the optimizer is doing. A good system administrator who understands query optimization can do this, but most people will find it difficult.
So what many databases today support are what are called wizards or assistants, which will monitor the database system for some time, see what queries are being executed, and then suggest, say, that you should create these three materialized views and these ten indices, which will help you run those queries faster. That is widely supported in commercial databases; Oracle, DB2 and SQL Server all support it. PostgreSQL, as far as I know, doesn't support it; there may be some tools which some people have developed, but it's not part of the database system as of now. The next question is: what is the difference between a recursive relation and a self-join? A recursive relation is a view which is defined in terms of itself. If you have written a recursive program in C or Java, you know what recursion is; it's the same thing. A self-join is a much simpler concept: it is simply joining a relation with itself. Think of it as making a copy of the relation and then joining with the copy. So that is not the same as recursion. The next question is: can we have a foreign key referencing some unique column? The answer is yes. In SQL, you are allowed to have a foreign key which references a relation and lists the columns which it references. This is okay as long as the referenced columns are declared as unique; if they're not declared as unique, you can't do this. The next question: for an attribute defined as unique, I have observed that we can have only one null value, in SQL Server 2000. So maybe there is some implementation-defined restriction there. This comes back to the business of whether null is equal to null: as we discussed, null should not be equal to null, because we don't know what the values are. So for a column declared as unique, as per the SQL standard, you should be able to have multiple rows with a null value. But some databases may not quite follow the standard, and it appears you have found this happening in SQL Server.
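The difference between a self-join and a recursive view can be shown side by side; a sketch in SQLite via Python, using an illustrative prerequisite table where CS-347 requires CS-319, which requires CS-315, which requires CS-101:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prereq (course_id TEXT, prereq_id TEXT);
INSERT INTO prereq VALUES ('CS-347','CS-319'),
                          ('CS-319','CS-315'),
                          ('CS-315','CS-101');
""")

# Self-join: the relation joined with a copy of itself.
# This reaches exactly two levels deep, no more.
two_levels = conn.execute("""
    SELECT p1.course_id, p2.prereq_id
    FROM prereq p1 JOIN prereq p2 ON p1.prereq_id = p2.course_id
""").fetchall()

# Recursive definition: rec_prereq is defined in terms of itself,
# so it reaches prerequisites at every depth.
all_levels = conn.execute("""
    WITH RECURSIVE rec_prereq(course_id, prereq_id) AS (
        SELECT course_id, prereq_id FROM prereq
        UNION
        SELECT r.course_id, p.prereq_id
        FROM rec_prereq r JOIN prereq p ON r.prereq_id = p.course_id
    )
    SELECT * FROM rec_prereq WHERE course_id = 'CS-347'
""").fetchall()
print(sorted(all_levels))
# [('CS-347', 'CS-101'), ('CS-347', 'CS-315'), ('CS-347', 'CS-319')]
```

No fixed number of self-joins can compute the full transitive closure for chains of arbitrary length; that is precisely what the recursive definition adds.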
But there is another issue. If you group by a column and a particular set of rows all have null for the group by column, what do you do? Are you going to create one group per null value? SQL itself is a little inconsistent here; I think the standard says that all the null values will be put in one group, and you get an aggregate for the null group. So grouping treats null as special in the SQL standard. The next question is: can we rename the attribute of a view while creating the view? The answer is yes, you can certainly rename; we had some examples of this with the with clause, and you can do the same thing with views. Here's another interesting question: if you have two different databases, can we use the primary key of one relation in database one as a foreign key for another relation in database two? That is an interesting question, and the answer is generally no. The idea is that databases are generally independent of each other: you should be able to bring up one database while another database is shut down. In such a situation, a foreign key reference from one database to another cannot really be enforced. If the referenced database is down, how do you even check? Or vice versa: if you're deleting, how do you know whether anyone is referencing you? So in general, this is not allowed. However, some databases allow you to define views in terms of another database; Oracle allows this. So that view will run as long as the other database is up; if that database is down, tough luck, the view will fail. But note that an integrity constraint is something more serious, and you cannot generally allow such references to other databases. Is there any kind of file pointer for large objects? As I mentioned, the SQL standard requires support for something called a large object locator, which is just like a file pointer.
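The one-group-per-null-set behavior can be checked directly; here is a small SQLite sketch via Python, with illustrative instructor rows where two tuples have a null department:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER);
INSERT INTO instructor VALUES
  ('Einstein', 'Physics', 95000),
  ('Gold',     NULL,      87000),
  ('Katz',     NULL,      75000);
""")

groups = conn.execute("""
    SELECT dept_name, count(*) FROM instructor GROUP BY dept_name
""").fetchall()
print(sorted(groups, key=str))
# Two groups come back: (None, 2) and ('Physics', 1).
# All the null departments landed in a single group, even though
# null = null does not evaluate to true in a where clause.
```

So for grouping purposes nulls are treated as equal to each other, which is exactly the special case the lecture is pointing out.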
Now, the specific way in which it is implemented depends on the API you are using to talk to the database. Our next topic is actually JDBC, and JDBC has a specific way of accessing large objects. I'm not going to cover it in the talk today, but if you're interested, you can go look it up. It's not very hard; it's much like reading from a file or writing to a file. One more question: how can we store an image in a database? The BLOB type is generally used for storing images. I should note that many systems find that if you have really large collections of images and you put them inside the database, then certain procedures, like backing up the database and so on, become very slow. So there are many database systems which will let you store objects like this external to the database, but they will also control that part of the file system where you store the object and make sure that the object will not get deleted arbitrarily. So they actually divide up the space into the regular database and then a file system storage area for large objects. DB2, for example, supports this feature. So you may choose to use that to store large objects, or you may choose to use regular BLOBs in the database to store images; it depends on how big the images are. For our ID card database at IIT, our images are not very big, maybe 10 or 20 kilobytes, a fairly small image, and we can certainly store those for thousands of users without any problem in the database. And with that, we have wrapped up chapter four.