Welcome to the fourth day. Today we are going to cover a little more on query processing, including a bunch of features which are not exactly about writing queries but are related, such as embedded SQL, triggers and authorization, and we will also talk about decision support systems. That will be a fairly high-level view of what they are, as an introduction, not a hands-on "here is a tool to help you with data analysis." After that we are going to spend some time on how query processing is implemented. There were a lot of questions about this yesterday, about indices, efficiency and so forth. So we are going to spend the last part of today talking, again at a fairly high level rather than in low-level detail, about how query processing is done and how query optimization is done to some extent, how that affects you, and what are some of the things you need to watch out for. So that is the overview, and the first topic today is embedded SQL. What is embedded SQL? It is basically SQL which you can directly invoke, which is integrated in some sense with a programming language: it is embedded inside a programming language. So what I am going to do now is give you a language-independent example. Embedded SQL looks more or less the same whichever language it is embedded in, although some details vary. So we will look at it at that level, and in the afternoon you will see the COBOL specifics, how to compile and so on. The idea is that the embedded SQL sits inside a language which is called the host language, and within that language itself you directly have SQL. But SQL does not quite follow the grammar of whatever the host language is; the grammar is different.
So what you do is you create some delimiters, like EXEC SQL and END-EXEC. These keywords delimit the part of the program which is actually SQL, and the exact syntax varies: for example, in SQLJ, which is the Java embedding, it is #sql followed by an open curly bracket, the SQL, and a close curly bracket; in COBOL, though, what you see is EXEC SQL ... END-EXEC. And this is actually not really a modification of the language. If you added SQL directly into the language, you would need to modify the grammar of the language to include SQL; that is usually too much of a change, although it has been proposed in many cases. The cheaper alternative, which is what is used in practice, is that you take this SQL embedding and run a pre-processor on that language. The pre-processor understands the language grammar to some extent at least; it also recognizes EXEC SQL and END-EXEC, and then it takes everything which is inside, your embedded SQL, and translates it into function calls to the database. The database is generally separate from the application, so you have to send a message or invoke a function, in effect, across the network, or within the same machine by inter-process communication; you send a request, the database executes it and sends the result back. So what happens is that the SQL statements here are translated into function calls which carry out this activity. The other way of doing this is not to have embedded SQL at all, but rather to provide an API to the programmer, using which the programmer can directly send requests to the database, requests in the form of SQL, and get results back. That model is used in the ODBC and JDBC APIs, which are used in C and Java respectively, and in general there is ADO.NET across the whole Microsoft family of languages; those are basically variants of ODBC tailored to other languages.
ODBC was designed for C, but there is also ODBC for Basic, and then extensions of it which let you create queries, send them to the database and get the results back. It is all through an API; the language is not modified at all, whereas here there is some modification in that there is a pre-processor. So the basic issue is that the SQL query is within this block, EXEC SQL ... END-EXEC, but it must communicate in some way with the external program. What are the ways? There are two ways in which it communicates. One is that there are values from the external program that should be passed into the SQL query, and the other is that there are results from the SQL query which must be given back to the outside program. Both of these are achieved by having special syntax of the form colon variable-name. So anything in the SQL query which looks like :variable-name is actually a variable from the host language outside. So here, what happens is that somewhere you would have assigned a value to the variable amount, and then this query directly uses that value as :amount. So this is a query which finds the names and cities of customers with more than amount dollars in some account. The query is joining depositor, customer and account, so it is getting the entire information about the customers and their accounts; the join conditions are here, and this is the part which ensures that the balance of the account is greater than the value which is in the host variable amount. Is this clear? Now, we cannot just execute the query; we have to get the results from it. What is the result? It is a set of tuples. Now, a language like COBOL, and in fact even modern languages, does not have a predefined type equivalent to a relation. So there have been proposals to create relation types and use those, and in fact in Java, for example, you can get a result set, and there is a row set equivalent in the Microsoft stack.
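To make the host-variable idea concrete, here is a minimal sketch using Python's DB-API (the sqlite3 module) as a stand-in for an embedded-SQL or ODBC-style interface. The table and column names follow the lecture's bank example, but the exact schema and data are assumptions for illustration; the named parameter :amount plays the role of the host variable :amount.

```python
import sqlite3

# Hypothetical schema mirroring the lecture's customer/account/depositor example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_name TEXT, customer_city TEXT);
    CREATE TABLE account   (account_number TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    INSERT INTO customer  VALUES ('Hayes', 'Harrison'), ('Jones', 'Brooklyn');
    INSERT INTO account   VALUES ('A-101', 500), ('A-102', 9000);
    INSERT INTO depositor VALUES ('Hayes', 'A-101'), ('Jones', 'A-102');
""")

amount = 1000  # plays the role of the host variable :amount
rows = conn.execute("""
    SELECT c.customer_name, c.customer_city
    FROM depositor d
    JOIN customer c ON c.customer_name  = d.customer_name
    JOIN account  a ON a.account_number = d.account_number
    WHERE a.balance > :amount
""", {"amount": amount}).fetchall()
print(rows)  # [('Jones', 'Brooklyn')]
```

The point is only the shape of the interaction: the program assigns a value to a host variable, the query references it by name, and the library does the substitution safely.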
So you can get a set of things back as a programming language type. It is not a built-in type; it is a type defined by the library, but that type gives you the abstraction of a set of rows through which you can do things. In COBOL, in embedded SQL, you do not actually create a new COBOL type; COBOL is not really suited for adding things like this. Instead, you do the following. You declare a cursor: DECLARE C CURSOR FOR, and then a select query. So now you can use this cursor C to step through the results one at a time, and each time you fetch. Even with JDBC or ODBC you are essentially doing the same thing, except that when the query executes, it gives you back some result type, and on that you say next or fetch. Here also it is similar, except that there is no programming language type for it; instead, it is there in the embedded SQL, by declaring something to be a cursor for a result. So what did the previous statement do? It did not actually run the query. It just declared a cursor: it wrote the query, set up the query and gave a name for that cursor. So now you can open the cursor; this is when the query gets executed. You say OPEN C. The C was defined earlier, so that name is known over here. So you open C; fine, that causes the query to execute, but the results are not yet in your hands. How do you get the results into your hands? Well, you basically do EXEC SQL FETCH C INTO, and here are two host variables, :cn and :cc. Why these two? Because this query selected two things. The types had better match; in this case, both are VARCHAR of some length, so you have a corresponding type in COBOL for the variables cn and cc, and it will get the results into those variables. And this is one fetch. Now, if there are multiple results, you put this in a loop and keep on fetching one after another. Now, how do you know when you are done? In the COBOL version, there is a variable called SQLSTATE; it is actually there even in the C version.
It is the same: there is a global variable called SQLSTATE. It is part of what is called the SQL communication area; there are multiple variables in there, and SQLSTATE is one of them. That variable has some special codes which indicate what happened when you did a fetch. So if there is no data available, it gets set to '02000', zero-two-triple-zero. If there is data available, if it succeeded, it would be some other code; if there is an error, it might be yet another code. What exactly the codes are, you should see in your manual. So what you do is you look at that state and decide whether the fetch actually fetched a result or not. If it did, use it; if it did not, quit the loop, and finally, when you are done, you do an EXEC SQL CLOSE. What does a close do? What actually happens in most implementations is that when you open the cursor, a temporary relation is created with the entire result, and it is kept around until you execute the close. If you do not close, then eventually, when the connection is closed, the temporary relation may go away, but otherwise you may have many temporary relations hanging around while your program is running. So you should close it at the end. Now, if you do this on a very large relation, you are possibly going to get a very large temporary relation. That is a problem which I believe has been observed at LIC. I do not know exactly how it is handled on your specific system, but some systems allow you to say: do not create a temporary relation; let it be a cursor on the original relation, and do not fetch everything up front. In fact, what happens with many systems is that you end up fetching the entire result into the program. If you have an extremely large relation and open a cursor on it, you might end up trying to fetch a huge relation into the program area, and then the program may crash. This should not happen, but there are some implementations which are not very good. With those you have to be careful, but others are careful about this.
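The declare/open/fetch/close cycle can be sketched with the same Python DB-API as before (an analogy, not embedded SQL itself); fetchone() returning None plays the role of SQLSTATE becoming '02000', the no-more-data code. The table contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_name TEXT, customer_city TEXT);
    INSERT INTO customer VALUES ('Hayes', 'Harrison'), ('Jones', 'Brooklyn');
""")

cur = conn.cursor()                   # DECLARE the cursor
cur.execute(                          # OPEN: this is when the query runs
    "SELECT customer_name, customer_city FROM customer ORDER BY customer_name")

names = []
while True:
    row = cur.fetchone()              # one FETCH ... INTO :cn, :cc
    if row is None:                   # API analogue of SQLSTATE '02000'
        break
    cn, cc = row
    names.append(cn)

cur.close()                           # CLOSE: release the result
print(names)  # ['Hayes', 'Jones']
```

As in the embedded-SQL version, the loop structure is: fetch, check whether data came back, use it or quit, and close at the end.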
So they do not fetch the whole thing into the program; any reasonable implementation would not do this. As you do repeated fetches, it will go back to the database as required and fetch things. So any good implementation would do this. So that was for a simple query. What about updates? You can do an update by executing a regular SQL update statement from embedded SQL; nothing prevents you from doing that. However, there is another mode which many people find very convenient, which is very similar to your normal mode of file interaction: open a file, get a record, and looking at the record you make some decision; you may write back to the same record. I am sure this is a common mode which most of you have used; whenever you update, that is what you do. There is something essentially equivalent to that with SQL, where you declare a cursor for a query, where the query should select from a single table, and then you say FOR UPDATE. So once you have done this, what happens? You can actually update it using the following syntax. By the way, all of this has to be within EXEC SQL and END-EXEC; I have omitted the delimiters, but they are implicitly there inside the COBOL program. So to update it, you do UPDATE account SET balance equal to, let us say in this case we are just adding 100 to the balance, WHERE CURRENT OF C. So for the current record on that cursor, its balance value is updated. This is a very common mode of updating. You can of course have select conditions here, so you restrict it to records which you are interested in looking at, not all possible records. And of course, once you have done this in SQL, which indexes to use to fetch the record and so on is SQL's headache, assuming an index is available. If there is no index, SQL is going to take a long time. We will come back to indices later, but you don't explicitly have to say "use this index"; SQL takes care of it for you, as long as the index exists.
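A rough analogue of this read-decide-write-back loop can be sketched in Python's sqlite3. SQLite has no WHERE CURRENT OF, so the sketch re-issues the update against the current row's key instead; the account schema and the +100 update are taken from the lecture's example, with made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER);
    INSERT INTO account VALUES ('A-101', 500), ('A-102', 9000);
""")

# Step through the rows of interest; for each "current" row, update it by key.
read_cur = conn.cursor()
read_cur.execute("SELECT account_number, balance FROM account WHERE balance < 1000")
for account_number, balance in read_cur.fetchall():
    conn.execute(
        "UPDATE account SET balance = balance + 100 WHERE account_number = ?",
        (account_number,))
conn.commit()

balances = dict(conn.execute("SELECT account_number, balance FROM account"))
print(balances)  # {'A-101': 600, 'A-102': 9000}
```

With a real FOR UPDATE cursor the second statement would simply say WHERE CURRENT OF C, and the database would locate the record itself.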
So what is this cursor exactly? Is it a pointer? You can think of it as a pointer into the relation, or actually into the result of that SQL query. It steps through the results, so think of it as a pointer to the current record in the result. And when you say FOR UPDATE, it is actually on the original relation, not on a copy of it. So when you update it, it is actually going in and updating the original relation, not a temporary copy. But in general, if you have a query, you will get a temporary relation, and the cursor is effectively like a file pointer into that temporary relation. Initially, we were talking about embedded SQL versus the API-based approach. So essentially, what is the difference? It is the implementation, because ultimately a package like the API providers' might be doing it the same way in the end. You can think of it this way: the embedded SQL gets translated into some underlying API, which is hidden from you. It is cleaner to use embedded SQL if it is available, whereas if you use the API, you have to do a lot more work to set up queries and get their results. If you look it up, I have not put the slides here, but there are slides on the ODBC and JDBC APIs on the book site, and you can compare the embedded SQL here with the APIs there. You will see that even for a small query like the previous one, just to get the value into a variable you have to do more steps: you have to worry about setting up the connection, you have to set parameters, get the results. Whereas here, with one query, FETCH C INTO :cn, :cc gets you all the attributes of a record; one line gets all the attributes. With the API, you would have to have one call for getting each attribute. So overall it is a little messier; it is more work for the programmer. In this case, in a relation where the number of attributes is very high, say 30 or 50, then I have to declare all the variables like this, one host variable per attribute?
There may be ways, if you have a record type, to fetch into a record which has all the types. I don't know whether your particular embedded SQL supports such features, but it could; you should check this out. So, when we are fetching a record into a COBOL record, especially the date: the date in SQL comes with, I think, hyphens in between the digits. In COBOL, that kind of a concept is not there, so something might happen to the values in the variables. The embedded SQL implementation takes care of type conversions. It actually knows the type of the variable in SQL, and it knows the type of the COBOL variable, so it should do the conversion, as long as you use a date type in SQL and the corresponding... what is the date type? Is there a date type in COBOL? No. So, if there is no standard date type, if it is just a string, you are going to get the default string representation. If you don't want that, you can convert the date to a string using SQL functions, in which you can specify the format. So, actually, when we tried a small example where we had tried to get it, what happened was that the value got converted into some junk kind of thing. Then we used this function to translate it exactly. So everywhere this function will be required, I mean, to be put in the declaration, so that you get it exactly translated in the way it needs to be in COBOL. Yeah, you might have to do that. There are built-in functions in SQL. Yeah, that we have used. We saw that, because you get it in that YYMMDD format, the same way you use it in COBOL; only that way we got it. But I just wanted to know whether we have some other way around it rather than using this function. No, I don't think you can. There may be some default setting in your compiler or pre-processor which understands that this is a date and converts it. So, otherwise...
I don't know what your pre-processor does; I cannot answer that question. So, something like: we have to take care that every time we get back the values in the way we want them? Yeah, that's true, because you're not using a date type in COBOL. If COBOL had a date type... I'm sure the later versions of COBOL have it; maybe what you're using has it and you're not using it, I don't know. But if there were a date type, it would convert it automatically; you wouldn't have to worry about it. The problem is you're using strings to represent dates, and that causes this problem. Okay, so that was a very, very short introduction to embedded SQL. You will be doing a lot of work with it, so you will need to actually learn more about it. But conceptually, it is fairly straightforward; these two, three slides have covered the key concepts. There will be a lot of detailed issues which you will have to deal with when you do the program, but it's not very hard to use; it's straightforward. So, the next thing which we'll look at are procedural extensions to SQL and what are called stored procedures. For a long time, SQL was simply a basic query language; it just had the things which we saw till now. It had at best views, which are sort of like procedures, but they are not really procedures. A view cannot do multiple steps; it just gives you a result, runs a query. That's all a view is. It cannot even do an update. But people realized that many times you want to have some application logic which is in the database, and you run it from the database. So, what you want is a procedure which sits in the database and can be executed. Now, what language should you write this procedure in? There have been different approaches. For example, there are databases today where you can write a Java program, store it in the database and have it executed by the database; Oracle does that.
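The date-formatting issue discussed above can be sketched in plain Python: converting SQL's default hyphenated DATE string into the hyphen-free, fixed-width digit string a COBOL picture clause would expect, and back. The specific date and field layout are illustrative, not from any particular compiler.

```python
from datetime import datetime, date

# SQL's default string form of a DATE is ISO 'YYYY-MM-DD', with hyphens.
sql_date = "1998-07-04"

# Convert to a hyphen-free YYYYMMDD form, the kind of fixed-width digit
# string a COBOL PIC 9(8) field could hold (the layout is an assumption).
d = datetime.strptime(sql_date, "%Y-%m-%d").date()
cobol_style = d.strftime("%Y%m%d")
print(cobol_style)  # 19980704

# And back again, when writing such a field into a DATE column.
assert datetime.strptime(cobol_style, "%Y%m%d").date() == date(1998, 7, 4)
```

In SQL itself the same reformatting is done with the database's date-to-string conversion functions, specifying the format, as mentioned in the discussion.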
If you go to Microsoft land, you can have any of the .NET languages compiled into their .NET code, stored in the database and executed by the database. These are called stored procedures. Now, those are in specific languages, and in fact those in turn can invoke embedded SQL or JDBC/ODBC equivalents internally and again talk to the database. So, the database calls a procedure, the procedure again can invoke something in the database, which may call a procedure, and so on; it can go down many levels. What we are going to look at is how to extend the SQL language itself to do procedural things. So, you do not need a new language: the database already knows SQL; extend SQL with procedural constructs. The nice thing about stored procedures is that the external application can work at a higher level. It can say "register employee" without having to say what all the steps in registering an employee are. Registering the employee may require updates to five relations, but one procedure can encapsulate all that. Otherwise, it is in code outside of the database, and then all bets are off: you might run one version of the code, he may have another version of the code, and when you update the schema, you update your code, he forgets to update his code, and a mess happens; it is not possible to control it. So, it is actually very logical to use stored procedures. Unfortunately, stored procedure syntax is not standard. Each database did its own thing, and later SQL standardized it, and now none of them is willing to give up on their particular version and go to the standard. So, it is completely non-standard: you are locked into a database once you start using procedural SQL. It is unfortunate, but that is life. SQL:1999 standardized stored procedures, and it is called PSM, which stands for Persistent Stored Modules. So, functions and procedures can be used for many reasons.
One of the reasons they are used is with specialized data types. So, if you want to store image data or map data and so on in the database, and you want to manipulate it in a query, you can actually extend the SQL type system today to create new types. You can have line segments, polygons and so on for a map database. Now, a query might want to deal with two roads; a road is maybe a sequence of line segments, and I want to find whether two roads intersect. So, I want to have a function intersects, which will take two roads and return true if they intersect, and I want to use that in the SQL query: select star from road1 where road1 intersects road2, where road2 is a given road. So, all the roads which intersect the given road2. I want to be able to do this. These types of things are possible if you extend the type system and the set of functions available in SQL, and you can do that today by adding functions as you want. Earlier, it required a change to the underlying database code itself; today, you can add your own types and your own functions. The functions themselves can be written in SQL; they can also be written in C or Java or .NET and so on. In fact, some database systems like PostgreSQL even support table-valued functions. Normally, you think of a function as returning a value; here, in PostgreSQL for example, the function can return an entire table, a relation. And after you call that function, it is effectively like a cursor: you can step through the results of that function call and use them; you can even use them in another query. So, what was required to do all this? First of all, you had to be able to define procedures and functions in SQL, and then you had to define loops, if-then-else, assignment and all the usual programming language constructs. So, here is a function which you will also be using in the lab. This is: create function account_count, which takes a customer name. And it does the following.
Select count(*) into a_count from depositor where depositor.customer_name equals the given customer name. So, what is this doing? It declares a_count as an integer and runs a query which selects into that variable. So, just as in embedded SQL you had FETCH INTO :n, here you say SELECT INTO. Now, how can you do this? There is an implicit assumption about this query when you say "select something into a variable." What is that assumption? Correct: the assumption is that this SQL query returns a single value. And here it is clear that it is doing count(*); there is no group by, so a single value is returned, and that you can assign into a local variable. And then you return that variable. So, this function tells you how many accounts a given customer has, and you can use that in a query. This query finds customers who have more than one account. What are we doing here? Select these things from customer where account_count of the customer name is greater than one. It is clear that this is a simple function; its utility is very obvious. Any questions? Procedures are similar, except that they do not have a return type. However, they have the ability to have in and out parameters; just as in COBOL, you can have in and out parameters. So here, a_count is an out parameter, and this selects into a_count. This query has some errors; it should have been the same query as on the previous page, but there are some typos in it. Just copy the previous query into here. Yeah, you could do count(*) here, so we will just correct that: count(*) into this, and this should have been customer name instead of title. Note here that we are saying account_count.customer_name; that is the name of the procedure. So, if the depositor also has a customer_name, how do you differentiate this customer name from the one which is in depositor? This is how you do it: you use the name of the function or procedure.
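The account_count idea can be sketched with sqlite3, which lets you register a host-language function and then call it inside a query, much like the stored function here. As an implementation shortcut (to avoid the function re-entering the same connection mid-query), this sketch preloads the counts into a dictionary; a real stored function would run the SELECT COUNT(*) itself. Schema and data are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_name TEXT);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    INSERT INTO customer  VALUES ('Hayes'), ('Jones');
    INSERT INTO depositor VALUES ('Hayes', 'A-101'),
                                 ('Jones', 'A-102'), ('Jones', 'A-201');
""")

# Preload the per-customer counts (shortcut; see lead-in).
counts = dict(conn.execute(
    "SELECT customer_name, COUNT(*) FROM depositor GROUP BY customer_name"))

# Register account_count so it is callable from inside SQL queries.
conn.create_function("account_count", 1, lambda name: counts.get(name, 0))

# The lecture's query: customers with more than one account.
rich = [name for (name,) in conn.execute(
    "SELECT customer_name FROM customer WHERE account_count(customer_name) > 1")]
print(rich)  # ['Jones']
```

The essential point matches the slide: once the function exists, it can appear in a WHERE clause like any built-in.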
However, a simpler thing would be to not use the same name; use a different one. That is what this query did... actually, this query did not do that, so maybe we should have used a different name here: call this just name, and say equal to name. That would avoid the ambiguity about what this refers to. So, once you have a procedure, you can call the procedure from embedded SQL, or from another function or an SQL procedure; either way, you can declare this as an integer and then call this procedure over here. That is if it were inside an SQL procedure. If it were in embedded SQL, you can directly call it with an external host variable, in that :variable-name form, and it would save the result straight into that. So, that was a simple procedure which had just a single select statement. You will find this afternoon, when you play around with PostgreSQL, that PostgreSQL does not actually support something called a procedure; it combines these two. It has functions, and functions can have out parameters. You do not declare in; it will not recognize in explicitly, so leave out the in. The default is that a parameter is in, but if you say out on a function parameter, it is an out parameter. So, now, what are the procedural things it supports? We have seen functions and procedures, but within those there are loops. While loops are shown here: while n less than 10, you can say set n equal to n plus 1, where n is an integer. You can have a repeat ... until ... end repeat. You can have a begin ... end, which is a compound statement: everything between the begin and end is a single statement. Note that this begin is different from the begin we used yesterday to start a transaction; that is a PostgreSQL hack, that begin-semicolon as starting a transaction. In SQL, you can have a begin and an end, and if you want the entire content of the begin and end to be a single transaction, in standard SQL you can say begin atomic.
In PostgreSQL, you say begin transaction, and you can even say serializable. Of course, it does not actually do serializable in PostgreSQL, but you can define the consistency level that you want: read committed or serializable. So, that is the begin thing. Now, coming back, you have for loops. Look at the for loop: it says for r as, and a query in here. What this is doing is a loop which iterates over the results of this query. So, for every account in Perryridge, it does what? Set n equal to n plus r.balance, with n initially 0. What is this bit of code doing? You probably should not write this query like this; you should use a select sum of balance. That would be a better way of doing it. But there are certain situations where you cannot directly aggregate like that; then you can do this. So, no, not necessarily. There is a trade-off. If you do all this computation in the application, you may have to fetch data from the database; there is overhead to fetching data and, if you need to update it, to sending it back. Whereas, if you are able to filter data out by using stored procedures, you can reduce the amount of data that is shipped back and forth, and you may get a performance benefit. Even though you are adding some CPU work to execute the procedure, you are reducing the amount of data which is moved back and forth, and that is expensive relative to doing a small amount of CPU work. Copying the data over a network is not IO; IO is there anyway, that is orthogonal. But copying it over a network from the database to the application program can be slow, whereas doing it in the database could be faster. But then, if you are running a procedure which does a lot of computation, you may be better off doing that in the application. And the idea is that you have a centralized database server and you can put many application servers around it. So, there is a trade-off which you have to make judiciously.
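The two styles in this discussion, the procedural FOR loop and the set-oriented aggregate, can be sketched side by side. The Perryridge branch comes from the slide's example; the account rows and balances are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER);
    INSERT INTO account VALUES ('A-101', 'Perryridge', 500),
                               ('A-102', 'Perryridge', 700),
                               ('A-201', 'Brighton',   900);
""")

# Procedural style: iterate over the query result and accumulate ourselves,
# like FOR r AS SELECT ... DO SET n = n + r.balance.
n = 0
for (balance,) in conn.execute(
        "SELECT balance FROM account WHERE branch_name = 'Perryridge'"):
    n += balance

# Set-oriented style: let the database aggregate in one statement.
(total,) = conn.execute(
    "SELECT SUM(balance) FROM account WHERE branch_name = 'Perryridge'"
).fetchone()

print(n, total)  # both 1200
```

Both produce the same answer; as the lecture says, prefer the aggregate when you can, and keep the loop for cases where a single aggregate cannot express the computation.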
Where to run the procedure, whether in the database or in the application server, you have to choose. It would be nice if the system could automatically choose, but today that is not supported; you will have to do it. There are if-then-else statements; again, fairly straightforward: if-then, else-if-then, else. And there is a case statement which is similar to the C case statement. This is not the case which we saw earlier, the case expression; this is a case statement. So, that is also there. And finally, you can have exceptions. You can say declare out_of_stock condition; it is a condition which can be raised, and a procedure can raise the condition, or signal it. So, this procedure does what? If something goes wrong, it signals out_of_stock. Now, what happens when you signal it? Well, that is specified here: declare exit as the handler for out_of_stock. Exit means just exit the procedure, but you can also call another procedure which would be invoked when this signal happens. So, there is exception control in SQL. Most languages today have an exception mechanism. Does the version of COBOL which you use have an exception mechanism? You have not used it? Yeah, but it has a way of specifying exceptions; so, this is similar. And finally, this slide shows how you can link up external procedures which are in a different language, not in SQL. You can say create procedure, account_proc, in, out and so on, language C, external name, this thing. So, that is the executable code, the .o file which contains this procedure definition. And if your database supports loading of external-language procedures, it can actually load the procedure from here. Now, there are security issues here. A C procedure can cause arbitrary damage to the database, because it can go and access any location in memory. Therefore, this is something which is probably not made available to anybody except the administrator of the database.
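The signal/handler pattern has a rough analogue in how sqlite3 surfaces an exception raised inside a registered function: the raise plays the role of SIGNAL out_of_stock, and the try/except around the statement plays the role of DECLARE EXIT HANDLER. The table, function and condition names below are illustrative, not from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, stock INTEGER)")
conn.execute("INSERT INTO item VALUES ('widget', 0)")

# A function that "signals" a condition by raising; the sqlite3 module
# surfaces an exception from a registered function as OperationalError.
def take_one(stock):
    if stock < 1:
        raise ValueError("out-of-stock")
    return stock - 1

conn.create_function("take_one", 1, take_one)

# The handler: catch the condition and exit the operation cleanly.
handled = False
try:
    conn.execute("UPDATE item SET stock = take_one(stock)")
except sqlite3.OperationalError:
    handled = True  # analogue of DECLARE EXIT HANDLER FOR out_of_stock

print(handled)  # True
```

As with the SQL handler, the point is that the condition propagates out of the statement that triggered it, and the caller decides what to do.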
Whereas a Java program can be run within a sandbox; you can prevent it from causing any damage outside. So, if this were a Java thing, it might be allowed to any user. Different databases will do different things. So, we just discussed some of these issues. The benefits of external-language procedures are that they are more efficient for some operations: rather than write it in SQL, if you write it in C, it may be much faster to execute if it is a complex computation, and there is more expressive power than in SQL itself. But the drawback is that it has to be loaded into the database system and executed in the database system's address space, where it can corrupt memory and can crash the database. And it can give users access to unauthorized data, because it can bypass the entire authorization mechanism. So, there are alternatives which give security at the cost of worse performance. Some databases do this automatically: if you execute a C function, it is actually run in a separate process, not in the database process. The parameters to the function are sent to that process, which executes and then returns the value, and then the database continues from there. So, you get security, but at the cost of, not network, but inter-process communication to send the parameters and get the results back. Whereas, if it is in .NET or Java, you can avoid that and execute it locally. So, that ends the SQL procedures. Are there any questions at this point? Sir, in the beginning you told us that currently many of the databases are not supporting the standard SQL syntax, maybe SQL 1999 or so. So, suppose we are starting right now, should we stick to the SQL standard, so that later on, when all databases come to that standard... Otherwise, moving from one database to another database, there will be a lot of problems. There will be a problem. You can use it if you have made the choice of database.
If you have signed the contracts with the database vendor, that is fine. But later on, there will be a situation of moving from one database to another database. See, if you use the standard, there are a couple of databases, Oracle and DB2, which are close to the standard. If you wonder why, they are the people who were the most active in setting the standard, so they took whatever they had and put that into the standard, in effect. SQL Server stayed away from all of this; at that time, it was a very small team actually. And PostgreSQL implemented many of these things quite early; some of the things they did actually came into the standard, but there are small differences. So, if you move, you have to rewrite these functions. The differences are not enormous, but they do exist and you have to deal with them. There is a lock-in issue with procedures. So, what some people do is use Java procedures. Then you are not linked to the specific version here; it is Java. Oracle supports Java, DB2 supports Java, with the SQLJ variant, Java with embedded SQL; so you could use that. There is a slight impedance mismatch between Java and SQL, since it is like embedded SQL; it is not as clean. Here, the type system is all combined and you do not have to worry at all about those issues, fetching, storing and so on, whereas if you use SQLJ, your coding is a little clumsy because it is embedded SQL. But that is probably safer in the sense of portability. PostgreSQL does not support Java as of now; SQL Server does not. So, Oracle and DB2 end up being the ones which support Java anyway, and both of them have procedural SQL fairly close to the SQL:1999 standard. So, I guess in either case you are sort of locked into these two if you follow either of these routes. And if you use Microsoft's Transact-SQL, you are locked into them. So, moving on to authorization: just like you have permissions in the file system, you have permissions on relations.
And there are multiple types of permissions. In the file system, in Unix, you have read, write and execute. Here, you have a different set. You have read, insert, update and delete authorization on relations. And it can also be used on views and other such things. So, the authorizations are obvious. Read allows reading. Insert allows insertion. Update allows modification. Delete allows deletion. And there is also an all which allows all of these. Actually, all allows many more things. So, these are the basic ones. But there are other things which are schema level authorizations. For example, there is an authorization to create an index. Authorization to create new relations. Authorization to alter a relation. Authorization to drop a relation. These are schema level authorizations. This is part of the standard. However, some databases don't quite follow it for this part. But this is used by all databases. The schema level things vary a little bit. So, how do you actually specify the authorizations? You can grant the authorization or revoke it. So, you can grant a privilege list on a relation name or view name to a user list. In fact, you can even have a list of relations here. So, the user list could be a user ID. In your case, in the lab, you have user IDs, LIC 130 or so. It could be public. So, if you grant something to public, all valid users get that privilege. Or it could be something called a role. This is called role-based authorization. I don't think I have slides on that. I don't have slides here. But let me tell you briefly what a role is. A role is sort of like a user, but it's not really. It's just some, you can think of it as a placeholder, to which you can grant privileges and which you can grant to people. So, what are roles? Maybe a manager is a role, counter person is a role, section manager, division manager. You can have different roles. And different people might get that role. A role is conceptually some role in an organization.
And you can grant a privilege to that role. And in turn, you can grant that role to individuals. So, you might be a division manager. And when the role division manager is granted to you, to your user, every privilege that was granted to division managers in general will be granted automatically to you. So, that's the idea of a role. And this avoids giving individual permissions to individual users. Organizations don't operate that way. You have a notion of categories of users. And that's really what is used in practice. Directly granting privileges to users is something which you might do if you have a user per role, which is also done. When you build an application, you might have different users which don't correspond to individuals, but they correspond to roles conceptually. And then you grant it to that user. So, that is what used to be done. But then there is a different model which lets you grant it to individual users. Now, the thing is when you have web-based systems, the individual users are web users. The database doesn't know who the individual user is. It has no idea. Therefore, this whole grant model is completely useless from the viewpoint of individual users. However, it can be used in a different way where you create a limited number of users and each application program can run as one of those users. So, for example, in IIT, we have some things which are accessible only from inside IIT and some other things which are accessible from outside IIT. We don't want the part which is accessible outside IIT to be able to perform certain updates. So, we create a separate user for that and grant that user read-only access to most of these tables. So, even if some hacker outside somehow manages to bypass the normal application security, the connection to the database is using this read-only user and therefore, they cannot modify these relations. So, you can use it in that way.
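As a rough sketch of the role mechanism just described (the role name manager and user ID u123 are illustrative, not from the lecture, and exact syntax varies slightly across databases):

```sql
-- A role is a placeholder: grant privileges to the role,
-- then grant the role to users.
create role manager;
grant select, update on branch to manager;

-- The user now inherits everything granted to manager.
grant manager to u123;

-- When the person leaves, only the role membership is revoked;
-- privileges held through the role are unaffected for everyone else.
revoke manager from u123;
```

This is why revoking a role membership does not cascade the way revoking an individually granted privilege does, as discussed a little later.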
The privilege system is still useful, but not from the viewpoint of individual users in a web-structured system. But if you have individual users logging into SQL, then it's still useful. So, you can actually grant a privilege on a view, and when you grant a privilege on a view, you can see the result of that view. But that does not imply granting any privilege on the underlying relations. So, if the view was to select something on account where branch equal to Perryridge, if you grant read on that view, you do not have any read permission directly on the account relation. You only have it on that view. That is the point here. And more important, the grantor of the privilege, whoever is doing the grant, the user who is logged in and executing this grant, must already hold the privilege on the specified item or be the database administrator. So, there is a grantor and a grantee. The grantor grants a privilege to a grantee and the grantor must have that privilege. In fact, in SQL, you must not only have the privilege, there is also an authorization to grant that privilege. So, you can, for example, give the privilege to read to somebody, but not allow him to grant the privilege on to somebody else. So, to be able to grant it further, there is a different authorization. We will not get into all the details. Most of those are not used that much in practice. So, I will just look at the top level. In the previous slide, how do you create a user and create a role? Assign a role to a user. You have a create role statement and then you can grant a role. Just like you grant any privilege, you can grant a role to a user. You can also drop a role. For a view, we have given privilege to someone for update. I will come to views in a little bit. Hold off the view question. Any other question? So, here is an example. Actually, the read authorization in SQL is specified using select. So, you say grant select on branch to a set of users.
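A sketch of the grant statements just discussed, including the separate authorization to grant a privilege onward (the with grant option clause); the user names and the Perryridge view definition are illustrative:

```sql
-- Read authorization is spelled 'select' in SQL.
grant select on branch to u1, u2, u3;

-- Granting on a view gives access only to the view's result,
-- not to the underlying account relation.
create view perryridge_account as
    select * from account where branch_name = 'Perryridge';
grant select on perryridge_account to u1;

-- 'with grant option' additionally lets the grantee
-- grant the privilege on to somebody else.
grant select on branch to u4 with grant option;
```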
Similarly, insert, update, delete and all privileges, as we saw. And the syntax is the same. Grant insert, update, delete or all on table to user. The revoke takes a privilege back. So, you can revoke a privilege list on a relation from a user list. So, you can just revoke select, for example. You can grant all and revoke select. So, the remaining permissions will still be there. If you revoke all, then all privileges are revoked. If it is public, then nobody has that privilege unless they were already granted it individually. So, supposing I grant you a privilege, then I grant it to public. Then I revoke it from public. You still have that privilege because it was granted explicitly to you. I am not revoking it from you. I am only revoking it from public. In the exercises today, you can play around with this. In fact, the same privilege can be granted twice to the same user by different grantors. Should it be grantees or grants? It should be grantors in this case. If you got the privilege from two different grantors, if one of them revokes it, you may still have it because the other guy granted it to you. Some privileges are dependent on other privileges, and if you lose one, you will lose the other. I have not shown you examples, but... No. Okay, good, good. You asked a good question. So, basically supposing A grants a privilege to B, and B grants it to C. Now, if A withdraws the privilege from B, then B no longer has the privilege, and therefore the privilege is also withdrawn from C. You can look at the details in the slides of the book, but you should have a grant chain from the administrator or the creator of the table, whatever the protected object is. There should be a currently existing chain of grants which lets you access it. If the chain is cut somewhere, you lose it. It cascades in some sense. So, you lost it. You had granted it to somebody. That is also revoked. It is a cascading revoke. Yes. Roles avoid this cascading effect.
So, what happens is supposing you were the manager, you got a privilege, you granted it to somebody working under you. Now, you leave the organization. So, the privilege is revoked from you. That guy you granted it to is still in the organization. It will be revoked from him. Whereas, if you had used roles, if the privilege was granted... By the role. Yeah, by the role. You were assigned to the role. Yeah. You were in the manager role. You granted the privilege to the employee. You go away, it doesn't matter. The manager role still has the privilege, and you granted it as manager, so that person continues to have it. In practice, this is a reasonably complex set of rules, but at least for web-based applications, what I have seen in practice is people use it in a very simple manner. They don't use such complex chains of grants for web-based applications, where the application does everything. But if you directly give access to database relations to individual users, then you need to worry about these issues. So, now we come back to views. So, an authorization can be given on a view without any authorization on the underlying relation. Granting it on the view does not mean you granted anything on the underlying relation. So, you can use this to enhance security by defining views specific to specific users. We discussed this yesterday, where each branch could have a view which allows them to see data pertaining to their branch and not to anything else. So, you can have relation-level security plus view-level security to get finer-grained access control. So, note that at the relation level, there is no way in SQL today to say that I will grant you access to all the rows of this relation where branch name equal to Powai. That's not possible. However, I can create a view with the selection branch name equal to Powai and grant you access to that view. You have to use that view in your code, though.
This is actually a shortcoming of SQL, the fact that you are not able to grant privileges on subsets of rows. And there have been proposals, including some things which I worked on, to extend this. But as of today, it's not there as part of the standard. Yeah, you can do it on a column, but not on rows. There are other things. Oracle has something called Virtual Private Database which lets you add where-clause conditions to specify that, and say that whenever this user accesses this relation, you really add this condition to the query. Yeah, it's like a view. The difference is that the user doesn't access the view. Their query is still in terms of the original relation. It may be the same query. The application which you are using is running on behalf of you or on behalf of me. It should be connected to the database as you or me. And it issues the same query. But when the database sees that this connection is for you, it adds a particular where clause. For me, it adds a different where clause. So we see different results depending on what we are authorized to see. It's called Oracle Virtual Private Database. There are similar things implemented in Informix, sorry, Sybase: row-level security. I think Informix also has something along these lines. SQL Server doesn't have this. It has some other model of row-level security which is slightly different. They have access control lists. It's not based on a predicate. It's a different model. DB2, some versions have something like this, but not all the versions as far as I know. Maybe they have it today, but as of a few years ago, their mainframe DB2 had some of these features, but not regular DB2. So this is another example of a view. Customer loan. It joins borrower and loan and links branch name and customer name. So you don't know the loan; it just links up this customer with this branch. The loan number and amount and so on, all those details are hidden.
And you can grant this to a clerk who needs to know which customers are there at the branch without knowing what the underlying loans are. So the clerk can run select * from cust_loan and get the result back. Now, as I said again, if you have a client-server environment where the user is actually running with a database connection in their name, all of this makes sense. If you have a web-based environment, most of these don't make sense. Although Oracle VPD kind of things can be used even in a web setting. It's designed to be able to recognize a web user and then add a corresponding authorization depending on who the user is. These are also useful for reporting tools. Generally, people don't write SQL queries. You guys write SQL queries, but which user writes SQL queries? Nobody. Everybody uses applications. So these can be done on the output of a relation. So the application runs the query, and if you are not authorized on a relation, it may be rejected. So even if you use it through an application, as long as the connection to the database is in your name, this control policy is enforced. In a typical client-server environment, suppose the connection is established and the user fires a query. Now the thing is, the database fires that query, but by the time it is committed and the results are going back to the user, the connection is lost. So how to handle this type of situation? It is committed, but the user doesn't know that. That's a good question. The way to do it is you would have something which is part of the transaction which records that this thing ran. So if you go to the Indian Railways reservation system, you will find this in action, the IRCTC thing. They not only allow you to do a transaction, they will allow you to look at your past transactions. So if the network fails somewhere while the thing is running, you don't know whether it succeeded or not.
And they don't know whether you saw it or not. But you can always log in again and go to the history thing and see what booking you made. So the user has a way to see whether that thing committed or not. So this concept is basically what you use to deal with crashes that happen just before or just after the commit. You don't know when it happened, but you can go back to the database and see what happened and then make a decision. So you should not blindly rerun the booking. You'll have to see if that booking succeeded or not. And in many cases, what you do is you have a transaction ID for the booking. Here, the time when the transaction was run serves as the ID. The user is not giving a specific ID. You have a messaging system where you get a message and execute a transaction. So there is a message ID which came in and you can log that as part of the transaction. The transaction can do multiple updates. So it can write to a log relation. And now if the message is still in the queue, it can delete the message from the queue actually, if the queue is also in the database. But let's say you want to check whether this message was successfully executed or not. You can look at that log relation and see what happened. No, the idea I got from you only, because yesterday when you were logging into that VNC session, after giving the VNC session password, you were able to connect to that specific session. Is that kind of thing available for the client-server environment also? If the session is lost, with some kind of password, you can reconnect. Can you go back to the old state? No, I don't believe that is the case. When you connect again, it's a fresh connection. The state there is in the database. See, in VNC, the state of the windowing system, the session, is actually on the server. The session is there at the server. The server did not crash. It was only the communication link which crashed. In this case, the state is what is in the database.
So when you connect again, you can see the state of the database. That's it. You cannot see the state of your connection. Because in VNC, the machine is actually still working; VNC is just acting as a remote display. It's like you switched off your display and then switched it on again; it will come back. Correct. But nowadays, the application server also implements all these ACID properties. So if your entire transaction was part of a transaction in the application server, then that also takes care of whether it was completed or not. So let me answer your question in a different way. When you use the web, after every request, your connection is actually terminated potentially. It's not kept around. So when you go back to the web server, it's like a fresh connection. However, you still see the same thing. The server still knows what the state was when you left off. So the server is maintaining the state for you. So if you refresh your page, it's a completely new request. But there is a cookie which tells the server who you are. So it gives you back the current state when you refresh the screen. So that deals with transient network problems. The answer to his question, which I gave, was: what if there is an actual database crash at a critical point? Then what do you do? Not if there is just a failure between the web server and the client. If you were running in client-server mode, where you're directly running queries on the database, what if there was a crash just at that point? Then you don't know what happened. As long as you have stored stuff in some relation, you can go and look at it. So that was a small example. You can have triggers not just on update. You can have triggers on insert, on delete, and on update which we just saw. You can restrict the trigger to changes to specific attributes. The one which we just saw is going to be executed even if you did not change the balance field. Even if you change the branch name field of an account, this would get triggered anyway.
Of course, it's unlikely that you'll change the branch name. You will almost surely not change the account number. So in this case, it doesn't matter. But in some cases, there may be a frequently updated column which you don't care about, but a rarely updated column which you do care about. So you can say after update of balance on account. So the balance attribute of the account relation. And as I said, referencing old row as, referencing new row as. So for deletes, the old row tells you which rows were deleted. For inserts, the new row tells you what was inserted. For updates, you have the before and after value of each row. You can even have a trigger which is activated before an event. So if somebody tries to store a value, you can convert null to zero, for example, or the other way. I don't know how useful this would be, but you can do it this way. Before update on r, you set new row dot phone number to null when it is blank. So what this means is, if somebody tried to set the value to blank, you are going to set it to null even before the update is executed. Now, why should you do it before it is executed? Well, supposing it was the other way: the value was non-null and somebody is giving a null value for it. You can say before the update, if the new value is null, change it to blank, the opposite of this. So if there was a constraint which says that the value is not null, if you tried to do this afterwards, it's too late. The transaction would be rolled back because of the not null constraint. Whereas if you did this before the update actually happens, the update would never set it to null. It would set it to blank and then it would succeed. So that's where a before trigger might be useful. There's also a variant which lets you handle a whole set of updates at a time. The previous trigger handles each row update. So for each row that is updated, this whole thing is executed, one row at a time.
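As a hedged sketch of the two trigger forms just described, in SQL:1999-style syntax (the relation r and the attribute phone_number are the lecture's illustrative names; exact syntax varies across databases):

```sql
-- Restricting a trigger to one attribute: fire only on updates
-- of the balance column, not on updates to branch_name or other
-- attributes of account:
--   create trigger ... after update of balance on account ...

-- A before trigger can repair a value before the update takes
-- effect: a blank phone number is converted to null, so the
-- update never stores the blank.
create trigger setnull_trigger
before update on r
referencing new row as nrow
for each row
when (nrow.phone_number = ' ')
set nrow.phone_number = null;
```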
Now if you load a whole table with many, many rows, this can be inefficient. Now this is unlikely in a bank situation, but there are other situations where, when you load a table, you want the trigger to be executed. In most cases what happens is you will turn off triggers before doing a bulk update and then turn on triggers again. If you don't want the trigger to execute, you can deactivate specific triggers. However, if you want to do a big set of updates and the trigger should be executed, it may be more efficient to use a for each statement trigger instead of for each row. And then the for each statement trigger will actually get something like referencing old table as and new table as. So you get tables containing all the old values and all the new values. With these tables you can write an SQL query which can, in one query, do whatever is required for each of the updated rows. So it can be more efficient. It's an efficiency issue. You could always use for each row, but this is more efficient in certain situations. So now the motivating example I gave you was: when the inventory level falls below something, trigger an alert. Now would you want a trigger to actually go and do an external world action? Should it go and create a purchase order? Should it send an email? Probably not. Then it's impossible to get transactional properties if you do such things. What if the purchase order is created and sent off and then the transaction aborts? A transaction can always abort. And the moment you do an external world action, you cannot roll back the external world action. So you lose the transactional properties right there. So a safer alternative is not to do any external world action as part of the transaction. But instead put the action to be executed into a table like a message table or a reorder table.
So the trigger simply puts a row into that table and then you have an external process which periodically scans that table, finds any row in it which needs processing and then processes it. In this case the orders table contains orders that are to be placed and the trigger will actually update this. Now there are other tables: the inventory item level and the minimum level. This is the current level, this is the minimum level. And this is the amount of that item to be reordered each time it falls below the minimum. So you can write a trigger using these and add a row to orders, which an external process will be reading and actually placing the orders. So now you have transactional properties. The external process will take over only after this is committed. It will not see an uncommitted row here and therefore it will not place an order until this is committed. So here is this trigger: create trigger reorder trigger after update of amount on inventory. So whenever you issue an item, the amount is updated. Referencing old row as orow, new row as nrow, for each row; this is standard syntax. See what is going on here. The condition is: the new row's level is less than the minimum level for that item and the old row's level was greater than the min level for that item. What does this mean? This is the transaction which brought it below. See what would happen otherwise. Supposing the minimum level you want is 10, a transaction moved it from 11 to 8, another transaction moved it from 8 to 6. When it goes from 8 to 6, you would issue one more order immediately. So you would issue multiple orders. You do not want that. Only the transaction which made it go immediately below should result in an order being placed. So when this condition is satisfied, so this is a when condition: begin, insert into orders, select item, amount from reorder where reorder dot item equal to old row dot item.
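Assembled into one statement, the reorder trigger dictated above might read as follows (a sketch in SQL:1999-style syntax, assuming, as in the lecture's description, tables inventory(item, amount), minlevel(item, level), reorder(item, amount) and orders(item, amount)):

```sql
create trigger reorder_trigger
after update of amount on inventory
referencing old row as orow, new row as nrow
for each row
-- Fire only for the update that takes the stock from above the
-- minimum to below it; later updates that stay below the minimum
-- must not place duplicate orders.
when (nrow.amount <= (select level from minlevel
                      where minlevel.item = orow.item)
  and orow.amount >  (select level from minlevel
                      where minlevel.item = orow.item))
begin atomic
    insert into orders
        (select item, amount from reorder
         where reorder.item = orow.item);
end
```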
So the reorder table tells you how much to order and that is dumped into the orders table, a new row in the orders table. In reality you may want to add some more fields to the orders table, such as a sequence identifier, so that the external world action can keep track of what all it has dealt with and what it has not yet dealt with. So in this case, the external world action just sees rows of a table and when it is done, it should probably delete the row from the orders table to record that it has been dealt with. But more realistically, you may have a timestamp and use it to track which have been dealt with and which are new. So this is an example of good use of a trigger. There are ways around this. For example, if you could encapsulate the whole thing: if you have used an object-oriented programming language, you will know that it is possible to prevent direct access. You make the attributes private and then provide methods to read or write them. So instead of using a trigger, you could use such a mechanism to say that the procedure, the function, actually it is called a method in Java or C++, the method which updates the item should check if the value is falling below and then do the reorder. So you do not need a trigger if you do this, because by preventing others from accessing the value directly, you are guaranteeing that any update to the value must be done by this particular method, and that method can take care of it. But in SQL by default, that is not there. SQL does not have a way of encapsulation. It allows anybody who has access to the table to go and update that thing. There is nothing which forces them to use a particular method to update the table. That is unfortunate, but that is how it is. There are probably good reasons for it. So given that, the next best thing is to use a trigger to detect when a change has occurred and then deal with the change. So triggers are used for many things.
In earlier days and even today, they are used for maintaining summary data. For example, suppose you want a relation which keeps the total salary of every department. You want to compute and store this. When does the total salary change? Whenever an employee is added or deleted, or the salary of an employee is updated. So in each of these cases, you can have a trigger which goes and updates the precomputed total salary. You also use triggers for replication. Whenever a relation changes, you can record the change in another relation which is called a change or a delta relation. And then the replication process will look at the delta relation, see what all is new in there and then copy it over to the remote site where that change is applied. So this is also a valid use for triggers. There are better ways of doing this. Many databases today provide a built-in way of doing replication by using the recovery logs. That is probably a better way than using triggers, but others don't. With PostgreSQL, you can't access the logs, but you can define triggers, and in fact the replication system for PostgreSQL, the thing called Slony, creates triggers. When you tell it that you want to replicate a relation, it creates triggers and it creates these change relations, and any update to a relation is also logged into the change relation. And then the replication code reads it and copies it over to the remote site and applies the change there. So underlying all that are triggers. But you don't have to write that. The replication system is available which will create that code for you. You don't have to write that code. Similarly, for summary data like this total salary for each department, in PostgreSQL you would have to do it like this, but if you use SQL Server, DB2 or Oracle, there's something called a materialized view which we will see later today. And that will take care of it without requiring you to write triggers.
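As a sketch of the summary-data use just mentioned: maintaining a precomputed per-department total salary with triggers. The schema is assumed for illustration: employee(id, dept_name, salary) and a summary table dept_total(dept_name, total_salary). A similar after delete trigger, not shown, would subtract orow.salary.

```sql
create trigger total_salary_insert
after insert on employee
referencing new row as nrow
for each row
begin atomic
    update dept_total
    set total_salary = total_salary + nrow.salary
    where dept_total.dept_name = nrow.dept_name;
end

create trigger total_salary_update
after update of salary on employee
referencing old row as orow, new row as nrow
for each row
begin atomic
    -- Remove the old salary and add the new one.
    update dept_total
    set total_salary = total_salary - orow.salary + nrow.salary
    where dept_total.dept_name = nrow.dept_name;
end
```

As the lecture notes, databases that support materialized views can get the same effect declaratively, roughly: create materialized view dept_total as select dept_name, sum(salary) as total_salary from employee group by dept_name.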
I'll come back to this. We'll stop here for the break.