 So, where we left off yesterday we had done a quick tutorial of SQL, today's primary goal is ER modeling, however I need to cover a little bit more of SQL because today's lab in the afternoon is again SQL, so there were a few more things in SQL, there is a lot in SQL, the current SQL standard back by around 1992, 1992 had already gone up to several thousand pages, 2000 pages and it has been increasing steadily ever since, they broke it into parts because it looked very embarrassing, so it goes into I do not know how many thousand pages today and there is a huge variety of features in the SQL language. Now not all databases support all features, so the parts that we have been covering are standard SQL features which everyone implements, now the set of things which everyone implements has been steadily increasing over the years. So there are many things which we are not going to cover because of limited time which database systems today do implement and chapter 5 has coverage of many of those things, but we would not have time to go into it in this workshop or even in the main workshop we will have a little bit of time we will skim the surface on chapter 5, just to give you an idea of what are the new features which SQL implementation support these days. But today we are going back to simpler features in particular we are going to look at the outer join operation, so what is a join expression in SQL, we already saw the natural join operation, the outer join operation seeks to match rows just like the join, but there is a difference from the normal join operation, what is the difference? The difference is that if one of the tables has a row which does not match any row in the other table in the normal join it is discarded, there are many situations where you do not want to do this, but let me give you an example, we saw yesterday a query to find out the number of instructors in each department, now supposing I want to get a list of departments with the count of instructors, we could write the query as we did yesterday using a sub query, supposing I want to get a list of departments with their associated faculty and if a department does not have any associated instructor I still want the department name to appear in there, so what would I use, I would use an outer join operation and we will see the syntax shortly, so an early version of SQL this was not supported, you could actually get the same effect using more complex SQL query, you could take a join and then in a separate query you can find tuples in the first relation which do not match anything in the other relation, so then what do you do with those tuples, you union them with the first set, so a join, so let us look at the syntax for outer join and then we will see what it exactly does, so the basic join expression in SQL in the from clause is a Cartesian product which of course is usually a bad idea and it requires you to add a join condition and the problem with this is that it is easy to make mistakes and forget the join but it is okay, people have got used to it, but if you want to write it from scratch you can write it differently, so let us take a slightly different example which is course and prerequisite, so supposing I want to show all courses along with their prerequisites, now not all courses have prerequisites, so I still want the course to appear with no prerequisite shown against it, so that it is clear to people that there is no prerequisite for this course, so here is a small relation course with only 3 roles and a prereq with 3 roles, you will notice that bio 301 has a prereq, CS 190 has a prereq but CS 315 does not have a prereq, on the other hand there is CS 347 here which is not in the course relation which actually violates the foreign key dependency, but just for the purpose of illustration let us pretend there is no foreign key dependency and this is the database that we have, so now let us look at how outer joints work, so this first query is course left outer joint 3 req, what is the left outer join mean, it means the relation on the left side in this case course, even if it does not, if a particular row in course does not match any row in 3 req, in the normal join it would be thrown away, in the left outer join the rows are preserved and they are retained, but of course you do not have a matching value for columns from the other relation, so what you do with those columns you put a null value, so in this case this is a natural left outer joint, so let us just see the schema in the previous slide, what is the common attribute here, course ID, so natural join equates course ID, note also that course ID is not null here, if you see the course ID from the left relation is 315, pre-req ID is null, there is no matching tuple, so if you actually looked up the course ID from the pre-req relation for this third tuple it would be also null because there is no tuple there, but because this is a natural join it takes a non-null value, the column appears only once, remember natural join column appears only once, so the non-null value is taken for if you split this course ID into course dot course ID and pre-req dot course ID, pre-req dot course ID would be null, course dot course ID would be not null, but since they are merged we keep the non-null value that should be straight forward, now there is a symmetric natural right auto join which preserves rows in the right hand input, in this case pre-req, so there is a row in pre-req which is CS 347, CS 101 which does not have any matching course, so you will observe that it appears here with null for all the fields from course, which are not there in pre-req, course ID is now not null because it is in pre-req and that non-null value is taken for course dot course ID it would be null, but it is merged into the non-null value and finally there is a full auto join which preserves rows in both relations, so the first two tuples as before are from the inner join, so not that the regular join is called the inner join to contrast it with outer join, so the first two rows here are common in all the relations they are the inner join and then the outer join adds extra rows, in this case all rows from both sides are preserved, so we get those two extractable, any question? So, the SQL syntax for this they could not quite put it in the wire class, in fact oracle tried to do this, oracle those of you who have used oracle may know about the syntax where you say plus equal to, it is actually a very clumsy notation and oracle eventually has given up on it and they have they still supported for backward compatibility but they discourage you from using it, so don't use it, this is the right syntax which is supported by all the major databases, so now the general syntax for outer join has to say which rows have to be preserved, so it turns out that in the standard SQL from clause you just give a list of relations, there is no ordering, what order you list I mean really does not matter except for the order in which attributes appear in the result, other than that it has no impact, for outer join the way you do the outer join, so supposing let me write it here supposing you did R and let me show the symbolic notation for outer join here, R left out sorry R left outer join S, left outer join P, so this is a natural left outer join in the relational algebra syntax, supposing this is your query, now in what order do we do the outer join, you could interpret this in two ways, one is like this and the other is, now if you did this with natural join it doesn't matter which one you use, whether you use this one or this one the result is exactly the same for natural join, it doesn't matter, however for outer join it does matter, if you did this you will get a one result, if you did this you would get another result, so what is the difference here supposing there is a T tuple which does not have any matching S tuple, but it does match an R tuple, in this case the T tuple could match the R tuple in get output, in this case what would happen if it does not match any S tuple it would be eliminated and it would not appear in the final result, because it's a left outer join not a right outer join, so these two are different and therefore the syntax of the query itself has to make it clear whether you're using this or this, S tuple actually has a default, if you say nothing it assumes that, but at the least you should be able to write it out in order, if you write this using the old oracle syntax, it's not clear which you mean, there's no way to clearly specify whether you meant this or that, so the syntax now explicitly asks you to list the join almost as if it were relation algebra in the from clause, this is where the natural join also came in, remember we wrote the natural join in the from clause with parenthesis, all the join expressions in SQL follow the same basic principle that you have to list it as an parenthesis expression, the parenthesis can be left out but it's implicit, you have to do that in the from clause and we will see examples, but the syntax is fairly general, it has two parts, the first part is a join type and the second part is a join condition and you can mix and match between these two, so what are the join types in a join which is a regular thing, there is the left outer join, right outer join, full outer join, the join condition you could have a natural join, what does that do? Matches all columns with the same name, but it has one extra effect which is it removes duplicate column, on predicate applies the condition and only allows through matching rows which satisfy the condition, using a 1 through a n list of attributes is exactly the same as natural join except it's only on the attribute which is specified, the other attributes are not equated, they stay as is, so that's the, these are the two types, the join type and the join condition, you can mix and match and get any expression you want, so here is an example, course inner join freereck on course dot course id equal to freereck dot course id, this is the standard inner join, you could have written it as from course comma freereck where course dot id equal to freereck dot course id, in contrast you write course left outer join freereck on course dot course id equal to freereck dot course id, now supposing you instead did the following, course inner join freereck and we need some on condition, say on true which means all rows match where course dot course id equal freereck dot course id, this is the normal SQL, this is what happens in effect because it takes a cross product logically, the optimizer will try to evaluate it in a more efficient way but this is normally what happens, now supposing instead of saying inner join I said left outer join, what will happen, so the first step is every course and freereck pair is generated, whether it matches or not and then the filter is applied but what happens, does this give you the left outer join, no why does it not give you the left outer join, you are applying the condition of course dot course id equal to freereck dot course id after the left outer join, in this case the left outer join did not really have an impact because anyway everything matches but if you had a condition here instead of on true, I put some other condition here, you would get the left outer join but then this condition may eliminate all those rows which had null on the right hand side, in this case there would not be any but if there were a condition here, if freereck dot course id is null this condition would eliminate all those rows, so the point is if you want a left outer join, the condition has to be part of the outer join specification, the on class, you cannot do it in the where class, that is the important point to note, is this clear, so whenever you specify left outer join, right outer join, full outer join, the condition should be specified right there not in the where class, do not put it back in the where class, so moving on, the right way to do it here is course left outer join freereck on that condition and then there are the variants which we discussed, we have the course natural right outer join, course natural left outer join and course right outer join freereck using course id, which in this case is the same as natural join because that is the only common attribute but in general it may not be the same, this should have been full outer join, it is not right outer join. So here is, here are a couple of questions, you do not have to actually answer it with a clicker but just for you to think about, are s left outer join and s right outer join the same, no of course not, very simple question, then the next one, which of the following give exactly the same, are natural join s, r join s using b, r join s on r dot b equal to s dot b, again a very simple question, exactly the same result meaning the same set of columns, remember natural join and join using both retain only one copy of a column, if the same column is there in both retains only one copy, whereas on retains both columns, whether they match or not both columns are retained. So it is not b and c, a and b, a and b are the same, c gives an extra column, which in the case of, in this case it is a regular inner join, so both will have exactly the same value but they will be that way, r dot, what is the common, r dot b and s dot b will both be there. In the case of left outer join, supposing we had the same thing, but c part is r left outer join s on r dot b equal to s dot b, then you may have rows where r dot b is not null, s dot b is null, you would get that. So that leaves us just a couple of small topics, which we will wrap up SQL with. The first is view definitions, you are all familiar with view definitions, we saw the width clause yesterday, views are more or less the same except that they are persistent, they are stored in the database. Once you declare a view, it is like a regular relation in the database, almost like a regular relation, why? Because its contents are not necessarily stored, it is just a definition which is stored. If you use it, the view can be computed on demand, in fact it need not be computed fully on demand, in fact that is not how it works. The way the database is handled views is they take the view definition and expand it into the query and we will see how that works in a moment. So let us take a view, create view faculty as select id name, department name from instructor, what is this doing? It is just removing the salve attribute. This is one example of the motivation for views which is to hide things from certain tables. It turns out that the SQL has a means of allowing people to see certain tables and not others. So this is called access control. Access control in SQL is at the level of an entire table or on certain columns of a table and it is very useful in certain ways. For example, if you build an application, you have different people managing the application, you can predict this only certain people have full right to it, other people have only read rights and so on. However, the SQL access control mechanism which was originally meant for end user who are using a database system, these days is not all that useful because on most databases the front end is a web or some other tool which does not actually take end user logins. It uses some internal logins. So the access control mechanism, SQL are not useful for web applications typically because web users do not correspond to database users. So there is a whole means in SQL which many of you might know for granting privileges and views can be part of that. So if you want to allow people access to certain things, you can create a view and grant them rights to view but not to the underlying relation. In this case, the view simply hid a certain column but you can have it differently. You can create a view which hides certain roles. So maybe you want to reveal the salary of certain people but not others. You can have a query which only outputs the salary of those people and then that can be exposed while the salary of others is hidden. So that view can be made available to a user but not the original relation. So views are still very useful for access control but not for web applications. Okay, now coming back, here is another example of a view. Oh sorry, before that, here is a use of a view, select name from faculty where department name is biology. So what happens here? Conceptually this faculty relation here is replaced by the view definition. And in fact it's very easy in SQL today. You can have a subquery in the frontcloth. So you can just replace faculty by that whole query and you're done. That's a valid SQL query and then that's what it's executed. There's other kinds of views which are also very useful which is to get aggregate, not the raw data. In fact this is very common where you do not want to expose raw data but you do want to expose aggregate to users. And so this is an example where you don't show the individual salaries of instructors but you are willing to show the total salary budget of a particular department. So select department name from salary from instructor group by department name. Now what happens to a department without instructor? Yeah, that department is missing from the instructor table, therefore it will not appear in this result. So supposing you did want it to appear, departments with zero instructors wanted to appear with zero budget. What would you do? We have had such department site. When a new department is formed, sometimes it's created with nobody in the department actually. There's somebody from some other department who is initially in charge of recruitment or whatever. So there can be periods when this happens. Yeah. So we can use left out. We saw another way yesterday which was to write a subquery in the select law. The other way is to use the left auto join. So how would you write this query? We want department left auto join instructor. So just for variety, let me write it using relational algebra. So department left auto join instructor. This is going to give me every department with null for all the instructor fields if there is no instructor. Now what do I want? I'm going to use the gamma or the calligraphy operator. And I want to group by department name. And what is the aggregate that I want here? Some of salary. I can write this in SQL also. So syntax should be what? Select department name comma some salve from department left auto join instructor. If you say natural left auto join instructor group by department name. Now what will happen? If a department had no instructor, the set would be empty. It would have only a single element, sorry not empty set. The set has a single element which is null. And that null is eliminated. You get an empty set. So what is the sum of an empty set? 0 null. Here is the lab assignment. Write this out today. We saw yesterday what to do. If we want to replace null by 0, we saw what to do yesterday. No, NVL is oracle. In standard SQL there is a way which is supported now by everybody. You can use coalesce. Coalesce something let us say some salary comma 0 would take some salary if it is non-null. If it is null, it goes to the next one which is 0. So that is what if you do not use NVL. I think oracle also supports coalesce. So try this out in the lab. So there are other examples of views, view expansion. I will just skip all those. I will just show this slide which points out that you can define one view using another and then use that to define a third view and you can have a cascade of views like this. So the view expansion basically replaces, it takes a given query. If there is any view in it, it replaces by its definition. That in turn may have another view in it. Again it replaces all those views by their definitions and keeps doing it until there are no more views left. So that is the basic algorithm. Find any view relation, replace it by the expression until no more view relations are present in e1. Now as long as the view definitions are not recursive, this will terminate. Now what do I mean by recursive? Supposing I define a view in terms of itself. Is this even a good idea? Why should we allow it? It sounds like a bad idea. This algorithm will never terminate. So should you even consider it? Should you just ban it and be happy? Does it make any sense to have recursive view? What? Let's decide this democratically. How many words for having recursive views? Nobody. One. How many words for totally banning recursive views? Why? Nobody is voting. Zero. Okay, one. Poor water turned out. Election cancelled. Okay. So of course this is not democratic. So there are in fact uses for recursive views which were not supported by the SQL standard earlier. So there were a whole bunch of researchers working on recursive query processing before it came into SQL. And in fact my own PhD was in this area, recursive query processing. How to efficiently evaluate queries where there are views which are defined recursively in terms of density. There is a section in the book which in Chapter 5 which we don't have time to get into. But it gives several motivating examples for recursive views. So one such example is if I have a chain of managers. Most organizations, there is an organization hierarchy. Each person has a manager. The person they report to. Eventually there is a boss, the CEO. And who does the CEO or the director report to? Sometimes there is a board or for private institutions there is an owner. The owner doesn't report to anybody. So you have a natural hierarchy. Similarly there are parts. So you may say that a car consists of these parts. An engine, a door and so on. Engine itself consists of these parts. Now one of those parts, engine itself consists of subparts. So you have a hierarchy. So hierarchy is one example where supposing I want to say find me the bill of materials for this part. What is the bill of materials? This is an individual part, indivisible part from which this thing is constructed. I have this information in terms of hierarchy. This part consists of these parts. So many numbers of these parts. So a car has, you know, four doors, four wheels, one engine and so forth. And then one engine has maybe so many subparts, cylinders, pistons, head assembly, whatever you name it. You can have all of this represented in a database. How do you represent it? You could, for example, represent it like this. Subpart, part, subpart, number. So this part, say car has, you can say car, wheel, four and so on. Car, engine one, car, sorry, engine, something maybe four and so forth. So I can represent it as a relation like this. Now supposing I want to say find me all the bottom level parts of car. How many are there? So there is a way to write this which SQL standard supports this and several SQL databases support it. Some of them have slightly varying syntaxes for this. Oracle has its own syntax. PostgreSQL has its own slightly varying syntax and so forth. But most database systems today support at least some form of recursion. And such queries can be written in SQL by defining a view which says, which is defined in terms of itself. So how would you define such a view? You can say that a view, for example, contains, this is just a base part. That is the individual one, which you can't break up even further. This is the view which I want to define. So how would I do this? There are a few more relations. I won't get into the exact syntax for this, but let me give you an intuition. Supposing I know that this, I know what are the base parts. So if I know that actually this number part is harder. So let me cut this out for the moment. A little trickier to define it. So let me skip that for the moment. So supposing I want to say that a part contains a base part. So if I have a subpart, I know that something is a subpart of another directly and that part is a base part. It does not have a further subpart. So I have a relation base part, which says these are the ones which do not have subparts. Or I can infer it by saying here is a part which is not in the subpart relation. So either way, I know that this does not have any further subpart. So this is the kind of thing that I want in this output. So I can say that it is contained in this if it is directly subpart. Otherwise I can have two levels. I can have essentially a join between subpart, part one and contain, this is again just intuition. This is not any particular syntax. So I have a structure like this, where I have a subpart of a particular part and its subpart part one. And I know that part one contains a base part. I have already inferred this somehow. Then what can I infer? From this I can infer, contains, part, base part. From this I can infer, contain, part. There is syntax in SQL to write such things. And then there is a means of evaluating it also, reasonably efficiently. So I will just stop here as far as recursive views are concerned. But you can go back and read it up from chapter five. So you can't obviously do it by view expansion. There is a different mechanism for handling this. And there is even a mechanism to handle the count of numbers and so on, which is a little trickier to define. So I won't get into that. Finally, to wrap up views, can you update a view? What does it mean to update a view? So the first thing which you could say is don't ever allow view update. Views are defined in terms of other relations. If you intend it to update something, go intend the underlying relation. Don't update the view. So that's a first cut solution. But then in practice, people found that there are many situations where somebody is given access to certain roles of a relation with some condition. But they should be allowed to update those roles. So then practically speaking people found that it is useful to allow people to update views under some limited conditions. So in general, so you cannot actually update a view. All you can do is take an update defined on the view and translate it into an update on the underlying relation, such that the view will reflect the same. So if I had a selection, I can only look at and modify faculty in computer science. That's an instructor relation, but I am allowed to view or update things in computer science. So I cannot directly go write an update on instructor. I can write update on the my view of instructor, which is only computer science. And that will be translated by the database into an update on the instructor relation. So it turns out you can't always do this. But there are some limited cases where you can in fact do this. And the slide over here which I won't get into detail talk about this. For example, if you have a join, it becomes very difficult in general. If I update the result of a join, how do I know what update should do to the underlying relation? It's not well defined. There are many ways of doing it. Which do you choose? The answer is don't allow it. But if it's a simple select query, I'm just seeing a subset of the rows, it's usually a lot easier. So those are the cases which database support. Initially the SQL standard didn't say much about it. Today the SQL standard does say that in certain simple cases you should allow updates through view. So I'm going to skip the details here. There are slides here on which show examples of when it can be done uniquely and when it cannot. So for example, where it cannot, I have a view history instructor, a select staff of instructor where department name equal to history. If we try to insert into this somebody in the biology relation, what will happen? It can never appear in the view result because this is a biology, not a history. So you cannot do this. You should not allow this update. So many SQL implementations will check even for a view like this. They will check if the condition is satisfied that department name equal to history. If it's satisfied they will allow an insert. If it is not satisfied they will reject it. It can be done for, it can be done only for the history department. That's the answer to that quiz question. So we are talking about the privileges, granting privileges to that particular individual users. So is there any possibilities to grant such kind of privileges based on the domains, based on the fields, for each and every user? By fields you mean the value of a field? A domain I'm talking about. We are talking about views and create view on the selected domains. So when we are using granting and all we are permitting insert, update such kind of privileges to each and every user. Is it possible to grant such kind of privileges along with that particular field, on particular domains? On particular means? On the type itself? For example register number like that. For example faculty ID like that. Sequence number. Yes. Most SQL implementations have privileges for every object which is in the database. Even for the privilege of creating a foreign key. The privilege to use a sequence number. What is the sequence number? So there are many applications which need a sequential count which ensures that no two tuples have the same value. And the sequence number is a facility provided by many databases which lets you say use this value and it automatically gets incremented when it's used. So when you insert a new tuple you can say that this column should take a value from this sequence number and each time you add a new tuple a new value is derived from that sequence number. So there are privileges on that. There are privileges on views. So I don't have time to get into the full set of privileges but the SQL privilege system is fairly expensive. It's very good except for the fact that it does not have a web application user. It only has a notion of poor database users. So it's useful in that context but not in other context. Sir, typically updation of a view would be in regard with addition of rows or columns. No, no, not that way. So I think you may send us to what I meant by update a view. By update of a view I meant insert a tuple into a view relation or delete a tuple or update a tuple. So I'm altering a view to change the view definition. So altering a view is effectively the same as dropping the view and then recreating it. You can treat it like that. So because there are no tuples stored with it normally. So you can just delete it and recreate it. Well, of course there are some issues. If somewhere other view uses this view you can't just drop it. So the alter view in fact can take care of. So you can have a alter view. I think the SQL standard does support it. Dropping it and creating it may affect other views. So ideally you should have an alter view which simply changes the view definition without affecting other views which depend on this. I think you should check this out. I have not actually tried this on Postgres but you can try it out. Difference between with class and view. Difference between the with class and view. So as I said yesterday the with class is like a temporary view defined in the context of that one query. It does not go into the database schema. A view is stored in the database schema like other relations. That's the main difference. Does a view take a memory byte separately for the view data? Memory part. Memory byte. Memory byte. Memory memory. Memory. Exactly. What is the data for this view stored? Is that the question? Is there any mapping from the original table or view is separately existed in a the default is that the view data does not exist at all. All that is stored in the database is the view definition. That means view is the mapping. That text is what is stored and when you use the view it is expanded inline. That is the default. However it's good to ask this question because many databases today support what is called a materialized view. Let me just write the term here. So what is a materialized view? It's a view for which the data has been computed and is stored. So if I use the view I'm not actually computing the data. It's already there. I'm just looking it up. Most databases today I think PostgreSQL doesn't yet support it. Maybe it will in future. But the commercial databases do support it. Oracle SQL server DB2 and so on. This is in contrast to what I just told you. I just told you that view definition is stored. No data is stored. Now I'm saying in a materialized view data is stored. So there are some issues here. Supposing I'm storing the data and now I update the underlying relation. So I have a view defined in terms. So let me give you an example of the most useful so look at the query at the bottom. View of department salary totals. Now supposing I have a lot of instructors which is unrealistic but let's say that instead of instructors I have a view like this which is total sales of each item in a big retail shop. We have a lot of big shops like BigBetire, Chroma, Ezone, whatever. We have a lot of these. Now they have a lot of sales. Each shop must be selling tens of thousands of items a day. If you aggregate across all the shops of a particular retailer they may have maybe millions of items sold each day. And then if you aggregate across a year you have hundreds of millions of items sold in a year. Now supposing I want to know across all these shop how many copies of a particular thing were sold. How many copies of the database system concept book did Flipkart sell? Now I can go through all the sales data and select out rows, add it up. It could take a lot of time if I run on 200 million records in there. But this is a kind of query which is very natural. In fact many sites will show you some information like if you go to Amazon it will say this book is ranked they don't tell you the actual raw number but they give a rank which is for which you have to anyway compute the raw number. So the idea is that you pre-compute these things. So this result you pre-compute and store. So now if I want to know the sales of a particular item I just look it up. You also build an index on this result. So I just look it up. So what they mean by build an index if you are not familiar I will come to it. But it lets you very quickly get to a particular row of that table and return that answer very quickly. So such views which do aggregates are very good candidates for materialization. So the issue is that if the moment one more item is sold the view is now out of date. Because one more copy of the book was sold the view recorded it as 100 copies now the 101st copy has been sold. What to do about it? There are many possible answers. For the sales do I really care about the up to date sales as of this minute? Probably not. If you tell me the sales up to last week was this much I am happy with that. If you tell me the sales up to last night was this much in others I may be happy with that. So what does this mean? I can re-compute the result once in a week or once in a day or whatever and store it. But there are some applications when I need up to date results as of now I want the results. In which case the moment I update an underlying relation I add one more sales item I need to update the count of total sales of that item. As part of the same transaction this update is done automatically that is done. How do you do that? In earlier days a standard we are doing that was by using trigger. You could define a trigger on sales every time a sales occurred it will find a corresponding row in the total and add it to the total. It turned out that this is actually a pain to keep track of do it properly. What if there was a sale done by mistake it was deleted so you have to undo that. Deletions have to be taken care of. What if a sale was updated? Instead of 5 it was updated to 3. So for each such thing in search delete, update and so on you have to write this trigger code carefully to update the total sales. Expecting users to do this correctly is a pain. If they make a mistake they are giving wrong information to the user. So most databases found that initially started off by using triggers for keeping these aggregate results up to date but later they provided separate syntax which is actually much easier for the user. So you could simply say instead of create view you could say create materialized view and then the database automatically stores that result. There are other syntax which other database is used but with the equivalent result that the view is computed and stored and can be used. And you can even create an index on the view which lets you efficiently access tuples based on certain conditions. So you no longer need triggers for that for those databases. In fact it's a lot more efficient if it's done by the database. Triggers are slow. The database directly takes care of this much, much faster. The question is how many views can you create on a table. That depends on the implementation. There's no standard answer to it. So go look it up for whichever one you're using. But I think it's a fairly large, I don't think it's a very tight restriction. I've not heard of anybody creating so many views that they run into trouble but yes, there may be limits. So there's this area called online analytical processing which focuses on giving users very very fast access to aggregate views, aggregates in different ways. So the aggregate we saw was a salve, some of salve grouped by department. There may be other things that you want. I want the salve per year. What was the salve this year, what was the salve last year, the year before. I want to see how the salve trend is going. So it's a different group by year, whether a year in department or just year across the whole institute, across all department. So there are many combinations of group by which one might ask for. So an OLAP system is designed to very quickly give you answers to such aggregate queries. So OLAP, let me write it down here. Again, there is material in the book on OLAP. So you can go read it up. OLAP is short for online online analytical processing. So analytical is basically data analysis for analyzing data. And the core operation analysis is statistic. Some count, media and whatever, the usual aggregate. So it's focused on very quickly answering aggregate queries over very large amounts of data. Now how do they do this? How do they answer queries very fast, essentially by using materialize. That's the core underlying technology. So one way is to materialize every possible aggregate view on a particular set of on a given relation or a given set of relations. Every possible query that an analyst may ask, grouped by every possible combination, you pre-compute everything and materialize it. So this is what initially people did. But then they realized that this can become very, very large. You don't want to store so much data. So then they said well, can we materialize certain things and then compute others from it. For example, supposing I have computed group by department, year from this, supposing I want to get the same sum but group by department. What do I have to do? I have to take the salary for each year, if I have this group by result, I take this and add it up across all the year. And then I will get the total salary of the, well, this is not salary, salary is just a six number, let's say sales. Sales by something else, instead of department, item, year. So I have this, now I can, let me just change this. So I want the sales group by item, I can simply take the previous one and add it up. Now how many topics do I have to add up? Typically, you know, these shops have been around only for five years. Even the oldest companies have been around only for a few hundred years and of course they don't have such old data. So if you add up maybe 20, 30 years, I can get this. It is quite fast. So they reduce the amount of things which they pre-compute and store but they can still give you results very quickly. That's the idea of OLAC system. So that's a big business in itself. Many companies need this kind of analysis. Why do they need it? They need it for forecasting what items to talk, how much to keep in stock, what items to manufacture. This is a huge, huge commercial interest in doing this because it affects profits of companies severely. If you produce too many cars, this happened to Maruti. They've produced a lot of cars, I think Altos or something. They were not selling but their desire and swift were backlogged. There was not enough capacity. And it's hard to go build a new plant for it. If they could have predicted this, they could have ahead of time they could have switched factory from producing, let's say Altos to producing swift and desired. They didn't quite anticipate this. It's hard to predict in general. How do you know the future? Like they say it's very hard to predict the future. So mistakes will happen but the goal is to minimize the chances of making such mistakes by using analysis. If you can see the trend, you could have maybe seen a trend that over the year, consumer preferences are slowly shifting from smaller cars to larger cars. If they had anticipated this trend, they could have caught it. But of course sometimes trends are changed by what the company does. Maybe the company created the trend by creating a nice product but they anticipate it's such a roaring success. But analysis is very, very important for this. So there's a huge market for OLAP online analytical processing system. They're essentially databases with extra stuff in there for deciding what to pre-compute and some of that is stored in memory, some of it is stored on disk and so forth. Now last, a few topics on SQL. There's a notion of transactions. We'll come back to transactions a little bit later, very briefly in the implementation issue. But if you first we need to understand what is a transaction. You have all done transactions at a shop. You go buy some item, you pay some money and you exchange it. You give the shopkeeper money, the shopkeeper gives you the item. That's a transaction. Now there are typically at least two parts of a transaction. One is taking the goods, the other is paying the money. If you paid the money and didn't get the goods, you would be very unhappy. If you got the goods and didn't pay the money, you would be happy but the shopkeeper would be very unhappy. So both should happen or neither should happen. You can walk out of the shop without buying anything. Or you can pay the money and get it. Either of those is acceptable. But one happening, the other not is not acceptable. So in the context of a database, similar thing happens. If I want to transfer money from one account to another, I want to create a deposit cash. I am handing over cash. It should be recorded or the person should give it back. So there are transactions which involve the external world and the database. I go to an ATM machine, I withdraw cash. Now, either I get the cash and my account is debited or neither. If my account is debited but I don't get the cash, I will be unhappy and vice-versa. Now implementing transactions in general is a complex task. It is done routinely but it is non-trivial. So for ATM, what all can go wrong? The machine might jam. It has counted the bill. It's pushed it out. Now it's jammed. You're pulling at the nodes. It doesn't come out. It's rare but could happen conceivably. What then? Or the machine could go bonkers. Power goes off. Some glitch. It has counted the cash. It has not dispensed it. What happens then? Now maybe your account is debited still by mistake. So what is done? In the real world, handling making sure that transactions are atomic. Since both parts happen or neither happens. It's actually quite hard when you have the external world. It gets fairly complex. In ATM, it goes to the point where the ATMs have a little camera and if a customer says, look, I tried to withdraw cash I didn't actually receive the cash. He complains to the bank. You have debited my account. And the bank will look at that camera and say what actually happened. And if they see the cash coming out, they'll say you're lying. We'll report you to the police. Or if the cash didn't come out, they'll say, okay, we will credit it. So it's fairly complex. It involves recovery. You should have a record of what happened. That camera image, the video, is a record of what actually happened over there. And based on that, you either refund the amount from the debiting of the account or you say, no, it's okay. Now all of these have counterparts which are entirely in the database. So external actions are more complicated. That requires a case-by-case handling. But there is a direct counterpart inside the database where you want these two updates to happen in the database on neither hand. For this, there is a standardized way of handling it internally. So that is part of what it means to be an atomic transaction. There's also an issue of concurrency and so on which we'll come to later. But the focus here is atomic transaction. So in the SQL standard, it says that you can run a sequence of statements. But only when you say commit to work will all those updates be reflected in the database. Till then, you can always say abort or rollback work, which means everything that was done up to that point to rollback. So one of these two should happen. If neither happens, the database crashes for whatever reason or you go away and the session is terminated, then the default should be rollback. That's the standard. In real life, though, this was inconvenient and most implementations don't do it that way. What they do is, by default, as soon as you run an SQL command, it's done. So if I want to update two things, I'm transferring money from one account to another, I update this, I update that. I update this, if the crash happens, the other won't get updated. That's a problem. The database is inconsistent. And the way to do it correctly is to tell the database don't automatically commit. So by default, most databases automatically commit, as soon as you run an SQL statement, it's committed. Done. But you can tell the system, don't do that. How do you tell it? It depends. There's an SQL 99 standard, which says begin atomic and everything in between will either be completely committed or rollback. It can't be intermediate state. But many databases don't support this. There's another alternative. If you're using an API, JDBC and so on, there's a way to say turn off auto commit. So by default, most databases have auto commit turned on. But if you want two updates to happen atomically, you turn off auto commit, do the two updates and then say commit or rollback. One of the two. So how is this implemented? We'll briefly look at it later. Here is, I just want you to be aware of this idea of what is this one. Now to wrap up SQL, we'll come back to integrity constraint. Somebody asked me about cascading integrity constraint. We already saw the not null primary key foreign key. There are other constraints like check predicate, which can be any predicate. I'm going to skip these because you've already seen most of these. Check is a simple thing. Just look at the bottom here. Check semester in, fall, winter, spring, it's one way of doing it. So this has a section with a semester. Now this was partly done to illustrate this feature. A better way to have done it might have been actually to have a semester master table and then the semester is a foreign key referencing that table. That will force it to be one of the values in that table. But then if I want to add a new semester, I can easily add it. Here, I can't add a semester without going and saying alter table, modify constraint, alter constraint. The syntax support is very non-standard. Foreign keys we already know but the point I wanted to talk about here is cascading action. So here is a variant of the earlier course declaration which says department name is there and it is a foreign key department name that references department on delete cascade, on update cascade. What does this cascade mean? If I delete a department, automatically all courses in the department should be deleted. If I update a department name, automatically the department name of all the referring courses should be updated. That's what this is. There are alternatives. I can say on delete set null, on delete set default. If I want to retain that information without losing it, I can still do that. For example, I have a certain transaction and I am recording which employee did it. I want to delete the employee totally. I might still keep the transaction in set null. It turns out, actually deleting data is a bad idea in many situations. I may want to keep historical data. So this should be used very rapidly. Cascade should be used rapidly. More often you want to keep the historical data. There are variants people work on. A department is closed. No more students come in the department. However, there are old students who are in the department. I don't want to change the department. I don't want to set the department to null. None of this makes sense. So then how is that handled? Usually by maybe creating a view called active department and there's a table called department which includes historical department and when I want to see what departments are there, currently I will see the view active department. How do you know which is active? I will set a flag with the department saying active is white or true. So then the view will say departments where active is true. So then I will see only active department. Otherwise I will see all the clutter of old departments which are closed. We have quite a few like that. There is a certain turn. More than department programs. So I have a degree in something or a dual degree in something that is closed but students did graduate with that. So that program should still be remembered and not deleted. But no new people should be added to that program. Okay. Now there are some details here about integrity constraint violation during a transaction. Now why would this happen? I think I am going to skip this for lack of time. I will urge you to go read these slides later on to see what is going on. Deferred checking and so forth. The sums up on other built in data types. So this is useful. The SQL standard has a lot of time related data types. So date is one. Time which is time of day is another. Time stamp which is date plus time is the third. And finally there is something called interval which is a gap of time. It is not like, you know, interval is not necessarily saying from May 6 to May 8. It is two days less than interval. These are all types which are valid in SQL. The support for these types varies. I think Oracle does not do date and time. It only has time or had. This keeps changing. But this is what the SQL standards specify. And there are many more variants which I will not get into. And the last topic section is index creation. I mentioned this briefly. What is an index? It is a secondary structure which is used to efficiently retrieve data. So here I have created a table student. I want to look up students based on their ID very efficiently. I do not usually look up on name. So if I have a lot of students, if I give an ID should be able to zoom in on that tuple very fast. If I give a name I do not mind scanning the whole table searching for people with that name because that is rare. So how do I efficiently retrieve a particular tuple given an ID? So there are data structures for indices. Many of you probably know about it. If you are not familiar with it we will get back to it later. But the SQL database is typically have a statement like this. Create index can give it a name on relation with a set of attributes there. Indices are covered in great detail in chapter 11. So there are a few other things large objects, authorization, privileges which I am going to skip. Many questions now? So the question is are assertions supported in Postgres? The answer is the last I checked nobody supports assertions. At least nobody supports any useful form of assertions. So this is one of those things. Sometimes the standard bodies wait for demand for a feature to be big enough before they standardize it. And then what happens is meanwhile each database implements it in its own way. So this happened for a while then they said wait we do not want to do this let us anticipate what people might need and what databases may support and provide a standard for it ahead of time. Assertions were one such they anticipated a need for it but most databases never ended up supporting it. So it is there in the standard nobody supports it. Can you repeat the question? What will happen if we go for a materialized view that does not contain the prime attributes of the parent relations over which it is defined and we are going to insert some data into it. So materialized views often for example aggregate view do not have a primary key of the underlying failed relation. That is not an issue. What will happen if you go for inserting some data into that materialized view? So that will be reflected in the original relation. In what sense? If you will insert some data a tuple in the materialized view will it be reflected in the original relation? So supposing I have a materialized view and I insert into the materialized view no no. So typically updates on views first of all are restricted and the same rules will apply for updates on materialized views. So it is very restricted where you are allowed to do that. The main use for materialized views is where it is read only not with updates. Updates on views were more for views intended for security that was a motivation. Whereas materialized views the primary motivation is querying their aggregates. It cannot meaningfully translate an update on a materialized view to an update on the underlying relation. So most of the time it is not supported. You cannot update them. Typically materialized views will not be updateable. So there is a couple more topics on related to SQL indirectly. Which is how to access SQL from not from a PSQL, not from PGA admin and so on but from a programming language.