 Okay. Welcome to the second session. In the morning, today we have already covered aggregate functions I believe. So, let me move on to the next topic, but I would like this to be interactive. So, if there are any questions or any doubts you have, please do not hesitate to ask me. Yeah, go ahead. Sir, drop command is used to drop the table. Yeah. How I can use that command to drop a particular column, because I have worked on a MySchooler or something. So, how do you drop a single column of a table? So, you can use an alter table drop column, but not all databases support it. So, on many databases, your only option is to copy the data minus that column into another table, drop this table, then rename the other table back. And of course, in between, you have to deal with foreign keys and so on. So, yeah, that's it. Sir, which database means, for example, this support? So, it depends on I, I can't give you a list, but some databases allow you to do an alter table drop column. Some others don't. MySQL, since you mentioned it, I don't think it allows you to drop a column. So, let's move on to nested subqueries. Again, I think most of you would be familiar with this. So, I will go through it quickly. If there are any questions, please stop me. So, in SQL, subqueries can be nested in one of several different places. The part which almost everyone is familiar with is nested subqueries, which are nested in the wire clause. However, you can also have subqueries nested in the from clause and in the select clause. And you probably seen this examples here. So, the first one is a query which finds courses offered in fall 2009 and spring 2010, using our university schema, which you have been seeing since this morning. So, the relevant relation is the section relation, which says which course is offered in which semester, which year and the section information. So, using this, we want things which course, which is offered in fall 2009 and in spring 2010. There are many ways of writing this query. We can use intersection, we can use join, but in this particular case, we have used a subquery, which says check if the course ID for the course, which is in fall 2009 here, is also present in the set of course IDs, which run in spring 2010. So, we are using the in clause here with a subquery. So, we can use not in clause also similarly. This one says offered in 2009, but not in spring 2010, where we have flipped this to not in. Now, this particular query we could also write using an intersect. We could take the two parts and intersect. This we could also write using the except or the minus clause in oracle, it is called minus. Yeah. It was fairly early. I said subqueries have been around from very early versions of SQN. Each and of n, since it would be done with the natural join. No, this particular case of in can be done easily using natural join. And the reason this one is particularly easy is because this subquery is independent of the outer query. But, we will see in just a minute that you can have what are called correlation variables that is some attributes from the outer query can be used in the inner query. Now, those are harder to write using intersect. You can write it. It is not impossible, but it is harder. We will see that in a minute. So, this was just a very simple example which could be written in multiple ways. Does that answer your question? So, moving on. So, take this. Find the total number of students who have taken course sections start by this particular instructor. Now, how can we phrase this? We can have an outer query which is the students. And we can use a nested subquery to check if that student has taken a course taught by this instructor. And so, that is what this particular one is doing. So, here is the takes relation which records which students have taken which courses in which semester year and so forth. So, what we are doing here is select count distinct ID. So, we are not double counting a student. That is the query distinct students who have taken a course taught by this person. So, first let us look at the subquery from teachers where teachers dot ID equal to 1 0 1 0 1. So, this is all the sections that have been taught by this person. And now a section is uniquely identified by the course ID, section ID, semester and the year. This four attributes together form a primary key of section. So, if you just to remind you of that, the same course may be offered in different years or different semesters. So, but even within a single semester and year, it is possible that there are two sections of the course and the section ID uniquely identifies which section this person has taught. Different teachers may teach different sections. So, this uniquely identifies a section and in takes again this student has taken a particular course in a year semester section. So, we check if this whole thing is present in here. This is similar to the previous query. The only difference is now instead of one attribute, you have a whole tuple of attributes and you can say in this. Again this can also be done using intersection as we saw. Now, there are other operations too for comparing sets. Now, if you note that all of these operations so far have had a singleton value on one side and a set on the other. In particular in the left hand side, you had a single value and then we said in or not in. Now, there are other comparisons in addition to in or not in. In particular, we can have greater than all, greater than some or variants of those. Here is a version of this query which ask for names of instructor which salary greater than some at least one instructor in biology written without using a sub query. So, this particular one can be written as there are two copies of instructor T and S and S department name is biology and we want to make sure that T salary is greater than S salary and we are going to select distinct T. So, a particular T will be in this output if there exists an instructor S in biology whose salary is less than this person's salary. This is the correct query, but it turns out it is also an inefficient query because you are going to compare potentially every instructor on the left with every instructor in biology. In contrast, you can write this in a cleaner manner like this. Select name from instructor where salary is greater than some select salary from instructor where department name equal to biology. So, what is the great this sub query is going to give a whole set of values and greater than some or it is can also be thought of greater than any make sure that this is greater at least one value in there. And how can that be done efficiently? The database query optimizer can actually take the minimum value in there and compare with just the minimum value. So, it is easier for the optimizer to handle this particular case. Any questions on this? Yeah. Is it any implication of writing the condition in another way means T salary is greater than S salary. Yeah. Writing distinct second and writing S department name equals to botany writing first. Okay. So, the question I repeated is there are two conditions here. One is on T salary greater than S salary and the other is S department name equal to biology. It actually does not matter which order you write it. You could have flipped it. In SQL the order is irrelevant. You are going to select things with satisfy both. So, in the back end which condition will be checked first? First condition or the second condition? That depends on the optimizer. In this case most probably it cannot guarantee it. The optimizer has a cost model and it explores alternative plans and picks the one which is cheapest according to its estimates. But most probably here what would happen is it would select biology S instructors from biology take that sub relation and then take a cross product of that with the other copy of instructor and then check for greater than. So, moving on here are some more. I think most of the current set of databases support greater, some greater all. Now, here is a variant which says find instructors who salary is greater than salary of all instructors in biology. Now, this query is almost the same except instead of greater than some we said greater than all. So, it has to be greater than every single one of them. Again, this particular one you could have written by taking the maximum salary and then say greater than that. Then there are these two. So, far we have seen a single value and a set. Now, this one is just a single set and we can check if the set is empty or not. So, let us see some examples. So, find courses start in both these semesters can be written as follows. Now, here we have done something new. So, let us go over this little slowly. We are taking one copy of section S which is in fall 2009. Now, in the sub query, the sub query exists and here is a sub query. There are two conditions in the sub query. First of all it is on section and the first condition is that this copy of section is in spring 2010. But, there is one more condition that S dot course I D equal to T dot course I D. So, note that this S is from outside. This here is really from this S while this T is from here. So, this is an example of a correlation variable. So, this S here is called a correlation variable. It is defined in the outer query, but used in the inner query and this kind of a sub query which uses a variable from outside is also called a correlated sub query. Now, what happens when you evaluate this? For each section here, you will evaluate this sub query using the current course I D. Now, clearly if this particular section here in fall 2009 which in the same course also ran here, then that tuple in section would satisfy these two conditions and this one with respect to the outer one and therefore, exists would be true. There is at least one tuple and it would get output. So, those are correlated sub queries and although again in this particular case, we could have written it easily without correlation. There are cases where correlated sub queries make querying a lot easier. Now, here are some more examples of sub queries. Fine, all students have taken all courses offered in the biology department. So, this one is going to use a not exist query. Some of you have probably seen this kind of query. In SQL, you would like to be able to use universal quantification. What is universal quantification? Something is true for all something else. Unfortunately, SQL does not support universal quantification directly. Therefore, we have to jump through a few hoops to using not exist. So, let us see how we do that. So, select students. This is a copy of student where not exist. Now, what is going on here in this sub query? The sub query actually has a select from where except select from where. Now, what is this one? It is selecting course I D from course where department name is biology. So, this one is getting all courses in biology. This one is removing from the except clause here. It is removing from this set. All courses from takes where this S dot I D equal to T dot I D. So, we are taking all the course on biology, removing all the courses taken by this student. If there is a course left over, that means the student has not taken the course. If there is no course left, that means the student has taken all biology courses. So, the condition here is not exist anything in here will imply the student has taken all biology courses. So, this is a standard template not exist and then you take the set of all things and then remove from it the set of things which this person has taken. Now, note that we cannot write it using equal to all. Equal to all will insist that it be equal to exactly every one of those values. So, this requires a subquery like this to be written. You cannot write it without a subquery in SQL. You can, but it is a lot more complicated. So, that is an example of subquery in the where clause. Now, subqueries can also be used in the from clause. Technically, in SQL, they are called derived relations. So, here is an example where we want to find average instructor's salaries of all those departments, where the average salary is greater than $42,000. So, what I am going to do is first compute department name average salary using a subquery over here from instructor group by department name. So, what do I have here? I have department name and the average salary. Now, on top of this subquery, I am going to use this in a from clause, just like I would have used a relation name and then check if average salary greater than $42,000. Now, many of you are familiar with the different way of writing this query. How would you write it? Without using a subquery in the from clause, this particular one can be written. Yeah, you can use a having clause. So, in addition to group by, we can say having average salary greater than $42,000 right here without this. But, there are other queries where this is obviously going to be useful. Here is another way of writing this same query where what we have is actually basically it is the same thing except we have done two things. One is we have taken this subquery and given it a name department average and we have done one more thing. We have renamed these attributes over here. So, that is just a renaming construct. Again, this exact syntax supported by different databases varies. This particular one does not quite work in Oracle for example. Let us move to another very useful construct which is the with clause. Now, let us take a quick poll. How many of you have used the with clause? Very few. So, this is something which really helps you to write complex queries and I would strongly urge you to try this out. All databases today support it. So, what is the with clause? It is basically a way of all of you have used views I assume. So, the with clause is a way of creating a very temporary view which exists only for the purpose of this query does not actually get created in the database. It is a local view. Think of it if in programming language terminology when you write a program you do not write everything as part of the single main function. You break up your program into pieces, subroutines or functions and then the main program uses those functions. The with clause lets you structure SQL queries in exactly the same fashion. So, these can be thought of as the equivalent of functions of views in the case of SQL. So, what is this particular query doing? Find all departments with the maximum budget. So, we realize that we first need to know what is the maximum budget. Then we need to find departments with the maximum budget. So, what we have done is create a view relation max budget and we have given a name here for the attribute. We have called it value as select max budget from department. Now, in general view defined this way may have many tuples with many attributes. So, general relation, but in this particular case it is actually a special case. What is going to happen? Since we said max budget from department how many tuples will be there? Only one and it has only one attribute, but it is still a relation. It is a relation with a single tuple, single attribute and that relation has been called max budget and the attribute has been called value. Now, we want to select departments with that budget, with that maximum value. So, we have defined this with clause. So, now we can define the main query as select budget from department, max budget. Now, you may be tempted to not write max budget here. You may be tempted to write from department where department budget equal to max budget or value because in this case you know it is a single value, but that is wrong. It is not really a single value, it is a single relation with a single tuple with a single value. Since it is a relation we have to use it only in the from clause here and we are doing a join with it and the join condition is department budget equal to this max budget dot value. Clearly, only departments with the maximum budget will get selected. You could have written this query in different ways, but by breaking it up like this it hopefully is a lot cleaner. It is a lot easier to understand. Any questions on this? Yeah. The question is, will this create a view called max budget? In effect, it is a view. However, that view is available only within this query. It is not available outside of the query. They are not being stored. The view is not stored. You cannot access it outside of this. So, in general you may have many things here. You can say with something as another relation as. So, you can create a number of views. Each of those views here can access the previous views. And in the end you will have a single query which will use one or more of those views. Any questions? Oracle supports it. Postgresql, db2, sql7. Everybody supports this. Again, there are small differences in the syntax. So, on the specific database you have to look up the manual to see the syntax variation. So, here is a more complex query. Find all departments where the total salary is greater than the average of the total salary at all departments. Now, what is the query? First of all, understanding this query itself is hard. So, what we want is the total salary for each department. Now, the average across all this and we want departments whose total salary is greater than the average of the total salaries. So, here is how we can do it. The first view here says department total as select department name sum salary from instructor group by department name. So, for each department we are getting the total of the salary, sum of the salaries. Now, this one department total average is selecting average value. Now, this thing is called value here. So, average value from department total. So, this is the average of the total salaries. Now, the main query uses the department total and the department total average here, where department total load value greater than department total average dot value and select those department names. So, again if you did not have a with clause, you could have written this query, you could have used subqueries, but there are two issues. One, it is harder to read. The second is sum of these would get repeated. So, if you notice this department total is used here and it is used again here. This is one of the major reasons for using functions. You can use the function multiple times. That is what we have done here. We have used the same view multiple times in this query. Any questions? And finally, the last form of subqueries. This one allows you to use the subquery in a select clause. In fact, in other places also, but the key thing to note is that this subquery here is used in a place where we expect a single value. Now, what is this subquery doing? Select department name from department and then this subquery. Select count star from instructor, where department dot department name that is this department is equal to instructor dot department name. This instructor is department name as num instructor. So, this whole subquery is result we are calling num instructor, but there is a difference between this subquery and the previous ones. In this position in an SQL query, we expect a single value. We do not expect a set. We do not expect a relation with multiple attributes. But if you see this syntactically as I told you before, syntactically this subquery is actually a relation with a single tuple with a single attribute. So, it seems that the type is mismatched, but actually what SQL does is it converts the type automatically here. Because it expects a single value, it will first check that this result has a single tuple and of course, we know it has a single attribute because we just did count star. So, it takes the tuple and extracts the value from it and uses the value in place of the query result. So, what we have done is for each department, we have counted how many instructors there are in that department. Now, we could have done this in a much simpler way. Yes, what is the simpler way? We could have said select you know department name comma count star from instructor group by department name. Select count star from instructor group by department name select department name comma count star. Is that actually equivalent to this query? No. So, let me just put in that query here to contrast it. This particular query is going to miss out any department which does not have any instructor. So, supposing you created a new department, it does not have an instructor yet, then that department will not occur here at all. Whereas, here as long as it is in the department relation, it will appear here and what will happen to the sub query? It is yeah, the sub query result will be empty and count star for an empty result will be 0. So, that way we do not miss this particular case. Yeah. Left. Left join. It can be done using left auto join. So, what this query is going to do? The natural left auto join. If a department has an instructor, it will be paired with those instructors, the join. If a department does not have an instructor, it will still appear here, but with the fields from instructor set to null. Now, this ID field is coming from instructor. So, when I do count ID, count will actually eliminate null values. Count star will not, but count will eliminate null values. Therefore, if a department has no instructor, the ID will be just that single value null and then the count will be 0. So, this is another way of writing this query, but this one will give you a different result. Any questions? So, we will make a small detour into database updates and then we will come back to querying again in the subsequent chapters. So, again I hope most of you are familiar with this. I will go over it quickly. So, we can delete from instructor will delete all tuples because there is no condition. It is better be careful with this. If you forget the where clause, you are in trouble. Some databases may warn you or some databases will actually keep a backup of the relation. So, you can restore it. Yeah, you can roll. Actually, the transaction is committed, but Oracle will actually keep a backup copy which there is a way to access it, even though it is all gone, but they have found many people doing this and getting to trouble. So, they have a solution for that, but not all databases will be so nice. This one is delete all instructors where department name is fine and so in general, there is a predicate here and all tuples in instructor which satisfies this predicate are going to be deleted and this predicate can also have a sub query. So, what is this fellow doing? Delete from instructor where department name is in select department name from department where building equal to Watson. What is this doing? It is deleting instructors whose department is in the Watson building. Now, I cannot apply this predicate without a sub query because here I only have instructor. I have department name. I do not have the building in which that department is. So, if I want to access that I am forced to use a sub query here. So, it is essential here. There is no way around using a sub query. Any questions? You can have more complex queries, delete all instructors whose salary is less than average salary of all instructors. So, here I have a sub query, select average salary from instructor. What kind of a sub query is this here? I am using less. I am not using less than some, less than any. I am using less which means this is a scalar sub query. It is going to be treated as a single value and we know that it will be a single tuple. Therefore, we can extract a single value from this which is the average and I am going to delete from instructor where the salary is less than the average. So, all sub average salary people are removed. In a way close of the selector from then it is called a scale. If you use it in a position where a single value is expected then it is a scalar sub query. Now, how do you know it returns only a single tuple? We know it returns a single attribute syntactically we can see that. If there is a aggregate function we can easily see that it will return only one. But you are allowed to use scalar sub queries without aggregates. What happens is at run time if it returns more than one value then there is an exception and the query fails. If it returns no values then it is taken as a null value and the normal case it should return a single tuple whose value is taken. Now, this does something funny. Supposing we execute this by taking one instructor after another take the first instructor compute the average. See if it is less delete that take the next instructor again compute the average. Now, there is a problem as we throughout instructors the average will keep changing. So, if SQL actually behaves this way there is a problem depending on the order in which it chooses instructors the final result is going to be different it is not deterministic that that would be a big mistake. So, SQL is not defined like that. So, what is done is whenever you have a condition here in a delete or insert update clause these conditions are checked first for all tuples in which table instructor table. Then nothing will happen there is no tuple. So, it this query will not even run I mean this there is no tuple to be deleted it would not run. If there is at least one tuple for each tuple it will execute this condition check if the condition is satisfied, but it would not do the update or delete insert yet. It will first check the condition for all tuples you will find which all tuples have to be deleted in this case and then in the end it will go and delete all those tuples. Therefore, the order in which it considers it will not matter. Now, actually optimizes are more clever where possible they will kind of delete on the fly, but the semantics is that it should be like I described. The actual implementation can be optimized as long as it generates the same final result. Insertion we all no doubt seen the basic form of insertion just give the values for each attribute or you can specifically list the attribute names and give values for those. If you omit any what will the value be it will be either the default value if you specified a default or it will be null or if the attribute cannot take the null value there will be an exception. Or we can have a sub query here instead of giving a single value I can have a sub query and the results of that sub query are inserted here. So, in this particular case I had all instructors to the student relation with total credit set to 0 in the same department. So, they are added back here again this query is fully evaluated before it is inserted. Otherwise, if you do this insert into table 1 select star from table 1 once you have inserted you have a new tuple. So, if you take that into consideration it will get inserted again and again assuming there is no primary key declaration. If there is a primary key it will get violated and the query will fail anyway and then you have update. So, this one says give a 3 percent update to those who are above 100,000 and everybody else receives 5 percent raise. So, we can do the following update instructor set salary is equal to this times 1.3 where salary greater than 100,000 and then the next one says where salary less than or equal to 100,000. But this is risky if by mistake you put this first everyone who is less than 100,000 will get updated. So, if somebody was 95,000 and something they will go above 100,000 and get 3 more percent. So, I will get a double increment which is wrong. So, this is dangerous a better way is to use the case statement update instructor set salary equal to case when salary is less than 100,000 then add 5 percent then set it to salary times 5 percent otherwise salary star 1.23. So, this is an case expression which returns a value depending on which condition is satisfied returns this value or this value and that value is assigned into the salary attribute. And then there are cases where you need to use a scalar sub query in the update statement. So, this in the previous slide this was just a case statement here we have a whole sub query. So, I am updating student setting total credit equal to some sub query here. Now, note that this sub query is actually using s dot id here. So, it is a sub query which is running for this particular student and what it is doing is it is taking sum of credits and sets this over here. Now, note that in this particular case if the student had not taken any course this will be empty. So, this sum of credits would be null. So, that is what this says if you wanted 0 instead well there are several ways of doing it one way is instead of just some credits here we will say case when some credits is not null then some credits else 0. So, that wraps up the first chapter.