 Now nested subqueries which had already been mentioned, so very simple queries, so find courses offered in fall 2009 and spring 2010. There are many ways of writing this query, this is just an example. So select distinct course ID from section where semester is fall and year 2009 and that course ID is also present in select course ID from section where semester is spring and year is 2010. So this is an intersection. What if you want things which are in fall 2009 but not in spring, not in spring 2010, not in. Here we said in, now we want those which are not offered, so I can say not in, that is the same thing. What if I want to say it is offered in this or that, can I do it using an nested query? So offered either in this semester or in that other semester. Then nested query actually does not make sense there. Then I will do a union query. Okay, so when you have a nested query we saw the in clause, there are things which you have not outused, there is an exist clause, not exist clause, right on the next line, you ask the question exactly the right time. So look at this particular query, this is an example of a correlated query, subquery. So let us take the same one as before, it is the same query we want. Courses taught in both fall 2009 and spring 2010, but I have written it differently. What is the difference? This part is the same, select course ID from section, maybe I should have had a distinct where semester is fall year 2009 and this time what I have done is exist, select star from section st. Now note I have used this as, why? Because I am going to use this outer relation in the inner one. So what is happening here? Section as s over there, down here I have, this is the subquery, section st where semester is spring and year 2010, apply to which one? Apply to the subquery, that is the closest flexically occurring one. Note that semester occurs here and here, because of the coping rules, if I do this semester will apply to this section. But now I am doing something else, s dot course ID equal to t dot course ID. So I want this course ID to be equal to that course ID. So what am I doing here? Conceptually for each outer tuple which is I am going over all the sections in fall 2009, for each of those I am looking at evaluating the subquery. But in the subquery I am referring to s dot course ID. So what is that? It is like a function call where I have passed in the value. Well it is not strictly speaking a function call in the syntactic sense, I am reusing an outer variable but the effect is equivalent. It is like running this subquery with a parameter from the outer query which is s dot course ID. Now for a particular outer tuple, let us say that course is 101, the subquery is run. So select star from section where semester is spring year 2010 and 101 equal to t dot course ID. That query is run and then you see if that result is empty or not. If it is not empty, exist will be true and then 101 will be output. If it is empty, well exist fails and 101 is skipped. So why correlated subquery? Because outer table variable here is used in the inner query here. So this particular s here which is taken from outer, I mean I renamed it here because section was there in both but renaming is not mandatory. If I had section here and some other table here, I could have said section dot course ID here. In that case, section is the correlation name, correlation variable. In this case, since I have renamed it as s, s is the correlation name or correlation variable. They are just two terms used to refer to the same thing. In and exist, you can rewrite one as the other, we just showed how to do that. So when you have an in clause, you can take that in condition and turn it into a correlated condition just as we did. So it depends on the database but I cannot guarantee it will be the same but this is a simple enough case that I think most databases would come up with the same plan but I cannot guarantee it. You can try it out. So how do you do that? In the lab, we will see how to generate the plan. How exactly to see what is the evaluation plan depends on the database. The syntax varies. We will do it for post-question but there is an equivalent for Oracle or SQL server or any other database. So you can actually try this out and see what happens. The semantics is the outer query is done and for each tuple of the outer query, the inner query is evaluated. Same thing will be for correlated query. Same thing will be for- Yeah, that is the semantics. Now the actual implementation, the optimizer might figure out that you do not have to do it this way. You can do it using a join. So this particular query, you can actually rewrite it as a join or in this particular case even an intersection. So the optimizer might be able to figure this out but the semantics is that for each outer tuple, outer query is tuple, the inner one is evaluated. So the inner query is evaluated. You take one outer query tuple, without checking the inner condition, you take outer query tuple, run the inner query for that tuple, see the result and then in the exist clause. Performance will be degraded, sir. For every outer tuple and there is a need to compare all the inner tuple. Yeah. So doing it in the correlated fashion is often slower than doing it as a join. Sometimes it may be fast or equal but in general it is slower. So most databases take some effort to take queries like this and turn them into joins. They will try. Where possible they succeed, you will likely get good performance. If they fail, why did they fail? Is it impossible or they were not clever enough to do it? So that is the point. Sorry. If you can write it easily without subqueries, try it without subqueries. The relation that is taken in the outer loop, it checks for the matching condition in the, with the values in the inner relation, even that is also like nested only, no sir? No. So it does not have to do it that way. There are very good join algorithms. We will hash it, hash join, merge join, merge join, all these things. So let me give you some intuition since you asked that question. One way to do a join efficient, instead of trying every pair of tuples, you sort the two relations on the common attribute. So let's take a query with a join. Okay, let's abstract this example. Let's say there are two relations, S and T. What is the join, what is the condition that the course IDs must be the same? Okay. So now I will sort S on course ID, P on course ID. Now I can step through those. If there is a match, I'll put it. If there is no match, I know that, you know, because of the sort order, supposing the course ID is 101 on this side. On this side, I have 102. I know that 101 cannot occur any further because it's sorted. So I can throw it off and continue with the next tuple here. So there's a very easy way of merging the two lists to get all the matches. And databases often use sort merge join. There are other ways. There's something called hashing, which is also very efficient. So in the main course, I'm planning to have some lectures on data mining. But in this abbreviated thing, we are not doing data mining, partly because I think in many universities, data mining is a separate course. Here also we have it as a separate course. So I may have some lectures on data mining if there is a demand for it, simply because there is no other ISP on data mining. But maybe we should have an ISP on data mining. We won't cover it here. It's an important area. Everyone should know about it. Okay. So I'm going to skip some slides here and point out different ways of places where you can have subquery. So far, whatever is the subquery, it was in the where class. But SQL allows subqueries in the from class and in the select class also. So let's look at those two. Both are useful. So in this first one here, say find average instructor salaries of those departments where the average salary is greater than 42,000. So it's natural to organize this query to first find the average salaries of departments and then to impose the condition. Now how did we do this? We did this before. We did it with the having clause. But having clause is not essential. I could instead rewrite it like this. So what am I doing? I've taken this first part which is computing average salary grouped by department name, this query, starting from here and ending here. I have put it in parenthesis, not essential, but I put it in parenthesis and put it in the from clause. This is now a relation which I can use as a relation in the outer query. So note also that I have taken this average salary and given it a name avg underscore salary. If I didn't do that, I don't have a name for it. This has given it a name which I can use in the outer query. I haven't checked this particular query on PostgreSQL, but one of the TS who tried out a similar query reported that PostgreSQL didn't like this syntax. In PostgreSQL, if you want to give it a name, you have to give it here, so you can give it a name here if you want to rename it. So try it out if it doesn't work, there are ways to rename these things at this point here which is shown here, this query. So select department name average salary, this query as department average, department name average. So what all have you done? We have given a name to the relation and we have given names to each of the columns. Here department name is given the same name. This didn't have a real name, it now has a real name. So if this syntax doesn't work, use this one. And this is another very useful construct. Instead of doing the same query that way with nested queries inside from clause, it gets very confusing if you do that too much. So a much better way of doing that is using the with clause. How many of you here have used the with clause? Anybody who has used the with clause, please raise your hand, one, okay. This is something which is really, really useful and is now supported by pretty much all the databases. This is something which you should teach your students to be using. So if you take the previous query, we created an nested query, put it in a from clause, try reading this. If I didn't tell you what this meant, I gave you this. You take a long time to figure out what on earth is going on. It's a bad way of coding. If you're very clever, you can figure it out eventually. But that's not how you should be writing programs. How do you write programs in any language? You just write one main function which goes on and on and on. No, right, you will tell your students, don't do that. Break it up into functions which do meaningful stuff. So when I read the main program, it's calling one function or another. If I understand conceptually what does this function do, I can understand the main program. Then I can go in and see how that function is implemented. This notion of functions was not really there in the SQL language in this sense. There are functions for other reasons, but this was added sometime in the 90s. And within a few years, all databases provided it. So today, everybody provides it. So here is how you do it. This is a slight variation of the earlier query. It says, with max budget value as, select max budget from department. What is this doing? It's clear that you're trying to find out the maximum budget across departments and I'm giving it a name. Now, I could now go further and say, so this is just a function definition. And now that function is used in this query, select budget from department, max budget. But this is not a function. What is it? It's a relation, in the SQL line, everything is a relation. In the Java or C line, everything is a function. Single values are returned by function. Here, everything is a relation. SQL has a different notion of something called a table valued function, which is somewhat more general than this. But this is usually good enough. And what this is doing is, it's defining a relation called max budget. In this case, that has just one tuple with one value. But in general, it can have many tuples, and that can be very useful. So what is happening here? I'm treating it as a relation. It is a relation, in fact. And I'm joining it with department. What is the joint condition? Department.budget equal to max.budget.value. So what is this doing? It's finding, well, actually, this is kind of silly. It should have been, there's a mistake here. The department name should have been in that select class. It got omitted by mistake. So select department name, budget will give you all the departments which have the maximum budget. This is a common scenario. I'll say find the department with the maximum budget. That's badly posed in English. What do I mean, the department with maximum budget? What if there are two departments with maximum budget? If I say find a department with maximum budget, which one? So when you write queries in SQL, you have to be clear what you want. When you specify it in English, you should be unambiguous. And this one says, find all departments with the maximum budget. So it is unambiguous. It's clear. So it's going to find all of them. Any questions? Is there a big operation? No, no, no. This has nothing to do with the division operation. This is the closest you can think is a view definition. This is, in fact, a view definition. You're all familiar with views, right? So the views are also equivalent to this in one sense. The only difference is when you define a view, it goes into the scheme of the database for everybody to see. And if you write a lot of complex queries, your database will get cluttered with all kinds of views, which you don't want to put in the database. You just want the view local to your query for the purpose of this query. So the width clause essentially defines a view which is available only locally. So you can write fairly complex queries. I have some example here. We don't have too much time. So I'll let you read this offline. But it's really useful for creating big queries. So the syntax here is a width department total. Forget the details. Don't try to understand the query for the moment. But I'm finding some of salaries per department. And then the next thing in the width clause. So think of this as one view, department total. This is the next view. Department total average is doing something with department total. So the second view is defined in terms of the first view. And the final query is using both the first view and the second view. So that's the way to think of it. You can create multiple views, local to the query, and use them more than one. So this view, department total is used here and, sorry, defined here and used here and here. It's used twice. That's exactly why you use functions, right? You don't want duplication. That's one reason for using functions. The other reason for using functions is to break up your code into readable parts. So both of those are satisfied. And the last subquery part is what is called a KLR subquery, where, in this case, I have a subquery in the select clause. Select department name, comma. Now here is a subquery, which is also given a name. So what is this doing? Just read it and figure out what it's doing. So for each department, I am finding all the instructors who are in that department. So note that department.department name refers to this outer query, inner and outer. The outer is this one. This is the inner one. So I am finding all instructors in that department, and then select count star. How many such instructors there are? What will happen here? If a department has 10 instructors, you will find 10 things in there. So the department name will be output with 10. What if a department does not have any instructor? This set will be empty. And count on empty set is 0, which is what you want. That is the correct way of doing it. Now there is another way of writing this query, which is not quite correct. We'll come to that in a moment. But before that, note that how many tuples can this subquery generate? This is an aggregate. It will generate exactly one tuple, which is what we want. But SQL does not insist on this. You can write a query which could generate more than one tuple. And how many tuples it does generate may depend on the database state. Supposing I say, select ID from instructor where name equal to something. So for most people, the names are unique. You'll get only one ID, and then it'll work. Now you come to a particular case where a name is shared by two people. Then you'll get two IDs. And then that select query suddenly is returning two values where one value is expected. Because this is called a scalar subquery. Why is scalar? Scala means single value, whereas vector or table is multi-value. So if it returns two values where one value is expected, what happens? Runtime exception. So the database query interpreter will say, sorry, this gave two values, multiple values where I expected one, it'll cause a failure of the query. So you should use that with K. So now, yeah, so let me wrap up the basic SQL, and then I'll come back to this thing. Actually, we'll come to it in outer joints. I think I'll postpone that. So let me wrap this part of the session with modifications database. You can delete tuples. You're all familiar with this. I'll skip the details. Delete from instructor where department name in some subquery. So this is deleting multiple tuples. You can also insert value. This syntax you're familiar with. This is another syntax. Insert into student, select something from something where something. You can get a whole set of tuples and insert it. You're familiar with this. And then there are updates. Here there are some variants. What is this doing? Increase salvy of instructors whose salvy is greater than 100,000 by 3%. And for others, give 5%. What is this doing? Two statements. Update instructors that salvy is value time 1.03, where salvy greater than 100,000. And then another one where salvy is less than or equal to 100,000. This is a little dangerous because if you flip the two, what will happen? Someone who is less than 100,000 may go above 100,000 and get one more increment, which is wrong. So a better syntax is this way. Update instructor set salvy equal to. And this is a case statement. This is a generic thing in h square. You can use it in many places. What does a case expression do? It returns one of these values depending on which one here is satisfied. In fact, the first one which is satisfied is taken. When salvy less than 100,000, then return salvy times 1.05. You can have multiple when clause. Whichever is the first one which is true, that one is returned. And the fallback case is else. Nothing above is satisfied as the else one is satisfied and return salvy times 1.03. So now the ordering doesn't matter. By the way, all these slides are there on the moodle. And finally, here's an update with a scalar query. This is the last thing I want to do now before lunch. So what we are doing here is update students set total query is equal to. Now, here is a subquery. Why is it a correlated subquery? Note this, update student s. And in here it says, where s? So first of all, it's a correlated query. It's using something from the outer. It's computing something and returning it. What is this computing? For each student, we are running this scalar subquery. It's going to find takes natural join course where s.id equal to takes.id. So for that student, what course is he has taken? The takes relation. But the takes does not have credits. We join with the course relation to find the credits. And then we are summing up the credits. But we are not summing up all credits. We are only summing up those which are not f, f meaning fail. And it is not null. Null meaning we don't have a grade yet. So we shouldn't consider those. So all the remaining ones, the credits are summed up. So we are updating the total credits value with the correct one. Any questions? So this is a scalar subquery in the update class. And there is one slight glitch here. If somebody has not taken a single course, what is some of an empty set? I mean not a single course. What I mean is not got a passing grade in any course. It's been null. If you want it as 0, you can stick a case statement in there. So instead of some credits here, I can say case. When some credits is not null, then some credits else 0 and. So this will replace this piece here. But in Oracle, there is an nvl function which says if it's null, then the next one. The SQL itself has a thing called coales. So it has a list of values. It'll take the first non-null value in there. That is standard as well. So I could have said coales, some credits comma 0. So coales will take the first argument. If it is not null, it'll return that. If it is null, go to the next one, which is 0. Yeah, maybe one or two questions before we break. Yeah. Anything a column that is already present in a table? So alter table in SQL gives you the option of deleting columns, but it's not supported by all data. So sometimes you may not have an option of dropping a column. You can only drop the table and create it again. Why is that not provided? It's a implementation convenience. It's irritating, but that's how it is. OK, so I'm going to stop here.