 next topic which is nested subqueries. This is an important part of today's lab exercises. So basically in SQL you can take a query and stick it inside of another query. Now there are actually several different places where you could stick a subquery and what that subquery means depends really on where you stick it in. So let us see this through examples. We are going to start with subqueries in the where clause. So what is that mean? What does it mean to stick a subquery in a where clause? A where clause is a predicate. So it has to return true, false or unknown value and if you evaluate a subquery it returns a set of values in general. So how do you get a true or false value from a set of values? So there are several different ways. I can check if an element is in the result of a subquery or is not in the result of a subquery. I can check if the result of a subquery is empty or it is not empty. Those are the most common uses. There are a few more constructs we will see. So let us start with the first one. Find courses offered in fall 2009 and in spring 2010. We saw how to do this earlier using intersection. We can even do this using join but here is how to do it with a subquery. In fact there are several ways of doing it with subquery. Here is one way of doing it. This is a select distinct course ID from section where semester is fall and year is 2009. So that takes care of those which run in fall 2009 but we also wanted to run in spring 2010. How do you make sure? So we say and course ID in, select course ID from section where semester is spring and year equal to 2010. So this subquery is going to return a set of course ID values and as long as the first course ID is in this set and we also made sure that it is offered in fall 2009 it will appear in the output. Why select distinct there? Because a course may be offered more than once in fall 2009. You can have two sections. So if you do not do a distinct it will appear twice in the result which is why we have a select distinct there. So that was a simple use of a subquery. Now here is a more complex thing which is much harder to do if you don't have subqueries which uses a not in. So this is the same query except we want to find courses offered in fall 2009 but not in spring 2010. So how do you handle this? Well with a simple subquery instead of in we just flip it to not in. So it is a very small modification of the previous query the in became not in. So it is in fall 2009 but it is not in spring 2010. How do you write this query without a subquery? It is possible in fact you can do set difference which is find those which are offered in this minus those which are offered in that and we have seen this before. But this is another natural way of writing it. So here is another query which says find the total number of distinct students who have taken course section taught by a particular instructor ID. 1 0 1 0 1. So now we need information from two relations to answer this query. Don't read the SQL query yet but the first step is we need to know who took what course section and we need to know who offered that course section and particular we don't need to know all course sections. We want course sections that offered by 1 0 1 0 1. So it is clear that we need information from the text table and from the teacher's table. We have to combine this information. So the goal is to find students who have taken course section. So the outer level would be the text table so we can get the student IDs. Also it should be clear that a student may have taken two courses taught by 1 0 1 0 1 instructor. We need to count that student only once therefore it should be clear that we need a count distinct of ID from the student. But now we want to restrict it to students who have taken a course with 1 0 1 0 1. So how do we do that? In this case we have a sub query which says this person took the course section that other guy 1 0 1 0 1 should have taught that course section. We could do this using a join but in this case we chose to use a sub query. So what is the sub query? We have to make sure that that section was taught by this instructor. So what we have done here is course ID section ID semester year that uniquely identifies the section. Because you can have different courses running in the same semester year, the same course running in different semester years or the same course with two different sections. So these four attributes are required in our schema to uniquely identify a section. So to make sure the student took a section taught by this instructor we have to check that these four values, this tuple of four values is in the same you select the same thing course ID section ID semester year from teachers this time where teachers dot ID is 1 0 1 0 1. So this is a set of values which is all sections taught by this person and we make sure the student took that section and that is there in the result. Of course you can do it using a join. Now if you have set up for the quiz question let us go back to the previous question and then there is one more question after that. So just hang on before you answer the question please read the options again. The average of 1 1 3 3 0 is 1 2 8 by 5 1 0 4 options have you started it? I think there was some confusion for this question people have already answered although I thought it was not enabled. This was earlier question sorry this part I hope you can see the chart. The chart is for we have already discussed the answers this one we have discussed the answers. So what this shows us is how many people got what answer. So most people said it is not equivalent and rather most people said it is equivalent only if r dot is declared as not null. So most people did get the answer correct as specified. Quite a few have got option 3 which is or 4 which is wrong and a few people have also chosen option e which was not even an option. So now let us get back to this question let us enable the question please. Your time I started here just wait another 1 or 2 seconds to make sure all the centers are recording. Now go ahead and enter your response please I hope you can see the question please enter your response a b c or d. So I hope all of you have answered the question by now should be time up by now ok. Let us move on to the next quiz question but do not answer the question yet I am just showing you the question while the results for the previous question are being tabulated. So let us first read the question and we will enable it in a little while. So do not answer this question yet. The question says the above query this is using sub query can be rewritten using natural join aggregation group by order by. So what is the question here it wants to find students who have taken course section taught by a particular instructor. So the question is what construct in here would you use to write this without using sub queries. So do not answer it yet till I tell you it is time to answer can we enable we will see the response to question 7 bar is good ok. So here majority of the people have answered 2 which is the correct answer although a few have chosen 8 by 5 and null. So to repeat how does SQL define the result when you have a null this particular one is the average. Now as I said average like everybody else all the operators eliminate null values first except for counts that which is special. So if you eliminate the null values this set which is left multi set which is left is 1133 it has 4 elements not 5 because null is gone. Now what is the sum 1133 adds up to 8 8 divided by 4 is 2 which is the correct answer. Now if you counted the null when you did the count but you did not use it with the sum you will end up with 8 by 5 which is option C that is not correct. A few of you thought that and that is wrong. And a few of you chose null which is not how SQL defines it. Although null makes sense in certain situations generally when you are doing aggregates you do not want the fact that you do not know 1 or 2 values to you know result in aggregate which is itself null that is throwing away useful information. So that is not the right answer either. Good question 8 please do not answer it yet till I tell you to go ahead. Now that is fine. Now you can answer the questions now the timer has started you have 50 seconds left. I think time should be up. So let us just go back to the screen make sure it is timed out. Now just a few more seconds you have last chance to enter your results before time is up. So let us get back to the query I actually mentioned before we came to the quiz question that you can answer this by using a join. So what you do is join takes with teachers and restrict it to teachers.id is 10101. Now can you do a natural join? If you do a natural join what happens? Takes has an id teaches also has an id and you will not actually be able to you will be finding people who take the course which they teach. So there is a problem if you use natural join as is. You can use natural join if you first rename attributes of a relation before doing natural join. So you can use the as clause I forgot to mention this earlier. What you do is just like you can rename a relation. You can use the same syntax with some modifications to rename attributes of a relation. So let me write the answer here. Ignore the part which is on top. So if I have a relation r want to say from r and I want to rename the attributes of r. I know the attributes of r appear in a particular order. So let us say I have the in this case let us not use r. Let us use takes and teaches. Takes let us say we do not bother to rename teaches as t. So that renames a relation but we can also rename the attributes. The problem here was the attribute id is going to get equated whereas what we want is only the attribute course id to get equated. So what we could have done is as t the id attribute we could change to i id and we list the names of all the other attributes. So if you remember the order of things in teachers it is instructor id, course id. Course id of course does not change course id, section id, section, semester, year. So we have renamed that relation. So now we can actually do a natural join. So here we can say natural join of these two. So now if you see takes has id as the student id. This has now been renamed to i id. So there is no clash between them before you do a natural join. So you can actually write this query using a natural join provided you do this step. But if you do not do this natural join is wrong. So coming back to the options we have can you do it using aggregation? It does not really make sense here. We do not want to aggregate. Group by again does not make sense. It is really aggregate. Order by again does not make sense. So the best answer here is natural join provided you do the renaming as we showed it. If you just do the natural join without renaming you will get the wrong answer. So I should probably have given e option which is the join using class which would have been a more clear answer to this question. But since I did not provide that the best answer in this case is natural join with assumptions. So we can check how many people gave what answer. Are we set up to see that? More centers are up but fewer people are answering. Please please answer the questions even if you do not know make a guess. So people have chosen natural join as the right answer that is the majority. A few people have chosen aggregation in group by but that does not really help in this way and nobody has chosen order by which is also correct. So audience poll still continues to win. But let me repeat everyone must answer these questions. It is not optional. If you are not answering it that means you are not thinking. If you are not thinking what is the point attaining this course? My goal is to make you think. Just listening is not enough. It has to be two way. When I teach this course with people physically sitting in front of me I ask questions. Those are not necessarily multiple choice questions but I will not proceed until someone answers it. So please do use the clickers. If you do not use the clickers you know we are recording who has been assigned what clickers. So at the end of the course we will know which person or pair or triple of persons did not use the clickers and that is will be noted. So please make sure you answer all the questions as they are asked. Now coming back we will move on to the next use of subqueries but before we use the subquery let us take a particular query and see how to answer it without subqueries and then with subqueries. It will be clear that subqueries make it a lot cleaner. So the question is find the names of instructors with salary greater than that of some at least one instructor in the biology department. So how do you do this? You have to match each instructor with some in fact with all the instructors in the biology department and check if for at least one of them the salary is greater. We actually saw the same query earlier when we saw the rename operation and that can be used here directly. It is the same query. So it is a self-join of instructor with itself on salary t dot salary greater than s dot salary but it is a little unnatural first of all to write the query like this. Second, those of you who are familiar with algorithmic complexity will realize that this query matches every row in t potentially with every row in s. So if it is executed naively it is going to take quadratic time complexity. So if you had thousand rows in instructor it is going to take a million of comparison operation. If you have ten thousand rows hundred million so it is not very scalable. So in contrast let us see the next version of the query which is select name from instructor where salary greater than some select salary from instructor where department name is biology. This can be done a lot more efficiently. So the subquery select salary from instructor where department name equal to biology can be run once it is not going to change. You can do it once collect the answers. Now for each instructor you have to check if the salary is greater than some salary in there. Now most SQL implementations are clever enough to realize that the only salary which matters when you do greater than some is the minimum salary. So you would actually just store the minimum salary and for each outer row they will check if it is greater than the minimum salary of instructors in biology and this whole thing can be done in linear time. One scan to find who are all the instructors in biology find the minimum salary. One more scan of instructors comparing each of their salaries with this minimum salary and then outputting it. So if you express it using the greater than some not only is it easier to understand what you are doing your goal is not to do a join and it is also better for the SQL engine. By the way the word some can be replaced in SQL by the word any although that causes confusion. If you say greater than any some people interpret that to mean it is greater than every one of this which is not the intention here. So although SQL allows some and any as interchangeable I would encourage you to use some to avoid the confusion by the keyword any. So that is that plus now moving on find the names of all instructors whose salary is greater than salary of all instructors in the biology department. Can you hence write this using a join query like we did before? It turns out you can't do it directly. It is a more complicated because it will it has to be true for every instructor in biology and a join does not allow this kind of universal quantification. To write it you can actually do a series of steps which include taking except set difference at some point along the way to do for all. But this particular query can be written a lot more easily using the greater than all construct. So we say select name from instructor where salary greater than all select salary from instructor where department name equal to biology. So greater than all makes life quite simple. Now again you should realize that the SQL engine would probably find the maximum salary in biology and check that this each instructor's salary will be output the instructor will be output only if their salary is greater than the maximum. Now why can't you write this directly? Can you directly say where salary greater than maximum or salary greater than minimum? In fact you can do that except that the minimum and maximum are now single values. They are not sets of values and we will see how to do that in a moment. But before we see those kinds of subqueries which are treated as returning a single value. Let us wrap up a couple more things on subquery return sets of values. I mentioned earlier we can check if a subquery is empty or not. You can do this using the exist construct and not exist. Exist R is the same as the result of the subquery is not empty. Not exist means it is empty. So how do we use it? We are going to see our favorite query find courses which are taught in both the fall 2009 semester and in the spring 2010 semester. Now how do you write this query? Here is the third way of writing. We saw two ways join and in queries. Now here is the third way. We are going to the outer part of the query is the same. Select course ID from section SS where semester fall year is 2009. That has not changed. The subquery has changed. I want to use exist. So I want a subquery which will return an empty set if this particular course does not run in spring 2010 and it will return a non-empty set if it does run in spring 2010. So the subquery must know something about the outer query score. So in fact the subquery now if you look down here it says exist. Select star from section ST where semester is spring and year equal to 2 0 1 0. Now both of these refer to the nearest one but if you want to be explicit you can say t dot semester is spring and t dot year is 2010. And this is one more class here which says s dot course ID equal to t dot course ID. What is going on here? S is a rename of a relation which is in the outer query. It is being used in the inner query. So this s is called a correlation name or correlation variable and this subquery which uses at attribute value from the outer query such subqueries are called correlated subqueries. If they did not use any outer query attributes they are not non-correlated subqueries. The ones we saw so far were non-correlated. This one is correlated. So now how do you evaluate such subqueries? Obviously to evaluate the subquery you need a specific value of s dot course. So what is happening is I will take each section in the outer level. I will evaluate some as there is fall year is 2009. If these fail I will skip it. If they are satisfied I will then evaluate this sub query which selects sections which are in spring 2010. But additionally that course ID t dot course ID is equal to the course ID of the current outer level thing. So think of it as a nested evaluation. I am going to have a loop at the outer level which goes over each outer level record which satisfies the outer conditions. Then I will take that one record evaluate this subquery with that particular s dot course ID value. Now it may turn out that this subquery result is empty or it may be non-empty. If it is empty then exist will fail. That means that particular course ID is not being offered in spring 2010. If it is non-empty it means that same course ID which was offered in fall 2009 is also being offered in spring 2010. So it is in the answer. So that is how we use correlated subqueries and to understand them you have to think of an outer loop which invokes the inner subquery repeatedly once per loop. I hope that is clear. So now let us see a use of not exist. Now this is a more complicated query. This corresponds to universal quantification in the predicate logic. So what is this query? Find all students who have taken all courses offered in the biology department. Now SQL by itself does not have a clause equivalent to this is true for all something. But it has an exist. Now if you are familiar with predicate logic you know the two basic construct exist and for all. Now if you do not have for all it in predicate logic you can write it using exist. How do you write it? If you want to say everything satisfies p for all x and something p is true. You can equivalently write it as not there exists something in there for which p is false. If it is something is true for everyone it means it means the negation of it is not true for anyone that is the same thing. Since SQL has an exist construct we can take a for all and turn it into not exist not that is the structure. That is again easier said than done. So let us see how we write this query and help you understand what it is doing. So I want to see to make sure that this student has taken all courses. So for this student I want to make sure that does not exist the course in biology with this student has not taken. So that is where the not exist comes. So the not exist has to range over something. So in this case it ranges over the biology department courses. So not exist the course with this student has not taken. How do you make sure this student has not taken the course? For that you use the accept clause. So here is the query select course ID from course where department name is biology. This is all the biology courses except select t dot course ID from takes as t where s dot id equal to t dot id. Note that this takes as t and this condition is using a correlated variable from here. So this one is running for this particular outer level student. Each time the sub query is invoked it is going to be with a different student and so it has to be evaluated or fresh. For a particular student it takes all biology courses it removes from those courses all the courses which the student has taken. Now supposing the student has actually taken all the biology courses what will happen that set will become empty because he has taken all the biology courses. But if this student has missed a particular biology course what happens? That course will appear in this one but not in this one down here and therefore the result of this accept sub query is non empty. So not exist fails and that student will not appear in the result. So it is a little complicated but I hope you understand it think through this. There are some lab assignments in the afternoon which are based on the same thing same concept to reinforce this. I should also note that you cannot write this query using the equal to all or any of the other variants. So equal to all will insist that it is equal to all the values that cannot be used to express this condition. So let us now move to sub queries which are used in a different context. So far we looked at sub queries which are used in the where clause using in or exist or not in not exist constructs. Now here is another kind of sub query which is referred to in SQL as a derived relation. So derived relation is basically relation in the from clause which is itself a sub query. So where I used instructor or teachers or whatever I could replace it by a sub query. So here is an example. We saw this query before. Find the average instructor salary of those departments where the average salary is greater than 42,000. We saw how to write this using the having clause. First compute the average then the having condition is applied. I can do this without having as follows. So this one says select department name average salary from. Now here is the difference. The from clause has in parenthesis here a sub query. Select department name average salary as AVG underscore salary. The rename is important because otherwise I cannot refer to this average result from instructor group by department. So this sub query is actually going to give me a set of tuples which becomes the input to the outer query. Now what does the outer query do? It simply filters these tuples to remove those whose average salary is less than or equal to 42,000. It retains those which are greater and those are output. That is the same thing as what the having clause does. So having clause is not strictly speaking required but it is useful syntactic sugar. Now you could have also written this query by renaming. I showed you before the break how to rename individual attributes of the relation. So this second query here is showing you how to rename average salary. Here we renamed it inside the select clause. In the second one we are renaming it outside similar to the renaming we did earlier. So we have this query. The second here average salary does not have a meaningful name but here we rename it as department average, department name average salary. So now we have that name and we can use it here. So these two queries are really the same thing. Just the renaming is done in slightly different ways. Now there is yet another construct which is there in the SQL standard which is called the lateral join construct which can be used to write this. I would not spend much time on this because PostgreSQL does not support it. We cannot use it in your labs but if you are interested what the lateral construct does and more important I want to tell you what you cannot do if you do not have the lateral construct. So first let us see what is the lateral construct. Now look at this thing here. It says select blah blah from instructor I1, lateral select average salary from instructor I2 where I2.department name equal to I1.department name. So what is this doing? This sub query is using I1.department name. It is using an attribute which comes from a relation earlier in the from clause. Now the point I want to make here is you can do this refer to an attribute from an earlier relation in the from clause only if you use lateral. If your database supports lateral you can add this lateral construct and then the second sub query can use an attribute from the first sub query. So conceptually you iterate over the results of the first sub query or the relation there. For each of them you generate the results for the second sub query join them and so forth. I want to get into the details but what is important to notice if you do not have the lateral clause you cannot write a query like this where a sub query here uses an attribute name from an earlier or any other relation in the from clause. A sub query can only use attributes from a outer query. It cannot do it from another thing which is at the same level of below. So in any case lateral is not supported we will ignore it. Now you can write such sub queries in the from clause but SQL has a very very useful construct called the with clause which was not there in earlier version of SQL. I think 92 or so it was introduced but it was implemented widely quite a bit later but it has been around now for a while and it is a very very useful construct called the with clause. So the with clause is basically a way to define a kind of it is called a view we will see what are views in detail later. If you know what is a view a with clause is like a temporary view which is defined only in the scope of a query. If you do not know what is a view do not worry we will see what the with clause does. So here is query size find all departments with a maximum budget. So what is that query doing? How is it written? It says with max budget value as select max budget from department. So what is this doing? It has a sub query select max budget from department and it is giving it a name it is giving that result a name. We could have stuck it in the from clause but by writing it with the with clause ahead and giving it a name our subsequent query looks a lot cleaner. So this is a sub query which gives us the maximum budget across all departments. Each department has a budget this is finding the maximum budget and now what does it do? With this max budget as defined select budget from department, max budget where department.budget equal to max budget.value. Now note this value is simply an attribute there is nothing special about it is just the name which we used here to give a name to this attribute max budget. So what is it doing? It is creating a relation in this case the relation has a single tuple in general it can have many tuples and this query now joins department with that relation which we have defined earlier and the join condition is department.budget equal to max budget.value. So it is going to select those departments which have the max budget. Now this is actually a common kind of query which you will see. One kind of query says find the maximum value, find the maximum budget, find the maximum cpi of students. That is easy select max budget from department gives that. But another class of query says find those departments which have the maximum budget find those students who have the maximum cpi and so forth. It is a very common kind of construct. How to do that? This is a very clean way of doing it. So the first sub query in the with clause finds the maximum. Then the second query again takes the join of that with the original thing in this case department and selects all those who have the maximum. There may be more than one department which has the same budget which happens to be the maximum. All of those will get selected. So that is a simple use of the with clause. You can also use to construct more complex query. In fact if your query is becoming complex I would highly recommend you break it into pieces first logically and each of those pieces can be defined using a with clause and then they get combined as you go down. So take this query. Find all departments where the total salary is greater than the average of the total salaries at all departments. Now what the hell does this query mean in English? First let us understand what it is. To understand it the first step is to find for each department I want to find the total salary that is add up the salaries of all instructors in that department. Now the second part of the query says the average of the total salaries at all departments. So what this means is we will find the total salary for each department take the average. So this is the average salary average across all departments of the total salary and finally we want to find departments whose total salary is greater than that. So note that the total salary of a department is required multiple times. First to find the maximum then to output those departments which have the sorry to find the average of the total and then to find those departments which have a value greater than the average. So we are going in steps. The with clause defines two relations. The first one is department total, department name value as select department name some salary from instructor group by department. So this is getting the total salary for each department but actually here only for department which have an instructor. If a department does not have an instructor this it is not going to appear here. It is not very meaningful either for this query. Then we have the total salaries for each department. The second one says department total average is select average value from department total. What is this doing? We have the totals per department is taking the average. So if CS total salary is 100,000 and mechanical total salary is 200,000 the average of the two is 150,000. I should also note that the average of the total salaries is not the same as the average of the instructor salaries. If you do not understand that why I do not bother for now but here it is finding the average of the totals. Now finally I want to find those departments whose total is greater than the average. So now I again use the department total relation which we computed before. So select department name from department total department total average where department total dot value is greater than department total average dot value. So this part is using the same construct which we saw in the previous slide to find all those with the maximum value. But the input relation is not the original instructor relation but the input relation is now the department total relation which we defined here. So I hope this query is clear. So here is a simple use of a scalar sub query. So what is a scalar sub query? It is basically a sub query used where you expect to see a single value. So far sub queries in the wire clause could return a set of values and we had to use in or exist. Sub queries in the from clause can return a set of values and that is dealt with as usual in the from clause. But now this is a new type of sub query which is used where a single value is expected. So here is an interesting use of it. Select department name comma, now here is a sub query. Select count star from instructor where department name department dot department name equal to instructor dot department name as num instructors from department. What is this query doing? Now first of all as is standard in SQL you first evaluate the from and where clause then the group by clause if applicable. Then you will evaluate the select clause. Now in this particular case by the time you are evaluating the select clause you are doing it per department. So for each row in the from clause you are going to evaluate whatever is in the select clause. Now what is in the select clause? There is a sub query. So this sub query note that this sub query is actually using an attribute department dot department name which comes from down here from the department in the outer level query. So what is it doing? It is going to find out all instructors whose department name matches that particular department name and do select count star on it. So that finds out how many such instructors there are and outputs that which is renamed to num instructors. So this is a way of finding out how many instructors there are in a department. Now we saw an earlier query which was a much simpler query. It said select department name comma count star from instructor group by department. Let me see if I have that query here. We took instructor group by department and then select department name count star. Now supposing we had a department which had no instructor what would happen in these two cases. In the old query there is no instructor in that department. That department just does not appear in the result. So any department which appears would have a count of at least one. Here what happens? Supposing a department has no instructor what is going to happen? The select count star is going to find no instructor. So the count star is going to apply on an empty set. What is count star for an empty set? As I told you earlier it is 0. So it is going to output the department name with the count 0. Whereas the old query simply threw away the department name. So that is the basic difference between these two queries. And you can see why a scalar subquery is useful. There is a way to write this query using outer joints. We will see that later. If you have a question please flag it. I am going to look up the people who have questions and after that I will come back to scalar subqueries. So it is a very short one. So I am going to take all your questions. Let us start with DOE ACC Srinagar. I have a question. Do you have a question? Sir can you hear me? Yes please go ahead. My question is can you explain the difference between the database null values and the null values we use in the programming language? Because when it comes to comparison they are not matched equally matched. Can you explain the difference between the null values? That is a very good question. What is the difference between the null values in the programming language like C or Java? So in an imperative programming language like C or Java null values are applicable only to pointers or references. And they indicate that that particular pointer or reference does not point to anything meaningful. However in these languages null values are not applicable to primitive types such as int, float or double or other such primitive types. They only apply to reference types. So that is the primary difference between these two. In SQL every type including int and float and numeric and time and so on. Every type implicitly includes the null value. In the other languages only reference types have a null value. But for those types for a string for example in C++ a null means there is no string. In SQL a null means there is no string but conceptually it could also be interpreted as I don't know what it is. Whereas in a programming language it's more straightforward. It's just not there. That is no string. So there is a slight semantic difference but it's not a big difference. I hope that answered your question. If you have a follow up question go ahead and ask. Sir is the ask a value same for the db null and the null that in programming language? It has nothing to do with ASCII. The pointer is simply an address of a memory location in imperative language and the value 0 is treated as a special value where you cannot store any meaningful data at 0. In a database null internal representation does not concern you. Don't worry about how it is represented. There are various internal representation. We will see those later but that's not at all. It's the conceptual value null. So thanks. Now let us take a question which came on chat. So one of the questions was can you explain unknown and null values with some more example? What's the difference between unknown and null? In fact they are a little bit similar except that unknown is for a truth value. You have true false unknown whereas null is for everything else. Other than that the Boolean operations and or not are defined for unknown. Null value is not an element in the Boolean domain. Maybe that's one difference. Their null corresponds to unknown but otherwise they are fairly similar in what they mean semantically. Let's see what are the questions. The next question is maximum function gives first maximum salary of a department but what if you want to find the second highest salary? What do we do? That's actually a very good question. How do you find the second highest salary? One way is to stack up the queries. First find the maximum then write a query to find the maximum which is not equal to this maximum. You can have a series of with clauses doing this. It's rather inefficient but it is possible to do it this way. However obviously if your goal was to find the 50th maximum this would be a very painful way of doing it. So what most people would do is they would fall back on a programming language interface and say get me values ordered by whatever sequence I want and in the programming language they will step through the values in a loop till they come to the 50th value and output that. So it is easy to do if you are writing a program on top of SQL to do this. The question is if you want to query in SQL to find the 50th value how do you do this? It's actually fairly painful with the constructs we have seen so far but SQL language has been extended significantly over the many years and from 2003 onwards there have been several new constructs to deal with aggregates and windowing and ranking. So there is a bunch of constructs. If you have time we will see it. It is there in chapter 5 of the book and if time permitting we will cover that also tomorrow but there is a way to assign a rank to each value. We will also see that ranks can be computed using other kind of SQL queries it is there in the chapter. So it is possible in a not really a horrible way but still slightly complex way to find the nth highest but SQL has a ranking function which lets you get the you know let's say marks of students output in with a rank. So the top mark gets rank 1, the second mark gets rank 2, 3 and 4 and so forth and now you can say find the thing which has a rank 50 and that gives you the 50th largest. So that's how you can get second, third and any number of largest values. Okay now moving on to more questions if data contains a single quote how to execute it using concatenation. So if you are writing the SQL query yourself you can type two quotes in the string and two consecutive quotes are treated as a single quote. However if you are doing this using JDBC or ODBC or some other thing doing string concatenation is actually you know that there are problems serious security problems introduced and there is a clean solution which we will cover later we are going to cover this in detail. So I will hang off on that question for later. A few more questions why we still sticking to super key as use of super key adds complexity for no gain. Super key is a concept now it helps you understand you know what is not minimal and so on. Candidate keys and primary keys are what you actually deal with when you use SQL. Primary key can be declared in SQL. A candidate key is a super key you know also can be declared in SQL using a unique constraint. I have not got into it so far but we will see it later. So I even if you do not use it directly it is an important concept for to understand what uniquely identifies a tuple. Next question is when we take a join of multiple tables is there any significance for the order of tables we specify in the from clause or for the order of join conditions in the class. This is again a very good question. So I can say a select star from ABC or I can say BAC or CBA many different orders are possible. Does this matter at all? The answer is it does not matter. So if you look at the semantics of SQL it matters only in the following simple way. It defines the order in which the attributes appear in the query result. So if you say ABC the attributes of A will come first followed by those of B followed by those of C in left to right order. If you say BAC the order may be different. However this is a very minor reordering issue you can if you say select star from ABC. If you do not do select star if I say select A dot X B dot X then I do not care whether I write it ABC, BAC, CBA and in fact the query execution engine internally will figure out the most efficient order for doing the joins. So you just list it in whatever order don't care about it just list it the optimizer will take care of it. Similarly the conditions in the where clause should I first say A dot X equal to B dot X then B dot Y equal to C dot Y or should I first say B dot Y equal to C dot Y and then say A dot X equal to B dot X. The answer is again it is irrelevant the order of the parts which are ended together in the clause is irrelevant. Again the optimizer will do a whole bunch of clever stuff and you don't have to worry about it. As a special case of this let me write on the white board here A from select something from ABC where A dot X equal to B dot X and B dot X equal to C dot X. Now it should be clear that all the X values are being equated so I could have also written it as A dot X equal to C dot X and A dot X equals B dot X. So see the slight difference in this here I am equating A dot X with C dot X there I am equating over here I am equating B dot X with C dot X. So there is a slight difference in this but it doesn't really matter. The conditions are actually the same thing and it doesn't matter which way you write it in SQL the answer will be the same it is irrelevant how you write it. Now last question from this list what is the command for inserting a block of data to insert multiple tuples in a single command. It turns out that the SQL language itself does not provide you any direct way of inserting multiple tuples in a single command. So you have to say insert, insert, insert one after another. Most databases however provide special commands to do bulk load in fact if you do this insert, insert, insert it is okay if you are doing 100,000 inserts. If you are doing this 10 million times there are a lot of overheads doing it this way. So if you want to load really large amounts of data most databases have some special commands which let you take a file containing the data you want maybe in excel format or in what is called comma separated value CSV format and then load it into the database using a special command. That's not part of the SQL standard that is database specific. Another few questions have come up. One of the questions is what is optimum join or sub query. I interpret this says which should you use if you can use a join or a sub query which one do you use. I have confused you perhaps by giving you examples of both but not telling you what is the right way of doing it. In SQL there is often not a single right way of doing it but if you are using a database like PostgreSQL it is generally preferable to use a join if the sub query is basically in or exist sub query you can write it using a join. If the sub query is not in or not exist that is hard to write using join. So then use a sub query but if you can do it using a join you should probably do it using a join because the implementation of PostgreSQL is able to optimize certain sub query executions very well and do it very efficiently but if you have more complex sub queries PostgreSQL will run a little slowly. Now for the same sub queries you may find that Oracle or SQL server or DB2 may actually run it very fast because they have figured out how to take even complex sub queries and run them efficiently but pretty much all databases will figure out how to do joins very efficiently. So if it is simple enough to write a join write a join do not use sub queries unnecessarily because they can be slow but where they are needed use them. One last question if you want to find the salary of a person greater than the salary of every employee in a particular company should we use all or simply compare it with a maximum salary. So we can certainly say salary greater than all salaries in that particular department or we can use scalar sub queries which are just coming up. So this is a good lead into the next few slides where we can write the same query by using greater than the maximum value. So let us see how to do that.