 Let us look at a few of the chat questions. One of the questions is why cannot we use the wire clause after the aggregate function group by. So, after the group by again have a wire clause. That is just syntactic detail. I mean the having clause over there is equivalent to the original wire clause except that you can use aggregate functions over there. You cannot use an aggregate function in the original wire clause. The aggregate function can only be used in the select clause and in the having clause. That is just a detail in SQL. Now, the having clause is not really essential syntactic in SQL. We can actually nest the sub query inside of from clause and that is actually coming up in the next few slides which shows how to get rid of the having clause all together if you wish to. Somebody else had asked for queries to be written. Somebody said count the number of teachers who are teaching in computer science department. That is for teaching a computer science course, but they are not from the computer science department. So, this is a good exercise. Try it out on your own. So, we have courses in CS which are being taught. So, what I want is instructors who are teaching a computer science course, but they are themselves not from the computer science department. How would you do this? There are many ways of doing it. You can use an accept clause. So, first find the IDs of all instructors who are teaching a computer science course. Be careful not to equate the department names there and then I can eliminate from that all instructors who are in the computer science department. So, we can say accept. Now, in place of accept, I can equally well say not in. So, somebody else had actually asked this query further down. What is the difference between accept and not in? In fact, any accept query can be rewritten using not in by just taking all the columns of the first query and saying not in and then the second part of the accept. So, A accept B can be written as select star from A where all the columns of A together not in select star from B. I do not know if this was completely clear, but it is straight forward to rewrite accept as not in, but rewriting not in as accept is more tricky, because not in can look at just one attribute or two attributes not all attributes. So, the other way is much harder, but this way is straight forward. So, now let us come back to live question. We now have Kongu engineering college Terindwara Sir, how to copy the schema of a table without using create command? How to copy the schema of a table? Copy the schema of a table without using create command. What you mean is, can I create a new table using the schema of an existing table? Sir, is that the question? Sir, not the content, but just not the content, I need only the schema. So, in Oracle, I think there is a syntax which is create table like another table and in some later versions of the SQL standard, this was introduced as part of SQL. I do not remember the exact syntax, but I do not know if all databases support this particular syntax. So, you need to check your particular database system to find the exact syntax, but if you are using Oracle, there is a I think create table like another table. So, that will create the same schema, but not the content. Now, let me also mention, if you do not care about the integrity constraints, you can say create table as select star from another table, which will copy the content, but you can also include a where condition where false. So, let me write it down for it to make more sense. I hope you can read this. So, the where condition is basically false, so that no row is output and what is happening is create table T 1 as select star from T 2, where 1 equal to 2, 1 equal to 2 is false. So, no tuple actually gets output, but it is an empty table and the schema is copied over meaning the attribute name and the types will get copied over to the table T 1, but one thing which goes wrong with this is the integrity constraints do not get copied. So, then you have to do that separately, but if you do not care about integrity constraints, then this is an alternative, but the other syntax is which I had mentioned was something like create table T 1 like T 2 or some variant of this syntax, which some database supports. Any other query? Whether we can use sub query instead of table name in select statement. So, the question is in a select statement or in the from clause. So, in both the from clause and the select clause, we can use a sub query and I am going to come to that in just a moment. So, far I focused on sub queries in the where clause or the next topic is sub queries in the from and select clause. So, I will answer that question in the next few slides. Any other questions? The next one, what is the difference between in all and exist? In or is it? All, A, L, L all exist. Want the difference between in all and exist. In is used when you want to check if a particular value say there is a outer query with some table T. So, T dot A in and then I have a sub query here. So, the sub query may give multiple results and I am checking if T dot A that particular value is present in the sub query. All is used to compare a value that same T dot A with all the values, but all can only be used in conjunction with the comparison. So, I can say less than all equal to all, not equal to all and so forth. And finally, exist is used without any left hand side. So, exist simply check if exist sub query whether the sub query is empty or not. So, that is different, but these two in and all there are a few cases where you can use one in place of the other. For example, not in. So, want to check if a value is not in something, how do you check that? It must be different from all the values in that other table. So, we can say not equal to all sub query T dot A not in this is equivalent to saying T dot A not equal to all sub query. So, again there may be some issues here with null value, but if you ignore null values then it is clear that if it is not in there it will certainly not match with any tuple. So, and vice versa. So, these two would be equivalent if you ignore null value. Now, what about with null values I will leave that as an offline exercise. Sir, my question is that does natural join supported in all DBMS query languages by default like in Oracle SQL server? The question is is natural join supported in all the databases. I believe now all the major databases support natural join and all the join constructs outer join and so forth. Post-christian certainly does, I believe MySQL also does now. I have not used it recently, but I believe it has it now. I may be wrong, should cross check this and the other databases do support it. Oracle SQL server, DB2, Psybase, I am pretty sure all of them support it. So, yes you can use it. All the variants of join using, I will show you the full set of variants including outer join in the next chapter slide coming up, but yes all of them support it. We have Sishila Dhanchen. Good afternoon sir. My question is not regarding queries, but regarding schema sir. In our academic most of the time what we do is we use MySQL or Oracle and most of the time we create databases and tables directly, but as we are doing this in workshop we are using a schema in post-christian. So, is this a good practice that we also teach students to use schemas and create the tables or are there any advantages and any performance disadvantages like response and all. If multiple users are working on the same schema with multiple tables. Thank you sir. So, this particular question talks of schemas and tables and so there are different meanings for schema. So, most of the time when we talk of schema what we are talking about is the set of attributes for a relation and then the set of relations in a database, but in the SQL sense there is a hierarchy of catalog schema and then relation. Now what does this mean? In when you have an account on linux system you have a directory and all your files are within your home directory. Now in SQL you have relations. Where do these relations reside? Is there a flat namespace where everybody's file relations reside in the same namespace. So, if I create a relation R and you create a relation R there would be a conflict. So, the SQL language defines this notion of I think the three levels database catalog and schema. Database is the whole database. Catalog is the part of it. It is like one level directory down and then schema is one more level of directory down. So, your schema is like your home directory in linux and you create your files in your home directory. Similarly, here you create your tables in your home directory. Of course you cannot have sub directory here. It is flat. That is three levels. So, that is what schema means here. And in the context of creating schemas in PostgreSQL you are seeing the nitty-gritty because we showed you oracle system administration where you need to be aware of this. In other cases it kind of happens implicitly. So, when you create a user and you connect as that user even in PostgreSQL implicitly a schema is created for each user and you do not know about it, but it is actually there underneath. I think this is what you meant by schema. So, all databases have some variant of this notion. They may use different terms for it. They may hide it from you most of the time, but they all have it. We will take one last question. We have Piyusha. Hello. Good afternoon sir. Sir, I have solved the question. First question is in group by close, if we use group by close in our query then we can only select the attribute which we use in group by or any attribute that with aggregation function. Is there any way we can select another attribute from table and that we do not use in group by? So, the question is when we have a select query with a group by and aggregate the only attributes which can appear in the select clause are those which are in the group by clause or are aggregated. Can we use any other attribute? Is there any way to do it? So, there are two answers to this. First of all, if you have any attribute which can take multiple values in a group and if you select it in the select clause, which value do we choose? For a group the aggregate should only appear once. So, if I choose an attribute like let us say a group by department and find the some salve one instructor. ID varies within a group, a particular department may have five different IDs. If I select ID, which ID do I select? Because the group should appear only once. So, there is no way of determining which ID to select. So, if you want a ID or any ID, well some you can maybe select min ID or some such to pick one ID arbitrarily. But it does not make sense to select ID when there are multiple values. Now, some databases will be a little lenient and say that look supposing we know that the attribute has a unique value. Then even if you do not put it in the group by list, we will allow you to include it in the select clause. Which is actually kind of pointless because if you wanted it in the group by list, go put it in the group by clause what prevents you from doing so. And if it has a unique value in the group, let us say along with department I wanted something else, I do not know department budget. Given a department, the budget is unique. So, group by department budget will give exactly the same result as group by department. So, there are no two different budgets for a department. So, if I wanted the budget attribute to be output for some reason and I said group by department comma budget, then sure I can use the budget attribute, it is there in the group by. Some databases will let you say select department comma budget with a group by department alone because they know budget is uniquely determined. But do not depend on it, add the budget to the group by and you know it will be the same result. I think I will stop there on the questions, sorry to the many others who have indicated you have questions in the interest of time we need. So, the next construct is subqueries in the from clause. So, here is a query where we have a select department name average salary as we have given it a name here from instructor group by department name. So, this is the same query we saw before, but now we have stuck it as a subquery inside of a from clause. And we have an extra condition in the outer wire clause which is average salary greater than 42,000, does not this look familiar. So, this is exactly the same query we wrote with the having clause, but now instead of a having clause we stuck the first part as a subquery inside of the from clause. And then the outer query is wire clause has whatever was the having condition. Well, if the having condition use an aggregate we have to compute it in this select clause and then we can have a condition on the aggregate here. So, what this shows is having clause is option you can rewrite it like this, but it is still convenient. And this just shows another way of writing the query which illustrates the following. When I have a subquery in the from clause I may need to give it a name. So, I can give it a name. So, I have this subquery starting from here and ending here and I have given it a name as depth average, because in this case I do not have a name for that subquery for that relation. I am treating this as a relation there is no relation name, but here I have given it a name department average. And moreover I have given names to the individual attributes. So, department name I have kept the name as a AVG salvi here I renamed here I renamed it in the select clause. In the second one instead of renaming it there I renamed it here. When I say as department average department in comma average salvi. So, this is one way to use subquery in the from clause. Now, here is another interesting situation which let me first explain this to you and then tell you why it needs extra construct. So, this is something using a construct called the lateral clause. So, what is this doing select something or other will come back to it from instructor I am giving it a name I 1 lateral and then here is a subquery. And here what is happening is select average salvi from instructor I 2 where I 2 dot department name equal to I 1 dot department name. What is this subquery doing? It is taking a correlation variable that correlation variable is I 1 dot department name it is coming from here. Now, if I do not use the lateral clause this is illegal in SQL. So, what does this mean? I can have multiple subqueries in the from clause I can have relations in the from clause, but a relation name from here is not accessible in the subquery it is not allowed to it is not visible inside the subquery by default. However, SQL standard introduce this keyword lateral which is that if you put it in you can use this thing on any subquery which occurs on the right hand side. So, if this was instructor I 1 I 1 can be used inside of a subquery which appears to it is right provided you use the lateral clause. So, it is part of the SQL standard, but it is not supported on many databases. Some databases supported some offer alternative first SQL currently does not support it, but in the next release 9.3 it is planned. So, the beta version of 9.3 actually supports the lateral clause. So, it can be useful in some situations where I have a subquery which takes a parameter and this gives me a set of parameter value. So, I want to execute the subquery for each value from this set of parameter value. Now, moving from subqueries this is a new topic called the width clause. The width clause is actually a variant of views if you are familiar with views a width clause is basically a way of defining a temporary view whose definition is available only to the query in which that width clause occurs. You are not familiar with views does not matter let us see this example. This example says width and I am giving a relation name here I am giving it a name max budget value is simply an attribute name it can be anything here as select max budget from department. So, what is this doing I am creating a temporary view which contains the maximum budget across all departments. In this case there would be only one tuple because there is no group by and there is a max in general the view can have many roles. So, that is a relation which we have just created it has a single attribute whose name is value. Now, we have just defined the view I can have a sequence of views defined here we will see that finally, I have a query select budget from department comma max budget. So, now I am joining department with max budget where department dot budget equal to max budget dot value. What is value here it is this attribute name which actually corresponds to the maximum budget across all departments. So, what is this query doing it is created a view and then it is joining the view to find the department with the maximum budget. So, why did we write it like this we could have written this in other way, but the point of this query is to show that you can break up a large query into pieces by creating temporary views using a with clause and this is really useful to structure large query. If you are write complex queries by using sub queries within sub queries and so on very soon you will not understand anything. An alternative is to have a sequence of things create table as query 1 and then use that table in the second query and then use that table in the third query, but here you are telling the database system to actually create the tables and go in that order which may be unnecessary. May be in order to answer the final query I do not have to create the entire first table I only need a few rows from that I can pick those rows in this example does not illustrate it, but there are such cases. Therefore, it is best if you define intermediate tables using the with clause intermediate view and use them in your main query. So, let us take some examples. So, here is this example find all departments where the total salve is greater than the average of the total salve at all departments. The first step is to find the total salve in every department. The next step is to find the average of the total salve across all departments and the last step is to find departments total salve is greater than this average of the total salve. Now, you could write all this with deeply nested queries, but it will be impossible to understand, but what we have done here is use the width clause to structure it nicely. So, the first width clause is department total, this is the total salary in the department, department name value which is the attribute name. Select department name from salary from instructor group by department name. So, this gives me for each department which has at least one instructor, the sum of its salary. So, that is the total salary. If a department does not have instructor, it is not going to appear here. So, if I want to include those, I have to do something more. The next step is to find the average of these total salaries and this I do here. So, I am defining department total average value as select average value from department total. So, I have taken this view and I am using it here and finding the average across all the values here. And the main query finally, does the join of department total that is this view with department total average that is this view. On the condition that department total dot value is greater than equal to department total average dot value. Well, I have made greater than equal to. So, read this as whose total salary is greater than equal to the average of the total salaries in all departments, small glitch in this query. To match the English, this should have been greater than not greater than equal to. So, I hope this sequence of things makes it clear what this query is doing. So, if you have to write a complex query, by all means use the with clause. Now, some years ago the with clause was not supported by most databases may be 10 years back, but today pretty much every database supports the with clause. So, by all means use it. And the final category of sub query is scalar sub query, which are used where a single value is expected. So far, all our sub queries could return multiple values. So, we said in or exist or we put it on the from clause. But now, here is an interesting case, where I want to find out the total number of instructors in each department. So, what have I done here? Select department name from department. And then in the select clause, I have said select count star from instructor, where department dot department name that is this department department name equal to instructor dot department name as num instructor. So, the result of this query is a scalar. So, I am using it in a place where a single value is expected. And indeed this will generate a single value, because count star is an aggregate, it will generate only one value. And that value is given the name num instructor. So, I hope this query is clear, it is going to find the number of instructors in each department. Now, we could have equally well written this query by a group by on instructor group by department. But there is a subtle difference. So, first of all what is that other query? Select department name comma count star from instructor group by department. That seems like a much simpler query, why do we write this complex query? We think about it, what if a department does not have any instructor? If I did a from instructor group by department name, a department which does not have an instructor will not show, it will not be in the result. But maybe I do want to know such department, maybe rare that I have such a department, but if my application demands it, I should show those departments. What should the count be? It should be 0, this query actually deals with that. So, what happens here? If a department has some instructor, those instructors will be counted. If a department does not have instructor, what happens here? This sub query will generate no results as input to count. So, the input to count is a empty set. What is the count star on an empty set? It is defined as 0. And so, this will show the department name with count as 0. Now, in this case count star made sure only one result can be written. But you can actually write queries which do not have an aggregate. And then at run time, in some cases they may return one value, in which case it is fine. But if they return more than one value, it will return an exception. Because at this point, this is a scalar sub query used in a place where a single value is expected. If the sub query returns two values, it will crash. So, that is a nice example of a scalar sub query. And we are going to see scalar sub queries coming up when we do modifications. So, the last topic for today is modification of the database. In this case, deletion. So, the first query says delete all instructors, delete from instructor. The next one says delete instructors in the finance department, delete from instructor where department name is fine. That is straightforward. The next one is a little more complex. I want to delete instructors who are associated with the department in the Watson building. That is the instructor's department is in the Watson building. So, I am deleting from instructor, but I need a condition on department building. So, the delete has a sub query, delete from instructor where department name in, select department name from department where building equal to Watson. So, this sub query is going to return all departments whose building is Watson. And then this one deletes instructors whose department name is in this set. Is this clear? You can have more complex query. This one says delete all instructors whose salary is less than the average salary of all instructors. This is an idiotic query. Why would you do it? But still for illustrative purposes, let us take this. So, this query is written as delete from instructor where salary less than select average salary from instructor. Now, supposing we execute the query as follows. We take, go through the instructor relation one at a time. We look at the salary, select average salary from instructor. If it is less, delete and keep doing this. Something weird happens. The first time is average of all instructor's salary. Supposing the first instructor's salary is less, that instructor is deleted. If I now compute average salary again, it is going to be different. So, now, depending on the order in which I consider the tuples of instructor, I might result in different final states. Why? Because after deleting the first instructor, the average changed. So, the second instructor might satisfy the new condition, but may not have satisfied it initially. If I considered instructors in the other order, that second instructor would not have satisfied the condition initially and would have not been deleted, but may be is deleted now or the other way. So, there is a problem. You do not want the result to be dependent on the order in which we consider tuples. Therefore, the SQL semantics is very clear on this. It says that you must first compute the entire subquery here on the initial state of the instructor's relation and then test individual tuples. I cannot have this subquery changing state as the delete happens. So, think of it another way. Actually, a better way to think about it would be, I run this whole query, but I do not delete yet. All I will do is, if I find an instructor satisfies this condition, I will mark it for deletion, but not yet deleted. So, I will do it in two rounds. In the first round, I will scan all instructors and mark things which need to be deleted, but not actually delete. And then, I will go back again and delete all those which are marked for deletion. This will not be order dependent, because the deletion happen at the very end, not while the query is running. So, that is essentially how SQL defines the semantics. The last two modification topics, insertion we have seen before, just insert value. There are some variants of this syntax which let you give the names of attributes. So, you can reorder them. You can even drop some attribute, in which case null value will be taken or you can explicitly add null here. Do not put it in quotes, then it will become the string null. If I just say null, that is a valid value which I can insert into a relation. So, here total credits is set up. This is another example of insertion. If I insert into student, select something. So, all the tuples in this sub query result are inserted into student. Again, there is an issue here. Here, the insert into student was from instructor. Now, supposing I did the following. Insert into table 1, select star from table 1. Again, SQL says the semantics says first I will completely evaluate the sub query to find what all tuples are to be inserted and then I will insert them. Supposing I did not do this, there is a problem. What I mean is, I do this as I am doing select star from table. I am scanning through table. I find one tuple in table 1. I insert it into table 1. Find the next table, tuple inserted into table 1. I keep doing this. Now, if table 1 had a primary key, immediately the primary key would be violated and this query would stop because of a integrity constraint violation. But supposing it did not have a primary key, then the first tuple that I find, if I inserted into the table, it would allow it to be inserted. There is no primary key violation, but there could be a problem. The question is what is this problem? Think about it for a minute. I have a table. I have several rows. I am scanning down one at a time. I am here in this table. I am in the middle of the scan. I am taking this tuple and adding it at the end of the table. When I come to the end, I find this tuple which I had inserted earlier. I am going to find that and I will again insert it. As I scan further, I will find the new tuple. Since I have found it, I again insert it. What is going to happen? It is an infinite loop. I keep inserting more and more rows into this table and this query will never end if I execute it this way, which is very bad. So, the SQL standard says you cannot do it this way. You must first compute all the things to be inserted without doing a single insertion and in the end insert all those rows which we have just computed and then there is no ambiguity. Finally, update. Here is an example. Update instructor. I want to increase the salaries by 3 percent if the salary is over 100000 or 5 percent otherwise. Socialism at work. Those who are paid higher get a smaller increment and so what we do here is set salary equal to and there is a case expression. When salary less than 100000, then salary is power 1.05, else salary is power 1.03. So, this case statement returns a single value depending on this condition returns this or this and then the update happens. The last slide does an update with a full sub query. So, here what did I have? I had a case expression, but it was not a sub query. The last one is an update with a sub query in here. So, let us explain what this sub query is. It says, recompute and update total credits value for all students. What is total credit supposed to reflect? It is a number of credits which they have cleared. They have taken the course and they have passed it. So, how do we compute this? I already have that information. It is redundant in the sense that the takes a relation already has whichever courses the student has taken. So, I can compute it as follows. For a particular student, this sub query computes the total credits they have passed. So, the update statement has a sub query set total credits equal to this sub query. I am giving a name student S. In the sub query, what is going on? Select some credits from takes natural join course. Why join with course? Because course has credits. Takes as the course ID, but the credits is in course. So, takes natural join course where S dot ID, student dot ID equal to takes dot ID. So, this restricts it to courses taken by this particular student. Furthermore, the grade is not F and is not null. So, it makes sure that the student has passed the course. For only such courses, it sums up the credit and then it assigns it to this total credit set. So, you can actually execute this query. Try it out today afternoon in your lab and it will set the total credit. There is just one glitch here. If a student has not yet cleared any course, they are in the first semester, they have not cleared a single course or they have finished a semester, but they failed every single course. In either case, this would be the empty set. Some on the empty set would be null and so the total credit should be set to null. Maybe this is not what I want. Maybe I want it to be 0. In which case, I can instead of some credits, I can have a case statement. When some credits is not null, then some credits else 0. In fact, there is a simpler way. The SQL standard has a function called koalas. So, what does the koalas do? It has a number of arguments. It goes through one at a time. Say first argument, in this case some credits. If it is null, it goes to the next one and it outputs the very first one which is not null. So, if some credits is null, it skips it and outputs 0. If it is not null, it outputs it actually. So, this is actually equivalent to this one. So, you can try this variant also. I think PostgreSQL does support it. So, that is the last slide for this session. Just give me a moment to review the lab for today. So, today's lab has two parts. So, this is assignment 2A and 2B is nested subquery. So, let us start with 2A. It has aggregation, grouping, ordering and so on. So, these are all things which we have covered today. There are some easy queries. Then there are some harder queries which have labeled as optional. If you have time and you are able to do it, by all means go ahead and do it. If not, it is ok. If you do not have time to do this, no problem. Then 2B is using nested subquery to do various things, to both do queries and to do updates. So, all the topics which we have covered today are there in today afternoon's lab. Tomorrow's lab will include other SQL queries. Now, I intended to cover a little bit more of chapter 4, which was auto-join. I did not actually get round to it. What I wanted to check is that the assignment does not require auto-join. You can write some of these queries using auto-join, but it is not mandatory. If you know about it, go ahead and use it. If you do not know about auto-join, that is the very next topic, but we have run out of time in today morning session. A little bit behind time. My original goal was to cover it. If you are interested in asking questions, by all means stay back and ask. So, let us start with. Ram Meghe, Badnira, we have a question. Sir, my question is the expressive power of relational algebra is equals to the expressive power of TRC or the expressive power of safe TRC. So, these are questions with respect to the expressive power of the various things, algebra, calculus and so on. In this version of the course, I did not focus too much on that. What we have done is in earlier days, the course used to be more theoretical and address issues of expressive power and which are equivalent. That is how the relational database theory evolved in the early days. People said that, hey, there are so many ways of writing queries. There is a calculus, there is an algebra. What if there are things you can express with one which you cannot express using the other? So, one of the initial results which showed that the different forms of the algebra all have the same expressive power was viewed as a very powerful result. But, to some extent it was cheating. What do I mean by cheating? How can the Turing award winner, the person who essentially brought relational databases to the state where it is today brought it to the notice of other people and led a movement in some sense. How can he be cheating? Well, it is not really cheating, but the point is that maybe they are all equivalent, but it turns out there are some important things which cannot be expressed using any one of those and that very important thing was aggregates. So, min and max you can get by using plane relational algebra, plane tuple relational or domain relational calculus. But, if you want some and average and count and stuff like that, it is actually not expressible in first order logic. Some of these require second order logic. If you know about mathematical logic, they are not expressible in first order logic. They need second order logic. Now, a bunch of mathematicians who are involved in this said that first order logic is a standard thing and if it is supported, we are in good shape. What they did not realize is in the practical world, there are second order things which are actually important. So, that equality was not so important in practice because none of them was good enough in something. But, if you go back, the extended relational algebra is more expressive than the relational calculus. Now, QB ended up adding some features for aggregates which is essentially extended relational calculus. But, at this point it is not something we need to worry too much about. So, we will leave it at that. Let me add to one more part of the previous answer. There was a question about safe relational calculus. So, it is possible to express things in relational calculus which are not safe in the sense that you could generate something with the infinite answers. So, there is a subset of it which is safe which guarantees that it does not have infinite answers. So, the equality is actually between relational algebra and the safe version of the calculus. The original version of the calculus can actually give infinite answers which you cannot actually get using relational algebra. So, that is a minor point which I missed in the previous answer. The next one is Sri Buddha college Kerala. Do you have a question? Please go ahead and ask. What is the main difference between in and exist keywords? What is the difference? That means in terms of execution is there any major difference between in and exist keywords in terms of execution. That is a good question. Let me repeat the question. What is the difference of any between in and exist in terms of execution? So, any query written using the in clause is actually not very hard to rewrite it using the exist clause. So, the in required some attributes from the outer thing to be present in the result of the sub query. Now, I can actually add some extra conditions using it will become a correlated sub query even if it was not. I can add those as extra conditions in the sub query and replace the in with exist. If you want I can write it out in detail, but it is not hard to figure this out. Take a simple in query and turn that in condition into an extra equality condition inside the sub query and replace the in by an exist. So, they are in some sense equivalent. Now, in terms of execution well that depends on the execution engine. A naive execution engine might do it in one way, different one might realize that these are equivalent. In fact, they may translate both of these to some of the form which does not actually require a correlated execution. It might do it using a join or an anti join or some other such method. We will see this later on, but the bottom line is the exact way it is executed depends on the database system. But the correlated evaluation or the simple semantics that SQL gives would be more or less the same for in or exist, but the actual optimized execution could differ depending on the database. I think this will be the last question. Amritha, Amritha Nagar please go ahead. Sir, there is just a very basic question from yesterday's session. Okay. We discussed about candidate keys. Yeah. And we discussed that candidate key is a key if it is minimal. Yeah. And we also defined minimal as no subset of a set is a super key. In such a condition when we said that there is no subset also. Yeah. Pardon? There is a small correction when you said a candidate key is a minimal super key that is it is a super key and no subset of it is a super key. Every subset of it is not a super key meaning that it can occur more than once in a relation. Yeah. So, continue with your question. How can the key become a primary? Because the primary key is supposed to be the one which is unique and the most minimum. It is the minimal subset. So, first of all, let me just remind you of what this is. Super key is any set of attributes which uniquely identifies tuples across all instances, not just in the current instance, but in any valid instance a super key should uniquely identify a tuple. A candidate key is a super key which is minimal. In other words, if you drop even one attribute from a candidate key, it is no longer a super key. That could be duplicates. Now, a primary key is a candidate key which is you pick one of the candidate keys as a primary key. So, a primary key has to be minimal. It should not have a extraneous attribute in the relational model sense. In SQL, you can declare anything as a primary key. SQL does not know whether it is minimal or not. So, in SQL, when you declare something as primary key, all that it is ensuring is that it is a super key. It has no idea whether it is minimal. That said, normally what you would do is, you would not add extraneous attributes to a primary key. It is a bad idea to add extraneous attribute. So, it is in your own interest to not throw in extra attribute. And in other words, choose as a primary key only, even in SQL, choose as a primary key only something which is a candidate key. That is, it does not have unnecessary attributes. You could choose any one. Which of these, supposing there are two candidate keys, which one of these do you choose as a primary key? Now, this actually depends on the further domain knowledge. So, typically you choose as a primary key, a set of attributes which are not going to change. So, let us say that within a particular address, you cannot have two people with the same name. So, then name comma address might be a candidate key. If you drop the address, name could be duplicated. If you drop the name, you could have duplicates on address. So, name comma, assuming address is a single variable, name comma address is a candidate key. However, we know that address is likely to change. People move and sometimes names also change. Therefore, this is a bad choice as a primary key. Instead, you would choose something else like, you might even create an artificial thing. You know, student ID, roll number is an artificially created thing, but it is important and that is what you use as a primary key, not name comma address. Does that answer your question? Yes, sir. Just one more question. As far as the relational algebra goes, we have the usual select and project and the standard stuff which is going about. Other than what was basically which was there in the book itself, is there any further advancement or research being done in that area? So, in terms of expressive power, if you originally aggregation was not part of the relational algebra and that was added as an extended relational algebra construct later on. Now, in terms of actual implementation, they have lots of new operators which can do various things. They may implement a division operator internally. There is an anti-join operator which we have not discussed, semi-join operator. We have not discussed these. So, implementations tend to have many, many more such operators, but is it an active research area? Not really. I mean, now it is like programming. You know, you can create new operators on demand for specific applications. Does that make it a research area? Sometimes, sometimes let new operators so useful and novel in some sense that you can think of it as research, but more often than not it is a matter of programming and implementations do create new operators which they use, they implement using maybe some new algorithm and then the optimizer is designed such that given an SQL query, it can generate a plan which uses that new operator. So, this happens all the time. Every database keeps adding new operators and keeps adding rules to the optimizer to generate plans with the new operator.