 Good morning everyone. I hope all the centers are tuned in. Today we are going to look at more join related constructs in SQL and then move on to other constructs such as net sets of queries, aggregation and so forth. Joining section and course because the query wanted the course ID, semester, year and title of each course. The title was there in the course relation while the other attributes were there in the section relation and therefore we had to join them. So we saw how to join them by putting a condition in the where clause which required that section dot course ID equal to course dot course ID and we listed the relations in the from clause. So this is roughly where we left off and you would have done a bunch of problems in the assignment which required joints. So moving on to the next slide this we covered yesterday. This was a quiz question. Moving on the next construct I want to cover is natural joints. We saw natural joints briefly in the context of relation algebra and somebody was asking this question yesterday also. SQL certainly includes a natural joint construct and the construct is illustrated by the query below which joins instructor and teachers. So this time the teacher's relation has an instructor ID or in this case the attribute is called just ID and instructor also has an attribute called ID and note that these are the only attributes which have the same name in common between the two relations. So the query which you see here say select star from instructor natural joint teachers. So what does natural joint do? It first of all takes the two relations and finds what are the attributes which are in common. So the only attribute in common here is ID. Now this is something you should be careful about because you might see that you want to join an ID but by chance there may be some other attributes in the two relations or if you are joining multiple relations you have to be even more careful. There may be some other attributes which you did not intend to be the same but they happen to be the same name. Natural joint will make them all equal and you may get the wrong result if you use natural joint carelessly. So this is a construct which has to be used very very carefully and there are variants of which which are safer as we will see in a little bit. So the basic premise is the natural joint takes the attributes which are common and then it matches rows from the two relations which have the same value for the common attribute. In this case the ID attribute. So here the natural joint does what we expect. In the teacher's relation for each tuple which it has an ID value it matches it up with the corresponding tuple from the instructor relation. Note the schema of the result which you can see below here. The ID attribute appears only once. Why did that happen? Although it was present in both relations if we did a Cartesian product ID would have appeared twice. Once from instructor.id, once from teacher.id. But natural joint has already equated them so they have to be the same. There is no point having the same attribute twice. Therefore the natural joint operation whether it is in relational algebra or in SQL removes the duplicate attributes and stores ID exactly once. So the schema over here has the attributes ID, name, department name, salary, course ID, section ID, semester and year. So those are all the attributes from the two relations and you can verify that the rows in here correspond to things that match. So that is the natural joint operation. As I said there is a danger with natural joint that unrelated attributes with the same name in the two relations may get equated incorrectly. So for example let us take the query which says list the names of instructors along with the titles of courses that they teach. So now what are the relations that we need for this? If you have the schema diagram along with you you can refer to that and you will notice that the names of instructors appear only in the instructor relation while the titles of courses appear in the course relation. Now we cannot directly join instructor and course. The connection is through teachers which has both a course ID and an instructor ID. So we have to join instructor, teachers and course. And what would we be joining on? The joint between instructor and teachers would be on the ID as we saw already. The joint between teachers and course would be on course ID because that is the course information for a particular course that is being taught. So this is what we want to have equated. So if we carelessly write this query as select name title from instructor natural joint teachers natural joint course what goes wrong? Something goes wrong. You actually may run this query and you may get a result which looks correct. In fact it would be correct in most cases but there are cases where this query gives the wrong result. So you have to be very careful with the natural joint constraint. So what is wrong here? To understand that you have to carefully look at what all attributes are in common between any two of these relations. And if you look more carefully we see that there is one more attribute name in common which is department name. It is present both in the instructor relation which indicates which department the instructor in is in. And it is also present in the course relation to show which department a particular course belongs to. If you do a join natural join on this you do get a result which seems correct. But there is a problem. There is a case where this fails. And this is a case which is not very common but does happen. Now most of the time instructors would be teaching courses in their own department. And then the department name of the course would match the department name of the instructor. But this is not a requirement. In many places instructors do teach courses from other departments when the courses are related. So in IIT Bombay for example we have faculty from computer science teach courses in our center for technology alternatives for rural areas sitara. So here you have a faculty member who is in computer science department but is teaching a course in sitara. Now what happens to that particular combination? Now the course ID matches the instructor ID matches. So you would have expected to find the name of this instructor along with the title of the course in the sitara department. Unfortunately the department names of the instructor and course are forced to be equal and that test fails. So although we have an instructor teaching that course that instructor vanishes from the result. And if you test this query you know you may create a few sample data sets and test it actually seems to give the correct answer. But if you took the trouble to create a case where the department names don't match then you would realize that there is an error in this query. So since it's rather difficult to detect such errors you should be very careful when using natural join. Now how do you do this correctly? We can do it correctly by not using natural join at all. We can join all the relations and explicitly list the conditions to be satisfied or we can do part way as is shown in the second query where we do a natural join of instructor and teachers. There everything is fine because only common attribute is the ID. But when we join with course that is when there are two attributes in common one should be equated one should not. So if you observe the query it says instructor natural join teaches comma course. So the comma course forces the cross product between the first part and the second part. The first part is the natural join and then the cross product with the course and the where clause condition specifies the matching condition and here the matching condition is only on course ID. The department name also comes from both sides but that is not there in the condition. Therefore this result is the correct result unlike the previous query. Now there is another alternative syntax which we can use in place of natural join where instead of equating every attribute in common SQL allows us to specify exactly which attributes are to be equated. This is a much safer thing than using natural. I would recommend this rather than natural join and the construct is shown at the bottom. So what is this construct? It says select name comma title from the first part is the same instructor natural join teaches that did not need any change. The second part has changed into join course using course ID. So the join using construct is exactly like natural join but it does not try to match every pair of attributes which have the same name in the two sides. Instead it only equates those attributes which are explicitly listed in the using clause. So here in the using clause we have only listed course ID we did not list department name. As a result the query using the using clause does not try to equate department names and therefore this query is correct just like the second query the preceding query which we saw. So I hope this using clause is clear and in your assignments you can use natural join when you know it is very safe but otherwise please use the using construct. So we have seen so far a few join constructs. Later on we are going to see a few more join constructs including the concept of outer joins but before we get to that there are a number of more basic constructs which are widely used. So I would like to cover those and later in chapter 4 we will get to outer join operations. So the next construct which we want to look at is the rename construct. Now why do you need the rename construct? There are many uses for the rename construct. One use is if you have an expression in the select clause then what is the column name which is output. If you have an attribute name so take the first query the query say select ID name. So in the result of the query which is a relation the names of that first two attributes are obviously going to be ID and name there is no problem. What about the third attribute salary divided by 12 what name does the system give to it? Now it is not defined in SQL the system can give any internal name it wants you do not know what it is. So if you want to give it a meaningful name to be printed out as part of the result you can use the as clause to rename columns. In this case we decided to call it monthly salary to so that it is clear what it is the original one would have been the annual salary presumably and when we divided by 12 we get the monthly salary. Now I should note that certain databases allow the renaming without the as clause. In fact SQL makes as clause optional so even if you just said salary divided by 12 space monthly salary it should work. But certain databases do not support the as clause explicitly but it is always there implicitly if you just say salary divided by 12 space monthly salary that would mean the same thing. So in Oracle for example you should just take these queries and drop the as construct and leave everything else as is and it will work fine although if you use the as construct Oracle would reject the query as a syntax error. So that is one kind of use for renaming where we rename attributes or columns. There is another very important need where we have to rename an entire relation and there are multiple uses for this but one place where it is fairly essential is if you want to join a relation with itself why would you want to join a relation with itself. So here is a query which says find the names of all instructors who have a higher salary then some instructor in compsci. So what we want to do is for each instructor we pair it up with every other instructor from compsci. In fact every instructor from compsci and check if this instructor has a salary which is greater than the other instructor salary if so we output it. So what we want to do is compare every instructor with every other instructor and that requires doing a cross product followed by a selection on two copies of the instructor relation. Now in SQL supposing you write select something from instructor comma instructor there are two copies of instructor but how do you distinguish them they both have the same name. If I say instructor dot salary which one the first instructor or the second instructor that is not clear. Therefore SQL allows you to give an explicit name to two instances of a relation which occur in the from clause by renaming them using the as construct. So in this query we say select distinct t dot name from instructor as t instructor as s. So what do we have here we have two relations in the query now t and s which are basically the instructor relation the query is not actually making physical copies of the relation although internally it may create temporary copies if required but conceptually they are equivalent to turning them into two separate relations and called t and s which are the same as which have the same contents as the instructor relation. So what does this query do first it takes a cross product then it says where t dot salary greater than s dot salary and s dot department name equal to comp sign. So what is that doing it sparing up every t tuple with every s tuple first of all it is eliminating all such pairings where the s tuple is not comp sign. So now clever evaluation engine will actually before doing the cross product it would actually filter out all s tuples which do not satisfy s dot department name equal to comp sign but that is a topic for a later chapter. So now let us say it pairs it up with every comp sign tuple now there is one more condition which is t dot salary greater than s dot salary. Supposing the t dot salary for a particular t tuple is greater than at least one of the s dot salaries then with that particular s tuple that in other words with that particular instructor in comp sign who has a lower salary this t tuple would match both the conditions in the wire clause. Therefore it would appear in the result paired with that s tuple. Now of course the result does not actually output the s tuple it is only outputting t dot name. So that particular name will come in the output now why do we need a select distinct over here that is because a particular t tuple may have a salary of let us say 100,000 and there are five people in the comp sign department which salary less than 100,000. Now you will note that that particular t tuple will get matched with every one of these five instructors in comp sign with a lower salary and therefore t dot name would appear five times in the result and since we do not want that we have put a select distinct there to remove the duplicates. So that is an example of a renaming in the from clause. Now again in certain databases such as oracle the as clause has to be removed and if you just say instructor t comma instructor s that has the same effect. Now a scale is stored used to store character strings very often and naturally you need to do certain operations on character strings occasionally. For example you know you may want to match name supposing I know that there is some employee whose name contains some particular string let us say Fatak but how do I find which of the employee names matches Fatak? I cannot say name equal to Fatak because the name may be stored as db Fatak. So it is not exactly equal. So I want to pattern match that is I want to find names which contain the string Fatak. Now how do I specify this? It is an important kind of query is a very common kind of query therefore SQL has a way of specifying this and it has a very simple pattern language which has two special characters percent and underscore. Percent matches any string while underscore matches a single character any single character. So here is an example of a query this time instead of Fatak it is looking for any name of an any instructor whose name contains the characters dar. So what did we do? This query says select name from instructor where name like percent dar percent. So what are the two percent doing? It says there can be anything preceding dar there can be anything following dar and it does not matter what is there. So for example Sudarshan would appear as a result because it has something before and something after. What if a name was just Darshan something? So dar is right at the beginning then too it would be a part of the answer. Percent can also match the empty substring it does not have to have any characters there. So this query will find anything which has the substring dar. I should note that the like operation in the SQL standard is case sensitive. Therefore it will not match something which where the D is capital as stated. However many databases support extensions to this. For example in PostgreSQL you can say I like which is case sensitive like and if you use I like then it would match a string where D is capital or D is small it does not matter. Now if that is not supported what do you do? Well there are operations which can map strings to upper or lower case. So in SQL for example there is the upper operation. So that operation is the function upper. So if I say where upper of name like percent and then I put capital Dar percent then it will match regardless of whether the string initially had lower case or upper case or if I said lower name like percent Dar percent. Again if I use small Dar there it would still work because if the name initially had a capital D the lower function would convert the capital D to a small D and then it would match. Now what if I want a string which actually has percentage. So I have some statistics and I want to find an occurrence which has let us say 90 percent. So how do I match the percent? So SQL has a escape character which you can explicitly list. So you can say like 100 backslash percent escape percent. Now note that all strings in SQL are enclosed in single quotes. Therefore quote 100 backslash percent quote is in single quotes as also is quote percent quote. So now this will match the string 100 percent literally rather than look for anything which contains 100 followed by any other characters. And there are a number of other functions in SQL. Another very commonly used function is string concatenation which in SQL you perform by using the vertical bar twice a double vertical bar. So that concatenates two strings. So if my database has first name and last name and I want to output it as a single column I can say first name double vertical bar last name and they would come out as a single column concatenate it together. There are other functions for example finding the length of a string extracting substrings if I want to pull out certain characters from a string and so forth I can do that. So there is a variety of string functions in SQL. I would not list them here but your manual for the database will tell you what functions it has. Now let us move on to another topic which is important which is how to order the results of a query. Now as I told you earlier a relation in the relational model is unordered. There is no particular meaning to the order in which the tuples occur. But when I output results to the user the order is very important. Why? Let us say I have a list of students with their IDs which I want to output onto a screen and now the user wishes to scan through this to look for specific IDs. If the IDs are sorted obviously it is a lot easier for the user to find those. Therefore SQL allows a way to order the results of a query when it is output to the user. There is another utility for this which we will see later but for the moment it is just to order the results in order to present it to the user. So SQL has an order by clause which is applied at the end of a query. It is listed at the end of a query. So here is a query which says select distinct name from instructor order by name. So the names come in alphabetical order and by default the order is ascending. So the least name is order up to the highest from A up to Z at the end. So that is the ordering. Now if I want to change the ordering I can say order by name descending. If I want to order by multiple attributes that is if I want to order by department name and among all the instructors in the same department I want to sort them alphabetically. I can say order by department name comma name. Now what if I want to order department names in descending order while instructor names are in ascending order. I can do that too. I can say order by department name descending, instructor I can say ascending or if I leave it default it is ascending. So I do not need to specify that. And descending and ascending can be abbreviated to DESC and ASC. So that is how you order. Now note that the order by clause is applied after everything else is done in a query. So we have seen select from where clauses they will all be applied to get a result and at the very end the rows in the result are sorted based on the order by clause. Now to take a short break here is a quiz question. We need to set up the quiz. So do not answer it yet but you can read the question meanwhile. I have a couple of questions from yesterday which I thought I can answer now. The first question is from Manipal Institute which says I believe relational algebra is categorized as a procedural language. Could you please let me know why it is so when we just use a single statement to access the data. This is a very good question is relational algebra procedural or is it declarative? Now as in many things in life everything is a shade of gray. It is not all black and white. So what is procedural and what is declarative has a lot of shades of gray in between and certain people look at things in the middle and say the glass is half full while certain others say it is half empty. So that depends on whether you are an optimist or a pessimist. So an optimist in the relational camp would say relational algebra is purely declarative while a pessimist would say hey it is procedural because it tells you do this step followed by this step followed by this step. And both the points of view have definitely some degree of correctness in them. Relational algebra is not purely declarative because it does tell you to do a sequence of steps. First do this join then do this projection and so forth. However, it is still fairly declarative because it does not tell you how exactly to do the join or the projection and so on. And there are many different ways of doing the same join or projection as we will see when we look at query processing. So it is certainly declarative up to this point. In fact, it can be treated as much more declarative because it is fairly easy for a query optimizer to rewrite a relational algebra query in a different way which is equivalent. And therefore, even though the programmer thought they were specifying a particular order of operations, the evaluation engine can completely change the expression but still get the same result. So what is happening is even though the you know the person who wrote the query may have wanted it procedural. But in fact, most programmers who write in relational algebra do not assume that the query will actually be executed in this particular sequence because they know the optimizer may change it. In fact, nobody writes relational algebra queries to be honest. Everyone writes SQL queries. But what happens is that it is translated into relational algebra and then it is optimized. So if you went to the intermediate level, anyway it is going to be changed. Therefore, you can't read relational algebra as being essentially declarative. So depending on which camp you are, you can call it either way. But it is clear that if you write a query in C or C++, it is certainly procedural. It is not declarative. If you write it in relational calculus which we have not covered in this course, they certainly don't give any procedural element. They are very declarative. Relational algebra is somewhere in between. What about SQL? SQL is again, it is based on relational algebra. So it is somewhat procedural but nobody would really call it a procedural language. SQL is certainly declarative. So I would go on the side of saying both of these are essentially declarative languages. So I hope that answered the first question. Now there is another question which is from Amrita Bangalore. The question says in the create table takes schema, you said that section ID need not be a part of the primary key because at a time a student can register for one section. But I feel semester and year need not be part of the primary key either because given ID and course ID, it can be in a particular semester of year. So can ID course ID be a primary key? Now in a world where no student ever fails a course and everybody passes and therefore they take the course only once and no one can re-register trying to get a better grade for that matter. In such a world you could say that ID course ID is a primary key but in reality students do fail a course and they may need to take it again. So then ID course ID would not be a primary key because we want to record that the student took it in autumn 2010 failed and will be taking it again in spring 2011. So the second one will be the same as the first if you just take ID course ID but once you add semester year it will be a primary key. But it is much more clear that a student cannot take the same course twice in the same semester in parallel that does not make any sense. So I hope that answered part one. There is another part of this query in a query where we find the course ID sum etc. where we join two tables section and course. We explicitly mentioned section dot course ID to resolve it. Can it be course dot course ID? We have a course ID in section and in course. Both the tables have the same attribute name. Now if both the tables are present in the query in the where clause if you simply say course ID you are not being clear about which course ID you meant whether you meant section dot course ID or course dot course ID. If both of these are in the from clause in the where clause or in the select clause you have to be explicit. So you have to say one of these two section dot course ID or course dot course ID. Now if you equate the two. So in the where clause I say where section dot course ID equal to course dot course ID. So let me write it out. I do not know if you can get this screen up here. Select star from section comma course where section dot course ID equals course dot course ID. If you run this particular query you will observe that the select star will select all attributes. So it will actually have two copies of course ID one of which will be section dot course ID the other will be course dot course ID. Now we know that we have actually equated them. So they are both going to be the same. So why have two copies? So instead of select star if we struck that out and instead we list explicitly what we want. So I could have said select section dot course ID comma whatever else I want. Now why did I choose section dot course ID? Well I could have equally well said course dot course ID. Now this is going to be entirely equivalent because anyway both of these have been forced to be equal in the where clause. So I could use either one. It does not really matter. I hope that answered the specific query which you had. Please see the question. Do not answer it yet until the clickers are enabled. But first please press the S T key on your clickers. So the clickers wake up and are ready to answer the question. We are giving you one minute for that. Please press S T key on clicker. So I hope you have read the question by now. You can omit the order by if you omit the order by clause. The options are there is no default ordering. The results may turn out to be ordered. Both the above are true and neither one nor two is true. So those are the four options. Press A B C or D to answer the question. I think you had enough time. So let me move on to the next slide. We will see the results only after two more minutes. Now let us move on to some more syntactic features of SQL. These are syntactic sugar. They are not really essential but they are useful. So the first construct is something called the between operator which lets you say that a particular value should be between two specified values. So if I say that salary between 90,000 and 100,000 that means it is greater than or equal to 90,000 and it is less than or equal to 100,000. We could have written it having two comparisons and an end. This is just a simpler way which is useful in some context. Another piece of syntactic sugar is instead of listing A equal to B and C equal to D, we can say A comma B equal to C comma D. So here is an example of that. We have instructor and teachers. Now I want to find instructor teacher's address where the IDs match of course and the department name is biology. So this is a fairly silly example. Nobody would write it this way but there are other examples where it is more useful. But here I could write the query as where instructor.id comma department name equals teachers.id comma biology. So it is a pairwise equality. So the instructor.id will be equated to teachers.id and department name will be equated to biology. So that is the tuple comparison. Now we will see later that there are much more meaningful uses of this when we see subqueries. The next topic is duplicates. Now as I told you before if you do not declare a relation to have a primary key SQL will happily let you have duplicate tuples in the relation. That is usually a bad idea but there are certain situations where duplicates are useful. So for example if I want to find the average salary of instructors. Now I want to get the salaries of the instructors and then find the average. We will see how to do that in a little bit. But note that there may be two instructors with the same salary. If I eliminate one of the salaries, one of the copies of the same salary then the average is going to differ. So I do want to keep both copies. Therefore there is a good reason to have multi-sets in SQL which have potentially have duplicate copies. So that is a semantically meaningful reason. But SQL started dealing with duplicates for another reason as well which is to do with efficiency. So the idea was if you want to eliminate duplicates as in relation algebra does there is a cost to it. How do you eliminate duplicates? You have to find if there are duplicates. Now given a large relation how do you find if there are duplicates? Well the standard way of doing this is to sort the relation and then any duplicates would be adjacent. So you could look at adjacent tuples if you are equal drop one of the tuples. So this way we can sort and then find duplicates. Unfortunately sorting is a very expensive operation and if every query required a sort queries would run very slowly. So SQL made duplicates elimination an optional feature. So what it says is if you say select distinct I will eliminate duplicates. If you just say select I will keep the duplicates. So this was SQL stand initially which made sense from a efficiency viewpoint. There are places where the programmer is clever enough to realize there are new duplicates although SQL is not. So even without sorting and eliminating duplicates there will be no duplicates in the first place. There are other cases where there may be duplicates but it is okay the programmer realizes it does not matter really. So this is how the situation was until people started making use of the results of the query on something above it. So if you put an aggregate on top of this then you will realize that the number of duplicates matters. Now the number of duplicates it turns out can vary if you use different evaluation plans for the same query. The original intent was let us keep things cheap by not eliminating duplicates. The problem was depending on how you evaluated the query you may have a different number of duplicates. So now a database system might say look I will make sure that whatever the plan I will get the same number of duplicates. But now Oracle may make a decision, IBM may make another decision, PostgreSQL may make a third decision and the same query may give different results on different database. Therefore SQL standards committees decided early on that if you are going to allow duplicates in the result we better define exactly how many copies of duplicates will be present. So that is what we are going to see. To understand this instead of dealing with the whole SQL language it is easier to deal with relational algebra and define the number of duplicates in a multi-set version of relational algebra. And then the SQL semantics in terms of how many duplicates generate will be defined in terms of relational algebra. So the two-step process makes life simpler because it helps us avoid explicitly dealing with each of the syntactic components of the SQL language. So let us start with the selection operation in relational algebra. So if I am given a relation R1 with which may have duplicates and I apply sigma theta on R1. So what is sigma theta? Theta is a predicate which filters out rows in R1 and only outputs those which satisfies the predicate theta. Theta stands for any predicate. It could be you know name equal to. I had duplicates in the input relation R1. The multi-set version of selection says that I will just go through the relation and for each couple regardless of whether it is a duplicate or not I will just apply the selection and output it. That is simple, efficient and that is exactly the semantics of sigma theta. If there are duplicates it just preserves them as is. It does not change anything. The next operation is the projection operation which outputs specific columns and eliminates other columns. Now projection is interesting because it can introduce duplicates. We may have a relation which differs on overall the tuples are not the same. There are no duplicate tuples. But if I project on a single column for example if I project instructor on department, instructor relation does not have duplicates. But if I project it on just department surely computer science has multiple instructors and there will be duplicates in the projection result. The basic relational algebra version of project eliminates these duplicates. In contrast multi-set relational algebra keeps all these duplicates. It does not try to do anything. It does not waste time removing duplicates and it is actually very straightforward. It just goes through the tuples one by one projects out the columns which are as for and outputs that tuple without bothering whether it introduced duplicates or not. So, that was easy. Now let us go to cross product. So, R1 cross product R2. Now what are the duplicates in there? In fact here too what it does is the cross product operation outputs every tuple in R1 paired with every tuple in R2. And it continues to do that. There is no change in that operation. But we need to understand what happens in terms of the counts. Now you may implement the cross product in a different way or you may optimize the query and do various things to it. But how many copies of a duplicate tuple will be there in the result if there are duplicates in the input. So, to understand this supposing tuple T1 occurs C1 times in R1 while T2 occurs C2 times in R2. So, the first one occurs maybe 2 times in the first relation and same tuple sorry the matching tuple in fact for a cross product any tuple T2 it does not even have to be matching appears maybe 5 times in the other relation. Now each copy in the first relation will be matched with 5 copies in the second relation. So, how many copies do we get overall? 2 copies in the first relation each match with 5 or 2 times 5. So, in general you will get a product of the number of duplicates in the 2 relations. So, if there were C1 copies of T1 and C2 copies of T2, T1 T2 will appear in the result C1 times C2 number of times. So, that is also straight forward. Now what about the join operation? How do we define the duplicates? It turns out join is not a basic operation join we know can be expressed in terms of a cross product followed by a selection. Therefore, the number of duplicates in that does not have to be explicitly defined. It is implicitly defined by saying the join is simply cross product followed by a selection and we have already seen how to count duplicates for selection and for cross product. So, if you did not follow that fully here is an example with duplicates. So, R1 has 2 tuples 1a and 2a while R2 has 3 tuples and the schemas by the way R1 has the attributes A, B and R2 has attributes C this 1 attribute. So, R2 has 3 tuples 2, 3, 3 note that 3 appears twice over here there is in fact no duplicate, but if you did a pi on B of R1 what do you get? The value 2 appears twice although R1 did not have duplicates the projection has duplicates. Now if you take a cross product of that with R2 what do we get? Well the A value appears twice here and over here 3 appears twice therefore, you will get A3 2 into 2 which is 4 times. Similarly, 2 appears once here and A appears twice here. So, you get 2 into 1 which is twice A2 A2 followed by 4 copies of A3. So, that is the result for relation algebra. Now what about SQL? I have been saying that SQL queries can be written in terms of relation algebra. I am not going to do it for all SQL queries, but for the simplest form select from where the translation is very very simple. Query of this form select A1 through AN from R1 through Rn where P stands for any condition it can be an AND or it can be a complex condition it can be anything which appears in the wire clause. Now this SQL query is defined it is semantics or it is equivalent to the relation algebra query which does the following. The relation algebra query first takes a cross product of R1 through Rn then it applies the predicate P which is in the wire clause that is we have selection sigma sub P applies the predicate P. Finally, the attributes which are listed in the select clause appear in the projection. So, project on A1 through AN sigma P of R1 cross R2 up to cross Rn. So, this defines the meaning of a SQL query and now we know how many duplicates there are for relation algebra. So, we know how many duplicates that would be in the SQL query result. Now there are some details which I am omitting here what if the SQL query had an expression those details are fairly straightforward and I am not going to spend time on those. So, that was the duplicate issue. Now duplicates do have an impact on query processing if you did a select this thing the SQL can do whatever it wants as long as it removes all duplicates at the end. If you did not use a select this thing then SQL has to make sure the number of duplicates is correct.