 First, let us do cardinality since that came up first. So, here is a set of student role numbers and here is a set of courses. So, now, each student is mapped to the courses that they have taken. So, now, how many students could be mapped to a course? Is there any limit? Maybe an upper limit, but surely there can be multiple. Now, how about this? Can students require to take only one course? We can take multiple courses. In this case, this student has taken two courses. So, what can you say about this relationship? Many. Yeah, it is a many to many. So, each student can be mapped to many courses. Each course can be mapped to many students. So, that is many to many. That is one type of cardinality relationship. So, this relationship in this case is the registers for relationship between student and course. And student has a role number course as a course code and you can have grades here. So, the question here was cardinality and that is covered in this slide. So, the cardinality is number of participating elements in the set. One to one. Can you give an example of a one to one relationship student to hostel? That is actually many to one. When you take student to hostel, many students can be mapped to one hostel. Hostel and room number. In IIT, students share rooms. But let us say that each student has only one room or you have two of you are sharing a room here. Let us pretend you all got individual rooms. Then it would be one to one person to room. Many to many we just saw. And many to one, one to many just depends on. It is the same thing. It depends how you look at it. So, if you take U to room numbers, it is what? From U's course participants to room number, it is many to one. So, many of you can be mapped to one room. But each of you can be mapped to at most one room. So, two may share a room. But one cannot have two rooms. So, if you look for participant to room, it is many to one. Or if you look from room to participant, it is one to many. But it is not many to many. And you saw how to convert a relationship into tables. Did you cover this? So, this is the corresponding table. Student, role number, course code and grade. As this slide says, in our database example, we use customer name as the identifier. But in reality, we would not do that. That would be a customer ID. So, that is a super key. Now, did you see candidate keys? Yeah. So, candidate key is just a super key which is minimal. You do not have extraneous things. Yeah, sure. Customer ID plus phone number is unique. But then, customer ID itself is unique. So, why add on phone number unnecessarily? So, it is. And then primary key is something which you choose as the unique way of identifying something in here. So, the customer ID, the branch ID and so on are the unique identifiers which are chosen as the primary key. So, in the book, we tend to distinguish between the relation schema and the relation instance. So, we use capital R to denote the schema that is the attributes in the relation. Whereas, the instance is the actual table with the rows in it. Smaller is the actual instance. Smaller. Right. Right. But if you see in the real world, people fudge this all the time. It is very clear from the context what you mean. Whether you mean the relation schema or the relation instance. So, in the real world, most people do not care. They just use the same name. And from the context, you know, whether they are talking of the schema or the actual set of rows. And a foreign key is an attribute of a table which corresponds to, at least in SQL, the foreign key has to refer to a primary key of another table. Conceptually, what is a foreign key? It is an attribute in this table which appears in the other table. And at least in the SQL version of primary key, foreign key, this value must appear as the primary key of the other table. So, what is that guarantee? It guarantees that you cannot have junk data here. So, if you had a branch table with branch ID and then for a policy, you have a branch ID. If you did nothing, I can throw any value into branch ID in that policy. It did not correspond to an actual branch, but that is dirty data. You do not want that. You want it to be a valid branch. Therefore, what you do is introduce a foreign key. So, pictorially, it is shown here in this schema diagram as an arrow. So, here are several foreign keys. Account has a branch name. And this arrow says that it is a foreign key into branch. So, this ensures the data. So, this is a diagrammatic thing. You can declare it in SQL as we will see. Once it is declared, the database ensures that any value which is stored in this column of any row must appear also in the branch name attribute of a sum row of branch. Similarly, in depositor, account number is a foreign key to this while customer name is a foreign key here. Similarly, for borrower, similarly, here branch name of loan is a foreign key. So, this is the diagrammatic notation. And in SQL, skipping these, let me go to the part where we declare these keys. So, the create table syntax is all right. You used it also. So, over here, we have just declared the attribute names. But in fact, you can declare integrity constraints of which primary and foreign keys are two types of integrity constraints. So, here is an example here. No two customers can have the same customer ID number. This you enforce as a primary key. Because you have chosen the customer ID to be the primary key, the primary key declaration enforces this. The second one is what we just saw. The branch name attribute of the account relation must contain a value corresponding to an actual branch in the branch relation. So, this is a foreign key. There are also not null. The amount attribute of a loan cannot be null. What is null? It is a special value indicating the values unknown. And obviously, you do not want an account whose or a loan whose amount is null. I have a loan. I have no idea how much I loaned. This is crazy. You cannot have it in real life. So, you insisting that you do not allow that value. Of course, that does not prevent someone from entering a garbage value. But, we assume that this at least alerts them to this fact. So, in SQL, there are primary key and foreign key declarations. So, here is this same branch table which we saw before. Create table branch name city assets. And we also add a primary key branch name. You can add all the integrity constraints at the end. So, here is customer with primary key customer name. And as we saw before, is customer name as primary key realistic? Not really. So, really you would use customer ID. And coming to a referential integrity, this is the foreign key. Skip to this slide. Create table account, primary key account number. That part is required. Now, foreign key branch name references branch. So, this part is the attribute name locally. The attribute name here is branch name. So, foreign key, this branch name references. This is the name of a table branch. Which attributes of the table does it reference? And the primary key attribute. Here, there is a single thing which is the branch name which is the primary key. In the schema, you will use for exercises. Instead of branch name, we have branch ID. So, it is the same. So, you will have branch ID references branch. This is a foreign key branch name. It must appear as a primary key with the same name into the branch or with the same name. So, you can actually specify a mapping of names over there. If you do not do anything, I think it is required to be the same name. Although it can infer. If it is just a single column, even if it is a different name, you are just saying that the value here must be presented as a primary key attribute over there. But if there are two columns, then it obviously is not so clear. So, you can actually list it. You can say branch name comma something. References branch. And then open bracket, you can explicitly list the primary key. And ordering can be explicitly defined. So, branch name corresponds to maybe b name there. While the next attribute of the foreign key corresponds to that other one. Otherwise, you do not know. a b may correspond to c d there or a b may correspond to d c there. That mapping you can explicitly specify over here. We have not shown it here, but you can do that. But generally, you will do it like this. This makes sense to use the same name. And depositor has two different foreign keys. Account number references account and customer name references customer. And the other integrity constraints are not null, which we saw. Unique, which specifies that the attributes together guarantee a unique value. So, candidate key, right? And they are allowed to be null. No restrictions on that. Whereas, primary key attribute values are not allowed to be null. Once you say it is primary key, nulls are not allowed. You can even have a check constraint where you put in any predicative wish. Balance greater than 0 for example. So, SQL supports a variety of integrity constraints. That is a good question. I am glad you asked that. So, what does it mean if a foreign key is null? Supposing that I have an account, but I do not know which branch it is. Is that acceptable? Well, you may say that it is not. Somebody else may say, yeah, it is acceptable. If it has a branch, it must be a valid branch. Otherwise, it can be null. So, you want to have this flexibility. So, in fact, what happens in the foreign key is, when you declare it as a foreign key, that does not say it is not null. It is perfectly fine to have nulls for all those values. If you do, all of them must be null. But if it is not null, then it must correspond to an actual valid thing. If you do not wish to allow nulls there, say that. So, you saw the create table command. This is a corresponding drop table, alter table. Alter table, you can add a column or drop a column. Although the drop column does not work on many databases. You have to drop the table and create it afresh. If you try this on your system, sometimes it will object to this. If you have a foreign key into a table and then you drop the table, what does that mean? You have to foreign key into a non-existing table. So, then the system will say, sorry, I cannot do that. There is a constraint which refers to this table. Then there are different actions you can take. You can say drop table cascade, which means any foreign key which references this table is wiped out. The foreign key clause is also deleted, drop. If you do not specify cascade, the drop table will fail. You can try that out today, if you want. Create your own tables and then drop them. Yes, I said you can add and drop attributes. So, you can alter table R, add the name of the attribute and the type, just like in the create table. And just to round off the data types, you saw the basic data types yesterday, intent, where care and so forth. Did you see the data types, the interval type, how to extract your data and so forth. Large object types. Good. So, that ends day one. Let us come back to day two now. Now coming back to more basic stuff, we saw that certain things, which are the results of arithmetic expressions, aggregate expressions, do not have a meaningful name. If you want to give it a name, you can do that. You can say something as something. So, you can see this query. Select customer name, borrow a loan number. Amounts times, well, it was 10, I guess. Amount times 10 as new amount. So, in the output table, this column will have the name new amount. So, that is one example of the as clause. In fact, in many databases, you can even drop the keyword as. Yeah, you can just give that and that still works. Although, I would recommend using it as if the database supports it. Some databases are funny. They will not allow you to use as oracles, for example. This kind of renaming can also be used for other things. Here, we have what are called tuple variables, where you are giving a new name, not to the output in the select clause, but rather to something in the from clause. So, take this query. Find the names of all branches that have greater assets than some branch located in the city. This is the city, branch city in Poway. Let us pretend Poway is the city. So, here, how do you do this? You want to find branches which have greater assets than some other branch. So, you are comparing two branches. The second branch must be in Poway and the first branch must have a greater assets for it to be output. So, we write the query as follows. We have two occurrences of branch in the query, but now we have to give it a new name. Otherwise, you do not know which one you are referring to. If I say a branch dot assets, which branch? This one or this one? So, if you have two occurrences of the same relation, you must give them new names using this branch as t branch as s. S and t can be anything. In fact, in oracle, you can drop the s. In fact, you are not allowed to use the s in oracle. So, you will have to say branch t, branch s with just a space. So, what this has done is this copy of branch has been named as t. This copy of branch has been named as s. So, now, I can say t dot assets greater than s dot assets and s dot branch t equal to Poway. So, s is restricted to branches in Poway and I am selecting distinct t dot branch name, where the assets of t are greater than s. So, all the names of all branches that have greater assets and some one branch at least located in Poway. Is this query clear? This illustrates use of these things. These are called tuple variables. Yes, that is perfectly possible. So, it can, if you have two branches of Poway with distinct assets, the higher one will show up for sure twice. So, let me repeat the question for the others. The question was supposing there is a branch which assets let us say 1 million and there are two branches in Poway with assets of 500,000 and 300,000. So, if you take the cross product here, so like this is standard SQL query. So, it is always a cross product first conceptually. So, every branch is combined with every other branch. Now, what is the condition here? t dot assets greater than s dot assets. So, if you take these pairs that let us say we have a Hiranandani branch and not Hiranandani, let us say something else in Worley and that has more assets than IIT Poway branch as well as another branch in Poway. So, that branch in the cross product will be combined with both of these. That branch in t will come up, the cross product will have the combinations with these two. If you check the condition here t dot assets greater than s dot assets, it will be true for both those pairs and s dot branch t is Poway will be satisfied for both these. So, that t will appear twice in the result and that is why we have done a select distinct to remove those duplicates. So, you can use this tuple variables anywhere you want. Sometimes it is useful to avoid long relation names. So, in this case we have said borrower s t loan as s and now you can say t dot loan number s dot loan number. Instead of typing the whole borrower dot and loan dot. If you have exclude Poway branches. If you have exclude Poway branches. Then what would you do? Under clear branch city not equal to Poway. Correct, you can add that if you want to eliminate things which are not in Poway, which are in Poway rather you can say t dot branch t not equal to Poway. Less than greater. Try it out. It is less than greater than this time. But, database may support not equal also. So, I think I mentioned in one of the slides earlier arithmetic operations. So, you have to look up. It is not shown here, but anyway you will find it in the textbook. It is not in this set of slides. So, to wrap up this slide as I said the key classes optional. Maybe this is a point to try of these queries because we do have some time. So, here are all the branches which assets the branch cities here are Mumbai, Chennai, Kolkata and so on. So, write that query select t dot branch name from branch t branch s where t dot assets greater than s dot assets and s dot branch city equals and we are going to have any results. What is a Mumbai has these two? So, this one in Chennai should come out. How did we get all of these? Let us look at the table again. No, it does not type it separately. Clear up. Ok, Mumbai has 30,000. We get all of those. Yeah, but Mumbai had just a one branch. So, t dot assets greater than s dot assets. Yeah, there is a cross product. Then you will find all those which are greater than some s where the s must be in Mumbai. Oh, here sorry I did not see the first one. This guy Puranee Haveli in Mumbai is the lowest 10,000. The only one which is not there is which is not greater is this fellow that Puranee Haveli itself we said greater than. So, that and this do not match. So, now let us do Kolkata. Now, that is 20,000. Well, even then everything will have something in there. So, all of them will appear. I did not do the distinct which is why you saw the repetition. So, Pauvai comes twice, Ignapur comes twice. So, if I did a distinct, so all these branches. Any questions? So, Kolkata has just a one branch with 20,000. So, all of these Ignapur, Sion and Pauvai. These are the three which have greater assets than some branch in Kolkata. So, now let us move into a new topic which is nested subqueries. So, this is a different way of writing a query in which is actually nested within the wire class. And you use this for a variety of reasons. You can use it to check if something is contained in another set. You can do set comparison. We will see examples of what this means. So, here is a query. Find all customers who have both an account and a loan at the bank. We saw how to do this before, right? How was it? No, no, both an account and an intersection. So, that is one way of doing it. Now, here is another way of doing it. Select distinct customer name from borrower where customer name in select customer name from depositor. So, what is this in clause doing? The in clause takes a value and sees if it is in a set. Now, what is this sub query here? Select customer name from depositor. What is it going to return? It is going to set of names which are there in depositor. So, now I am going to take names, customer names from borrower and if it also is there in the depositor customer names here, then it is output. So, next query is find all customers who have a loan but do not have an account. How do you check for? Do not have an account. Not in. So, here is from borrower you get customer names. Those are the people who have a loan where customer name not in select customer name from depositor. You can actually say in the SQL standard. I am not sure if SQL supports. I think it does. You can say open bracket customer name comma some other attribute in and then the select clause can say select customer name comma something. So, you can have here. You can have two attributes within brackets and here you can just list them select customer name comma something. So, that was not there in SQL initially. It was added later. So, you can write out in post-crystalline to see if it supports it. By the way, you know where to look for if you are using post-crystalline. You know where to look for documentation? Yeah, the home site for post-crystalline.org. If you search on the thing you will find the documentation. Look for the appropriate version of post-crystalline. I think we are using 8.1 if I am not mistaken. The current one is 8.3. What is set up for you here? I do not know which one it is. It is 8. something. Anyway, there is not too much difference between 8.0, 1, 2, 3. Minor differences in terms of SQL features. So, if you look up the post-crystalline 8.0 documentation that should be good. Good question. How do you choose which one to use? So, the logic behind SQL, the idea behind SQL is that there may be more than one way. It is a job of the database to figure out the best way of executing, regardless of how you specify it. That is the idea behind SQL. Now, actual implementations do go to a good extent towards this ideal, but they may not always reach it. In particular, for example, with nested sub-queries like this, post-crystalline will actually transform this into something similar to the other one internally. I will not tell you that, but internally it will do more or less the same execution, whether you give it like this or you do the except or the intersect. However, if this sub-query is more complex, post-crystalline will not be able to figure this out. So, sometimes you will find this is inefficient, whereas if you had written it as intersect, except that will be a lot more efficient. So, nested sub-queries are very useful, but sometimes they can lead to very poor performance, especially on post- SQL. Post-crystalline is particularly bad because its optimizer is not very clever on these things. Other databases, the commercial databases are better in this respect. They are able to do quite a nice job, especially SQL server is very good at handling optimizing nested sub-queries. However, for simple ones, post-crystalline is equally good. So, the ones we are looking at are mostly fairly simple. So, here is another one. Customers who have both an account and a loan at the peri-rich branch, again you could do it by intersect or sub-query. So, this answers the question which you had, can you give multiple ones? You can write this in many different ways. This is just one way of doing it. In fact, this is more confusing way than the other one. It is probably a lot easier to write this by intersect, but just to illustrate the features, what have we done here? Customers who have both an account and a loan at peri-rich. So, first look at the logic. You need to know the branch name which is peri-rich. You need to know the customer name. They are in two tables. So, we have to join the two tables. So, for the loan, you have to join borrower and loan. For the account, you have to join depositor and account. So, there are two sets of joins to be done. So, this part says select this thing will come to that part later. So, from borrower loan where borrower.loan number equal to loan.loan number. So, that is the join to get all the related attributes together with only matching rows. We also want to restrict it to the peri-rich branch. Both the account and the loan must be at peri-rich. So, we also say branch name equal to peri-rich. That is guaranteed here. So, this part is clear. This gives you all borrower loan matching pairs from peri-rich. Now, we want to find customers who also have account at peri-rich. So, what you have done is and branch name comma customer name in this query gets you all depositor account pairs which match on account number. So, from selecting branch name customer name. So, if this is in this, then that means this particular customer the branch name is peri-rich here because we already equated that. If it is satisfied then clearly that customer has an account at peri-rich because here the branch name is peri-rich. Therefore, that customer can be output select distinct the most duplicates. As I said can be written in a much simpler manner, but here we have just used to illustrate this feature. So, now you can also use nested subqueries to do these sorts of things. Set comparison. Find branches that have greater assets than some branch located in Powai, Brooklyn whatever. Here we are using Brooklyn. Same query which we saw before. By there what did we do? We did a join of table with itself. Yeah, it is actually not an equality condition. We had the greater than condition. You can write that like this. Wait, this was the one we saw. You can write it like this. So, select branch name from branch where assets greater than some. So, this is a special clause greater than some means there is something in there which is greater than. Select assets from branch where branch is equal to Brooklyn. Now, between this and this which one would you use? If you wrote this query, which is more readable first of all? First one is more readable. That is a matter of personal preference. To me the second one is more readable. Here we are doing ST greater assets. What is going on? There is no equality condition here. You expect a join to have equality. You start wondering is this query correct? What does it mean? Whereas here it is lot more clear. Select this from branch where assets greater than some thing in this set. So, this is more directly mapping to this thing. If you read this English query and this query, the matching is very close. Whereas between this English query and this, the matching is not so obvious. So, as a programmer you may prefer to write it like this. There is less chance of making mistakes. Good question. How come I have not shown this thing there? Does it matter? Can there be duplicates? Think for a moment and tell me. We have assumed that branch time is the primary key for branch. So, now can there be duplicates? You are not doing the cross product. What we are doing is we take a particular branch and see if it is greater than some something in here. It may be greater than five branches in Brooklyn. We do not care. This where clause returns true as long as there is some branch in Brooklyn which satisfies this. Even if there are five, it will satisfy only once. Where clause is satisfied for this triple. So, each triple in branch will get output in the result only once. And since branch name is a primary key in our schema, there is no need to eliminate duplicates. There are no duplicates. So, if I were writing this query, I would definitely use the second form, not the first form. So, in fact, there are several forms greater than some, greater than all and so forth. So, this informal mathematical notation f, some comparison less than greater than some and then the right side is a set of values. So, relation is equivalent to there is some tuple in this relation such that the comparison holds for between f the left hand side and that particular tuple. So, it can be any of these comparison less than greater than and so forth. So, less than some, five less than some of zero five is false because it is equal to, but it is not less than five. Whereas, for five less than some on this, this is not actual less than syntax. This is just conceptually what you are doing. This is a set. So, this is true. Five not equal to some here is true because there is a value zero. It is not equal. So, equal to some is actually the same as in because equal to some in the five equal to some in this will be true as long as five is over here. So, that is the same as this value in this. Not equal to some is not the same as not in. Be careful about this. So, is it true that five not equal to some over here? Yeah, because it is not equal to zero. However, five not in this will be false because five is in this. So, here is another query which is find names of all branches that have greater assets and all branches located in Brooklyn. So, now, instead of some is a greater than all. This query clear is same as before except we use all instead of some has to be true for every tuple. So, look at every branch in Brooklyn. Make sure that assets is greater than branch in Brooklyn. But for all you have to be a bit careful. What if there is no branch in Brooklyn? What is this? Everything will satisfy. So, just like we saw for some the all thing is defined. I will skip it. It is sort of similar. Not equal to all is the same as not in turns out. It should not be equal to every one of those. But equal to all is obviously not the same as in. It must be equal to every one of those which is very silly thing. And these are the examples. Now, there are a bunch of other constructs. So, we have seen in not in less than some less than all. There is also exist and not exist. These test if the sub query is has some result or is empty. So, here is an example query. This is again same query could have been written in a simpler way from earlier. But now, we are going to write it differently. It is actually little confusing. So, let me show this. Customers who have an account at all branches located in Brooklyn. So, customers we get name we get from depositor as before. In our exercise schema to get the customer name we actually have to join customer with depositor on customer ID. So, you have to equate customer ID. Here the customer name is directly available in depositor. So, we are we can be a bit lazy here. So, now we are looking for that. So, we want to find customers who have an account at all branches located in Brooklyn. So, how do you ensure this that they have an account at all branches in Brooklyn? The for all can you know conceptual notion like this. In some cases you know if you say you can say quite equal to all and so on. So, that previous construct less than all greater than all does not work here. But you can represent it as follows. If there is some branch in Brooklyn where this person does not have an account this person should be eliminated. That is the idea. So, you would have a double negative. So, is. So, the for all clause here like has an account in all branches turns into a double negative sort of thing. There should not be a branch where this person does not have an account. So, how do you find branches where this in this case branches in Brooklyn where this person does not have an account. Well, that is this subquery everything from here to here is this subquery. And what is it doing? First of all you select branch name from branch where branch city is Brooklyn. So, these are all the branches at Brooklyn except those branches where this guy has an account. No, no. This part is branches where s has an account. So, how do you know where s has an account? This is interesting. So, first of all we have to join depositor account to get branch name on join it on account t dot account number equal to r dot account number. But there is more. We need the customer name to be the same as this guy. This guy should we want to find where all this person has an account. So, the depositor relation has account number and customer name. Whereas, the account relation has branch name and account number. So, we have to join these two. We have to join depositor and account to make this happen. But we want to restrict it to this particular customer. So, now note something interesting is happening. We said depositor as s. Then in this subquery we are referring to s dot customer name. So, we are saying s dot customer name equal to t dot customer name t is the depositor. So, what is happening here? We are using a tuple variable from the outer level query. This is the outer level query and this is called a nested query because it is necessary inside that. So, we are using a variable from the outer query in the inner query or the nested query. This kind of variable is called a correlation variable. So, you can give it give this depositor name s and then use s inside here. You cannot do it the other way. You cannot use t outside. The scope of t is only in the nested query. But the scope of s is yeah it is in it includes its subqueries. If this guy it is perfectly possible for you this guy to have a subquery. So, t will be visible in its subquery. So, you can have nested scope. So, what we are doing is using this selection condition where condition s dot customer name equal to t dot customer name to restrict this two accounts belonging to this particular s. So, think of the execution as follows. You do the from clause here and for each depositor s in this from clause. If there were multiple things in there you will take the cross product and for every combination you will check this where clause and when you execute the where clause you actually have to execute these subqueries. So, this subquery you can execute and it gets a set of brand names. This subquery you execute, but now the customer name is fixed because you are doing it for one particular depositor tuple. So, the customer name is fixed. So, this guy is going to give you only branches where that particular customer has an account. So, now the except does what? It will check if there is some branch in Brooklyn. If there is some branch in Brooklyn where this person does not have an account that will be there in the except result. If this person has an account at every branch in Brooklyn the result of this except is going to be empty. Now, what is this where clause doing? It says where not exists this thing. Unfortunately, it will clash with this depositor that is why we have you can otherwise if the names are unique you can use it, but here to avoid the clash we call that s and t. So, now what is going to happen? If a depositor has an account at every branch in Brooklyn this except clause will give a empty result and then the not exists will return true and therefore the customer will get output. But if the particular person does not have an account at some particular branch in Brooklyn then that will not get removed by the except then the not exists will fail and so that customer will not come in the output. Little convoluted, but if you need to write such queries unfortunately you have to write like this in SQL. It is not very common that you typically do not write such query, but once in a way you may come across such things and this is how you write it. So, here is another query customers who have at least two accounts. So, here we are finding customer name by joining these things where the branch name is very rich. But there is a extra condition again this is a correlation where you will t.custom name equal to r.custom name. So, we are finding this will return one occurrence of customer name for every account that this customer has in Poway. And now if you say it is not unique that means the customer has two or more accounts in Poway and these are the customers that we want customers who have at least two accounts in sorry this is very rich. So, far we have looked at sub queries in the where class. You can also have sub queries in the from class they are not actually technically they are not called sub queries they are called derived relations but it is a basic thing is very similar. So, here is an example where we are doing the following in the from class we have this query here this parenthesis ends here. So, the sub query inside the from class is select branch name average balance from account group by branch name we have seen this query before. So, every branch what is the average. So, that is the sub query and now we are renaming it we are saying as branch average open bracket branch name comma average branch. So, this is a variant of the as clause which lets you give a name to the table and to the attributes in there. So, this attribute does not have a name whereas we are giving it a name average balance here. So, now you have effectively this relay this whole sub query is conceptually is executed to give you this relation branch average and now this query says where average balance greater than 1200. Does this query look familiar? We saw it before right how did we write it last time? Yeah we use the having clause now with this construct you do not need the having clause. You can always take that put it as a sub query and then have a where clause here which looks at this. Having clause is not essential although obviously for some queries it is easier to write it using this query be easier to write using the having clause than this way. But this is more general you can write even more complex queries using this which you cannot write with a single having clause. I think this is the last topic for the day which is null values it is possible for attribute of a tuple to be to have the special value null. In fact sql is what is null it basically is value that does not exist the special value that indicates I do not know this value or the value does not exist. The actual interpretation depends on the particular schema and the you know the designer it could mean any of these it could mean when you say phone number is null for a customer does it mean the customer does not have a phone number? It might it might also mean that you do not know what the phone number is the customer may or may not have a phone number. But the actual value null is a well defined value in sql what it means is up to you the designer. Now unknown values are always a problem so how do you deal with unknown values typically in COBOL what do you do for unknown values the phone number is unknown what do you store space zeros so okay so she will store zeros you will store spaces somebody else might store nine nine nine nine okay each of you has your own convention for it. Now you do not know when you look at a value is it unknown I do not know maybe zero is a valid phone number I mean I may have the domain knowledge to know that zero is not a valid phone number but in some other domain where which I do not understand it may be hard to tell whether this is a valid value or it is really an unknown value null is a much better way of doing it COBOL does not have null size plus I know right sql this is one of the nice things about sql null is a value in fact every type allows null integers allows nulls floating points allows nulls character allows null everything allows nulls that's a special value now in if you use c or java you know that there is a null value there but the null value is available only for certain object types but for integers and floating point you there is no equivalent of null in those languages in sql every type allows a null so this unlikely to occur can be dangerous so in our fox pro database earlier the retirement age it was there in some manual record but the people who built the database didn't have it immediately available so they just stored a value unknown but what is the unknown value they stored this was built in the 80s so 1999 looked like a long way off so in fact they stored only two digits for the year they stored 99 so all retirement dates for 99 and guess what why 2k came along and everybody would be fired from my it so you have to scramble and change that to first of all four digit here and then set it to some some other date in the future subsequently it was made the actual retirement date so I mean now the data is correct but yeah it was at least a null value would have been better if it was not available in that system just like in cobalt fox pro didn't have it but sql has it any questions so now how do you deal with null value how do you store it how do you check if it is null so the predicate is null can be used to check for null values you cannot say equal to null yeah yeah if you this is a sql database right I mean it's a feature of the language and the database supports it so when you want to see if something is null you should not say equal to null because of the way comparison with null is defined I'll cover that in the next few slides but if you want to check if a value is null you should say value or whatever r dot a is null okay so for example select loan number from loan where amount is null that will tell you loans where the amount is null that's probably an error situation it should not be null now if you use our table create table if you declared loan to be amount to be not null this cannot happen the database guarantees it if you did not declare it it may happen then you can run this query to check nulls introduce a lot of issues what if I say amount plus five an amount happens to be null what is the result okay so that's how sql defines it now what about aggregates if I say sum of amount from loan and the amounts were 2 4 and null what would some be just ignore it makes it 6 what about average it ignores the null therefore you add 2 4 and divide by 2 not by 3 so it doesn't treat nulls as 0 if you just take some it appears that null is treated as 0 but that's not the case when you see average it becomes more clear that null is thrown out before you compute the aggregate now what if the group has only null as a value all the values are null in the group then what is some null average null so all the aggregate operations except count star ignore the null value when they aggregate so you are aggregating on some value if it's a null value it just ignores it count star is separate in that it counts nulls also counts the total number of tuples in effect whether the value is null or not is irrelevant if you say a count of let's say amount if it is null it is eliminated only count star which is yes correct so now this was arithmetic and aggregation with nulls what about comparison of a tuple if you don't have a primary key declaration you don't declare anything to be not null yeah sure it's a pretty useless tuple yes it would it will count it counts that will count it means this is probably an error situation it should not there's no reason to store such a tuple but if you do count star will count it a good question i'm glad you're asking these questions you're really probing into sql okay so we saw arithmetic and aggregation with nulls what about comparison this is important what if you well you won't say five less than null but if you have go go back here and take this query it's a select loan number from loan where amount less than five what is the amount in null what should you do it should not be considered right so can we assume it to be false okay yeah what to do so in fact it's a little tricky you so here is an example so you should treat it not as false but as unknown and the reason is you can have double negation multiple levels of negation which will get you into trouble so take this query supposing you say five less than a where a happens to be null is an attribute name used in where clause if a happens to be null if you treat five less than a to be false then what happens to not of five less than a it becomes true which is silly he is null you should not say that not of five less than a is true so what you do is any comparison with null returns a special value called unknown it does not return true or false it returns unknown so that you can carry through through and or and so on so what is and if of a true and unknown so let's say the and has two parts one evaluates true the other party values to unknown what should the result be unknown and will be or true of or and unknown or of true and unknown true it doesn't matter what this unknown is whether the unknown is true of all doesn't matter the other one is true so the result of our will be true we end up having is what is called a three valued logic where you define the results of all the Boolean operation and or not on these three values the results on true and false already known so you only have to define the result on unknown so or of unknown and true is true as you can see here or of unknown and false so unknown or false is unknown because it if this value were true it would be true if the unknown were false it would be false so we don't know remains unknown similarly unknown or unknown is unknown unknown unknown means it we don't know whether it's true or false and is interesting so it's a compliment it's opposite of or so true and unknown is unknown however false and unknown is definitely false so regardless of whether this is true or false this will be false similarly unknown and unknown is unknown and not of unknown is unknown and the special is unknown evaluates to true if p evaluates to unknown so is unknown is a special clause okay so all this is fine but now what do we do so we have a where clause we had a null value the null value resulted in an unknown truth value the where clause had and or not operations so you sort of push it through all those and the final result can be true false or unknown if it's true what happens the tuple is output if it is false the tuple is not output if it's unknown we don't know whether to output the tuple or not at this point you cannot carry it any further you have to make a call so we sort of postpone the decision on whether it's true or false but now you have to make a call you can't postpone so at this point you have to make a decision so SQL makes a decision that if the ultimately the entire where clause evaluates to unknown that tuple will not appear in the result so at that point unknown is treated the same as false at the end of evaluating the entire where clause but inside of it it is kept as unknown so that avoids this this kind of problem not of five less than a being different from five less than a five greater than a greater than equal to a so if you without the unknown not of five less than a would not be equal to five greater than equal to you expect them to be the same but because of nulls it could give different meanings so as a result of where clause is treated as false if it evaluates to unknown at the very last step so that ends what I wanted to cover today