 In this session, we will do a quick review of SQL. We will look at data definition languages. We will not discuss course projects here. This is something that you are familiar with. A simple comparison predicate is the standard predicate, value expression, operator value expression where operator is any one of these equal to not equal to not equal to is written like this incidentally. Less than greater than sign, etc. This is the example. 12 star S type n assuming that S type n is student's type n then the annual type n will be 12 multiplied. This is just to illustrate that any expression can be written on this side. Less than some constant. This is a valid predicate. The answer is yes or no. The logical combinations and or and not. All of you are familiar with the standard truth table. True and true, true, true and false, false. True or true, true or false, etc. You are familiar with that. Null values and the three-valued logic. I had suggested that you might want to look at what happens when null values are present because any comparison with the null value cannot return true or false. It returns a new logical value called unknown. Consequently, we have a two-valued logic which is true and false. Now we will have a three-valued logic, true, false and unknown and every predicate will evaluate to true, false or unknown. So any comparison with null will return unknown. Is file less than null? Answer is not true or false. It is unknown. Same thing here. Three-valued logic using truth value unknown apart from truth and false, true and false is therefore required. This is the three-valued logic. Unknown or true is taken as true. Unknown or false is taken as unknown. Unknown or unknown is taken as unknown. True and unknown is taken as unknown. False and unknown taken as false. Unknown and unknown is taken as unknown. Not unknown is unknown. And now the final thing. P is unknown. Suppose some predicate evaluates to unknown. That means it is unknown. This itself evaluates to true if predicate P evaluates to unknown. Now the result of hair clasp predicate is treated as false if it evaluates to unknown. So final decision making. If something, something, then this. Whether it is to be done or not. If it is false, you don't do it. True, you don't do it. The correct answer to the three-valued logic is unknown also. You treat it as false. Simple and common sense but it is important to understand it all. There are some additional operators which we have not discussed. These are not basic operators in the sense that whatever they do can be done by other operators. But these are simplifying operators. So for example you say val between value 2 and value 3. The actual complex predicate is val greater than equal to value 2 and val less than equal to value. So instead of writing two clauses connected by and you just say val between this and that. So select start from student where S date of birth is between 1988, 01, 01 and 1990, 01, 01. That will give you people with that. If you put not, val is not between value 2 and value 3. Like here it is not of val between val to 1. So it's negation of the composite clause. Null predicate is possible. When you say column name is null. This will evaluate to two or false depending upon whether the column has null value or not. So CPI is null. That's a formal predicate. It's a valid predicate. If the CPI is actually null it will return true. If the CPI is not null, any value it will return false. And you can use it appropriately in the decision making within SQL. Similarly column name is not null. When you say column name you could give a composite column. Two columns together because many times your primary key or some other candidate key will be a composite key of various attribute. And you want to know whether the totality is null or not. A key has no business to be null. Primary key is never null. A candidate key should also not have any null because that's an identifying thing. But any combination if you want to evaluate whether it is null or not, saying a, b is null is true only both are null. And not null is true both are not null individual. This is a logical interpretation. You would rarely require these things but it's good to know that such predicate possibilities exist. There is a like predicate which is extremely useful in information systems. Get all names starting with p. All students whose names start with p. So you can say select star from student where s named like p percent in apostrophe. You would be familiar with listing of files in an operating system. So you want to list all files starting with p. What do you say? List p star. So what star works which means zero or more occurrences or any character. That is equivalent to percent symbol and like is the comparison keyword. So because you are actually giving a pattern and you are trying to say any value which is like that pattern should be flagged. So this will match all names which start with p. The percent symbol matches zero or more characters. On the other hand an underscore matches exactly one character. So if you say p underscore two then p12, p52, p32, p72, p72 all will match but pxx2 will not match. You get that? And the combination of the underscore and percent symbol will give you a plethora of way. Basically I don't know how many of you are familiar with the notion of regular expression. Regular expression is a text expression which when you expand or when you find out the implication of these special characters inside that regular expression, you know what all strings it will match. It is a short form to describe multiple strings which can match this. So this is a way of giving a regular expression. Sometimes you want to match a string which itself contains a percentage or a underscore. Now how would you say the moment you say percentage is different. So you put an escape character before such percentage. That escape character is dollar. So dollar is used as an escape character when needed. So when you say where is name like this is I think important. So for example if I want to check whether somebody's marks are 80%. So this is a string character string which is stored as a value somewhere and I want to match this. So if I want to say where marks like can I say okay whether 80% or 20% or 70% I want to check all of them. So can I say marks like underscore underscore percent. No it will match any two character string. So to mention that this percent is not the SQL special percent. I write this as dollar percent. If the percent is preceded by dollar which is an escape symbol, it tells SQL don't interpret this percent in the standard SQL term but treat it as a character. So it will match that as a character. Same thing about underscore you can escape it. What if dollar itself is a symbol? Dollar dollar. So this means that this is an escape character so this thing is actual character dollar. That's easy to understand. Here are some examples of like predicate underscore ABC percent. This will match X ABC, X ABC DEFG. Notice that what this means is there should be exactly one character before ABC and there could be zero or more characters after that. Any string will match this. So it will match this, this but it will not match ABC alone. Why? Because there is an underscore here means that must be one character at least here. It will not match X Y ABC DEFG because there are two characters before ABC. So you can now construct the like predicate very easy. Look at this percent eight dollar percent. Percent means what? Any number of characters? Zero or one occurrences or any number of characters that is what it means. So this will match nothing eight, one eight, rate, blank, one zero. All of these are things which represent this percent but it should end with eight. So it will match all of these. Notice that the first one is a general percent character for string matching in SQL but this percent is an actual symbol percent because they are not. But it will not match 15 percent. Why it will not match 15 percent? Because there is an eight here. Before the percent symbol there must be eight. There is an eight but after that there is a zero so this will not match. Is this clear? I hope you can understand because many of the attributes that you would have would be string attributes and searching in strings is not very easy. So this will permit you to search generic strings and you know how many times you want to you have some approximate spelling you don't know but you can hunt out things from there. Now we come to a very important clause which for information system processing permits us to collect aggregate values. It is not simple average maximum that we have seen how it is done but it is aggregating over different groups of rows which are selected by the SQL query. So whenever we need to summarize data from a table on the basis of some groups for example find hostel wise average CPI of students. If I say select average CPI from SCPI from student I will get the institute wise average. If I say select average SCPI from student where hostel equal to eight I will get hostel number eight average. If I want for each hostel separate average then I have to run the query 13 times if I have 13 hostels. Every time saying hostel equal to 1, 2, 3, 4, 5 the group by clause automatically helps me do that. So the statement that you write in SQL is if you want to find hostel wise average of CPR students select a hostel comma AVG SCPI from student group by a hostel. Let's see what this does. You might want to just note down this query because the next slide actually explains how this query will work. Select a hostel comma average SCPI from student group by a hostel. SQL first selects all rows which meet the predicate. So what it will do? What does the predicate here? Go to the last slide. From student where? There is no where. If there is no where means predicate is true for all rows. So from the student it will select all rows. Next the rows are then grouped on the basis of the grouping attribute. What is the grouping attribute? Let's go back to the previous slide. Group by a hostel. That means take all the rows and essentially sort them hostel wise. But not just sorting. Sorting will give you even 5000 records if they were originally 5000. In these 5000 records which are sorted on hostel internally you will have 13 groups. One group of rows belonging to hostel 1, another group belonging to hostel 2, hostel 3 etc. After that this SQL does something interesting. The aggregate function is calculated for rows of each group. So all rows which have hostel number 1, the aggregate function average will be calculated only for that. All rows which have hostel number 2, aggregate function will be calculated for that. Which means in the group by a clause where we have said s hostel, if s hostel has 13 distinct values, this SQL statement will calculate 13 distinct averages. If the group by a clause had 25 different values, we have calculated 25 aggregate values. One for each group. And finally these will be given out. Because what you have said here s hostel average cp. So the output will not be 5000 rows even if you have 5000 rows. The output will be 13 hostels because they have been grouped together. So there is a single value. And for each hostel number the corresponding average cp. Can you see this powerful thing facility for information processing? Please note that all attributes in this select clause other than aggregate function must appear in the group by clause. After all this is a group by query. So if you want to print for example, let's go back to the previous slide. Does it make sense to say select s name s hostel average cp. What name should it print? It has grouped all the people for a single hostel. There are 300 names. Select name means what? It means nothing. So in a group by whatever you are grouping by is or are the only attributes which can appear in the select state. They only can be printed out. And the aggregate functions of course are expected to operate upon the selected group rows. So every group will result in whatever value you say about the aggregate function. You agree that this is a powerful facility for information processing? Fine. So this is the explanation for how the group by clause works. Internally of course it will sort. Then it will like in a file it will read all the records for one group. Keep calculating the average. Means it will keep calculating the sum and numbers. Divide that sum by number. Keep the average and throw it out. Then it will continue etc. The set operations are very peculiar to SQL. The set operations are not covered in relational algebra. As I mentioned relational algebra operates procedural. So select something on this predicate means select certain rows. It gives you a set. It gives you a relation. But the operation is implied row by row. Look up a row, select or reject. Projection means take a row, project only this. But if you want to do set operations like union. You are familiar with union of sets. Intersection of sets. All the set operations because tuples are sets actually. The set operations are permitted. Set operations typically come from an mathematical foundation called tuple calculus or relational calculus. We have not discussed that. But we will discuss only the SQL implementation and what is the meaning of it. For example if you want to find all students who have a greater CPI than CPI of some student in hostel 8. What would that mean actually? Everybody who has CPI greater than some student in hostel 8. Now obviously if your CPI is greater than the topper of hostel 8. It will also be greater than the bottom of hostel 8. In fact anybody who has the CPI greater than the bottom of hostel 8. Qualifies as per this query. But you do not want to say bottom of the topper. You want to say there exists some student in hostel 8. Where some CPI which is lower than mine. Then I am entitled to come out. Now this selection is a very funny selection. You can actually once you interpret it correctly in terms of the minimum. You can actually use the minimum function as sub query etc. But SQL permits you to write this query. Select S name from student where SCPI is greater than some. Select SCPI from student where S hostel equal to 8. Notice that this select statement will select what? Select SCPI from all the hostel 8 students. Whatever be their CPI they will all be listed. So this is not a single value. It is a set. And you are saying that if this CPI is greater than some in the set. One more whatever. Then that row qualifies. This is an artificial example. But I hope you understand the set comparison notion here. Here are some logical explorations for this. For example suppose I say 5 less than some set. What you see in this is a set. Presumably this set has come out of the inner query. So in this case this is the inner query. As I said this inner query will give you a set of numbers here. Assume that we are talking about some integer numbers coming from a similar inner query. And that is being compared with something on the left hand side with the sum clause. We are trying to see what is the implication. So if I have something like a value 5 on the left hand side. 5 less than some of these. Is true because 5 is indeed less than this 6. Although 5 is greater than 0. 5 is equal to 5. There exists one row for which 5 less than 6 is true. So sum this means true. Consider this. 5 less than some. And suppose this set is only 0 and 5. Then this comparison is not true. Because one value is 0. Another value is 5. There exists no value which is larger than 5. So this comparison doesn't work. 5 equal to some. This is true. 5 not equal to some is also true. Because 0 is not equal to 5. For some this condition applies. That means it is true. Is this clear? This is a set operation you can contemplate later. It's a very powerful mechanism of extracting something. Equal to some is same as in actually. You saw the in predicate earlier. A row number in this. But that was in means equal to. So if you have not less than or greater than. But if you are equal to some as the predicate. Equal to some of this is same as in. However not some is not same as not in. Contemplate on it. I will leave it to you to discuss why it is not so. There is an all clause. This is another set operation. Find all students who have CPI greater than CPI. All students in Australia. Not some. Again if that was true. If we found out somebody who has a greater CPI than the bottom performer in Australia. This time we are saying one final all those who have CPI greater than the top of. Because anybody who has a CPI greater than all students. Must have a CPI greater than the top of Australia. Again this is a query which can be answered using a inner query with the max kind of thing. Which will return a single value and you can have a simple greater than compare. But to show the demonstrate use of all. You can say select a name from student where is CPI greater than all. Select a CPI from student where is hostile equal to it. Select a CPI from student where is hostile equal to it. You can see what it does. The inner query will find out a CPI from student of all Australia. That is a set. And if this CPI which is under consideration in the outer query is greater than all of them. Effectively saying is greater than the max of theirs. That particular thing will get selected. So this is an example of set operation. Unfortunately the examples that I have chosen to demonstrate some and all may appear artificial. Because they can be easily implemented by using something else. But I hope you understand the purpose of the set operation. Here is the definition of all clause. Five less than all of these is obviously false. Because five is less than six but not less than zero and five. Five less than all of this is true. Because six and ten individually are greater than. Five equal to all of these is false. Because it is not equal to this four. Five not equal to all of these is true. Because individually it is not. Again a comparison with in and not in. So not all is same as not in. But equal to all is not same as in. I will leave it for you to contemplate on the correctness or the interpretation of this. All these by the way are explained in the textbook if you ever get to read Sudarshan's book or Kaurth Silver Shard Sudarshan book. We have already seen in predicate. The in predicate need not be used only in the context of a inner query or sub query. It can be used explicitly. For example select student.snm from student natural join rage where rage.ccode is 64 and rage.gray in aabbb. Can you can you think of what it would mean? If you were to write using the conventional predicate how would you write that? Can anybody pick up a microphone and answer? Without using the in predicate and without using a inner query or sub query. If you were to write that how would you write that? I thought it was very simple. We have seen sub queries in the predicates. Find names of students whose CPI is less than the average CPI of the students of hostel 8. So inner query select average scpi from student where s hostel equal to 8. This will give you the average CPI. The next one says select s them from student where scpi is less than this value. So everybody whose CPI is less than the average of hostel 8 will come out including those students of hostel 8 who have a CPI less than their own average. Suppose I change the query. Find names of students whose from find names of students of all other hostels other than hostel 8 whose CPI is less than the average CPI of the students of hostel 8. You understood the query? Hostel 8 has an average CPI. Now I want to compare the students of all other hostels other than hostel 8 whose CPI is less than the average of hostel 8 CPI. What query would you write? Let's get back to the paper. This was the query which we wrote earlier. Now how do you say that get me students from all hostels other than hostel 8? Please remember that this inner query is as if there is a single value for this predicate and the outer query has two predicates. One where scpi is less than something. The other where s hostel is not it. So the outer query will run selecting only those students which are not in hostel 8 and the inner query will give you a value which will be compared with this scpi. So you can see it's simple. That means you can actually formulate some very interesting kind of complex queries rather simply if you apply the predicates properly and use inner queries appropriate. This will require some experience and expertise which we shall develop through some tutorial problems in the subsequent session. But you might want to think of practicing on your own thinking that there is some schema raise some arbitrary questions to each other and see how you can write an SQL query. It will be an interesting example. What does this query do? Select sname, scpi from student natural join reg where grade is equal to a, course grade is greater than it and scpi is less than the average scpi of students of hostel 8 and order them by scpi in descending order. So there is a lot more to SQL even in terms of querying by this. I am covering only a small segment of that which is important and relevant but don't ever think that you have learnt all about SQL. What you have learnt is yes, you can write basic queries for any database table or tables correctly and expect to get results. In real life there are many other operations which are possible but you can actually do a variety of things if you read standard SQL because SQL is an international standard. There is nothing like Oracle SQL or Postgres SQL or Ingress SQL or Informix SQL or MySQL SQL. SQL is international standard. Every product might have some additional variants added to the SQL syntax but as long as you stick to international SQL 99 that query is guaranteed to run with identical semantics on any database. That is one of the reasons why a specific database product might be used to develop an application but it is not obligatory for you to continue to use that database product. You can shift to any other database with a guarantee that all your SQL queries will run exactly as they were running. Unfortunately in an information system there is much more than just SQL otherwise the whole information system could become portable very easily a clock that it is not so we shall see later but at least as far as the SQL queries are concerned because the international standard this thing help. Is that okay? So we now move over from query language to a thing called data definition language. You have seen the data definition language in terms of schema definition. We will now see whether constraints and assertions what these are and how they are prescribed and what is the advantage of prescribing constraints rather than using some procedural programming mechanism to check for those constraints. So what is a constraint? Constraint is nothing but a condition which must be satisfied at all times by data in the table. These are some of the constraints are integrity constraints. Integrity constraint is about data integrity. Remember what we said right in the beginning garbage in equal to garbage out. You have to prevent garbage from going in. Garbage can go in if proper checks are not made. In conventional programming you read the data and you put lot of if condition that is called validation and then only if the data is correct you let it go but if some programmer forgets to put those if conditions wrong data may go in. You want to ensure that the wrong data does not go in. We are looking at how SQL answers such question. To ensure the integrity of data we recognize two types of integrity. One is called the basic data integrity which means basically the value itself must be correct. It must pertain to the domain. In a numerical field you should not be able to insert a character value. The second are called referential integrity. This is slightly more complex because it involves correlation between two tables. We shall see both of them and see how exactly SQL implements this constraint. Next slide we look at the examples of data integrity. This data integrity constraint is to ensure that data in the attribute let us say hostel, CPI and grade is always within the following prescribed value limits or domains. You will agree that for IIT Bombay these are the domains that is hostel should be between 1 and 13. CPI should be less than 10. Grade should be in A, A, A, B, B, B, C, C, C, D, D, D, E, E, F, R. I have written from my memory. If there is some grade missing you may add. Every year we seem to be adding some more types of grades so I don't know but I think let's assume that they are exhaustive. What would this mean? This would mean that if grade is an attribute then it cannot have a value ZZ or it cannot have a value PQ or it cannot have a value A5. It has to have any one of these values only. Imagine that a clerk is inputting the grade and by mistake not by desire the person types AC instead of AB. Is AC a valid grade? No. So what should happen? AC should not go into my database even though the person has made a mistake. Who will prevent this? Either you can write a program which captures the input from the clerk and say oh sorry AC is not valid. But if while writing the program you have forgotten to check this the fellow AC will go in because you have declared the grade to be a two character variable. I mean two character attribute. So you want to ensure that such conditions are stipulated as part of the definition of database itself not left to any programmer's choice and you want the SQL engine to automatically do something about it. That is what the notion of data integrity is. Let's look at referential integrity. What kind of things we want to ensure? This is an example of some sample records of a table called student and a table called range. Okay. What is a table called student? This contains roll number, name, some other things. What is a table called range? This contains the registration data. Roll number, course code, maybe grade etc. which we discussed earlier. Now although these two tables are independent you will all agree that they are in a very great sense they are related. After all I hope to find the records of registration of only those students who are there in the student five. I don't want any Tom Dick and Harry from outside to come and register for CS64. Only those students who have a valid roll number here are supposed to register. Since these are two independent tables if I try to insert something in range then somebody must check whether the fellow exists here or not. This is a fundamental referential integrity because this table implicitly refers to this table. A roll number inside this table cannot not exist in the other table. Every roll number here must be there somewhere in this. You agree? So in that sense this S-roll in the range table refers to this S-roll in the student table. How do you state that? How do you test that even in a programming thing? It's not very easy to say. Here is an example of that. This is first of all the meaning. Group of columns that refer to some other table are called foreign key. Let's go back to the previous thing. This S-roll here is a column. This is not group of columns, just one column. This column refers to a column S-roll in the other table. So it says that group of column which refers to some other table is called a foreign key. This is a keyword. S-roll is a foreign key because S-roll refers to S-roll here. It is called a foreign key which refers to the primary key or to a candidate key in other table. This is the requirement. What does it mean? The S-roll in the first table goes to the previous slide. The S-roll here is the primary key of this table. That is why people often call this a master table because the main data is here and the subsidiary data of registration related to each student is here. In this case this S-roll is not the primary key. What is the primary key of this table? S-roll plus C-core combined is the primary. S-roll is not the primary. But this S-roll refers to the primary key of another table. So this S-roll will be called foreign key. And this foreign key refers to the primary key, this S-roll, in this case. It could refer to a primary key or to a candidate key. That is the definition of the constraint. Now if such a reference exists, we say that I must ensure referential integrity. That means I must ensure that in the second table only those roll numbers come which have a valid reference to an existing primary key in the other table. That is the referential integrity. The referential integrity can be compromised under what conditions? It can be compromised only when you insert, delete or update. Because there is no other way. If you make queries, you can't change referential integrity. If it is bad, it is bad. It is good. But when will the referential integrity be violated? It may be violated because I have inserted, deleted or updated some table somewhere. I want to figure out which table or which out of the two tables which referential integrity is affected because of which insertion, which update and which deletion. Right? So here are some examples. Consider this. This is the registration record. 8, 9, 0, 0, 5, 0, 1, 2. Sunita Sarabha is here. 8, 9, 0, 0, 7, 0, 1, 7. Keshav Nori is here. 8, 9, 0, 0, 4, 0, 9, 3. Muthu Krishnan is there. But I suddenly find 8, 9, 0, 0, 5, 0, 1, 3. HS412. HS412 may be valid. Incidentally some of you can correctly suspect that if this is a foreign key for this S-roll, then a C code itself must be a foreign key to C code of the subjective. That is correct. However, what does this mean? This is here. But there is no corresponding record. In plain English term, it means that some goose petia has come from outside and tried to insert. The correct interpretation is while data entry. Somebody makes a mistake and instead of let's say 8, 9, 0, 0, 5, 0, 1, 2, by mistake 1, 5, 3. But this row has no business existing here because corresponding to this row, there is no entry in the student. Therefore, existence of this row violates the differential integrity between these two tables. Consequently, if anybody is trying to insert this row here, it must not be permitted. Alternately, if there indeed exists a student 8, 9, 0, 0, 5, 0, 1, 3, then that student's record must first be inserted in the student table and then only the registration record can be inserted. Isn't it sensible? It's like institutes saying first pay the fees and then register. Something like that. First get admission and then register. Something like that. Okay? So you ensure that. So you are very clear on one case in which differential integrity could be violated and that is when an insertion is attempted to make in a table which has a foreign key but the corresponding table does not have the corresponding entry, then there is a violation. Any other kind of violation that you can think of? Here is another example. We take the same example in the next slide but we remove that offending entry. So there is no foreign key. You go back to the previous slide. This fellow is not there. So all the remaining data is consistent in terms of differential integrity. Now imagine that for some reason Muthukrishnan leaves the institute or we throw him away because poor CPI 4 or something like that. I should not say this because Muthukrishnan was one of the brightest IIT students, a topper and he just retired as the deputy director of IIT Madras. So don't tell him he will be mad at me. But let's say he leaves the institute. Let's say he passes out with a gold medal or that is not a correct example. Suppose he leaves the institute in between, in between a semester. The semester is going on like this time February. For some reason he decides to go to Chennai. He leaves the institute. When he leaves the institute what will happen? His data from the student table will be deleted. Now if you delete this data from this table, suddenly there is a violation of differential integrity because the poor fellow has registration with HS412 etc. Now this row becomes lavaries. There is no master for this row because that master has gone. So you understand that not only insertion into a dependent table could cause violation of differential integrity because there is no master record. Deletion of a master record for which there are rows in the other dependent table can also cause the correct way to solve this problem is first if Puthukrishnan is leaving, first delete all his registration records and then only then delete this. You agree? So that there will be no differential. So insertion here and deletion here could cause a problem. Insertion into student table can it ever cause a differential integrity problem? No because when I insert in the student table, first time that student is appearing here. There is no record for that student here which is okay. There are several students who for some typical reason may not register in a semester. I will. So this year I have taken a drop. I still exist as a student but I am not registering for any course here. This is perfectly fine. Is there a problem if I delete any number of rows from here? No. I exist already because differential integrity is valid. Maybe I am changing some of my registration. That is perfectly fine. So insertion here and deletion in student could cause a problem but insertion in student and deletion in range will not cause a problem. What about update? I have said similar issue. In this particular case update is unlikely to happen because I will be changing the roll number. Consider this case. 89007017 comes and tells me I have not registered for CS413. Okay. Somebody else comes 89007019 who says I have registered for CS413. You are not put in there. I start updating this. What I will do? I want to type 89007019. That is the new roll number I said. By mistake I type 18. If it so happens that 18 does not exist here or even 19 does not exist there then this update will cause differential integrity to be valid. In a nutshell, differential integrity is nothing but saying that if there are rows in one table in which some of the field refers to a primary key of some other table then there is a differential integrity. Please ensure that only those rows with the primary key in the main table have permission to exist in the other table, not other one. Whatever violates it violates it. Okay. How do you enforce constraints? How do you ensure that this happens? How do you ensure that go back to this slide? Insertion, deletion of whatever kind will never violate that constraint. In conventional programming what will you do? The program that you write which inserts inside any database table a new row or new values. We have not seen how it happens. Or delete some rows or columns. You would put a lot of if statement. If this fellow is not here don't permit him to insert here. If you are deleting a row in the master if that fellow is here show you can't delete that row. But what you are doing is in one table and what is happening in another table. So the algorithm will be complex. In any case implementing that by application program as I said has a danger because there will be 10 application programs which are doing these things. And one of the programmers may forget to cross check these things and one of the application program then we would have a problem. Consequently in 1989 SQL International Body said that we cannot leave the enforcement of constraints to individual programmers. Till 1989 the constraints had to be implemented by writing programming code typically in non-SQL way to cross check these. So these were called embedded C programs, PL SQL programs, whatever way, triggers, whatever. There are different ways of checking for this. Instead SQL said the DBMS schema definition itself describes such referential constraint and the SQL engine takes care of it. Now it doesn't matter what program you write. If in the schema definition you have said this is my referential integrity constraint no matter who tries to insert delete what the SQL engine guarantees by looking at the referential integrity constraint that wrong insertion regions will not be permitted. You will get an error. Don't you think that's a beautiful way of doing it? In fact that is more important because if you build an application system let's say where such checks are done by programs then while validating it is very difficult to ensure they go each line by line saying yes every program has this check. But on the other hand if you are guaranteed that SQL engine does something then by looking at the definition of constraints you can know whether things will work or not. What we are going to see is how do we define such constraints in SQL. The important point to note is that while a programmer may forget to write necessary code in the program to check the constraint and take appropriate action by describing the enforcement as a part of DBMS schema definition a SQL engine will ensure that such a violation will not work. But such notion does not exist in any conventional programming line. This is purely an information system issue which is tackled by SQL. Let's see how you define these constraints. Is this clear? The standard way is to write application programs to enforce constraint that is very dangerous because this may happen, programmer may forget and remember it's not one program which is existing in the development. Hundreds of programs are there. Even if one programmer forgets you will have junk inside the database. You don't want that. How does DBMS schema definition permit us to define these constraints and what exactly happens when the constraints are defined? That's what we shall see. We go back to the first constraint. What is that? Data integrity constraint. We are not talking of differential constraint now. We are talking of data integrity. We want to ensure that wrong data does not enter the database. A constraint can simply be written as not null. This constraint can be defined inside of a table definition itself. Create table student, s-roll, care-eight, s-name, healthcare-thirty, not null. What does not null mean? The name cannot be null. It's a common sense thing because no student will come to my institution with a null name that is guaranteed. And therefore I am making it mandatory that while inserting the data of a student, not null name must be inserted. Of course a data entry operator may cheat by entering stasta, stasta. Stasta, stasta is very distinct. The machine may be fooled. Of course somebody will ask him a question. But this is to ensure that by mistake you don't insert a null value. So this is a simplistic constraint. Not null. Okay. And this is another constraint. So writing something, something, healthcare-thirty, not null is actually a schema definition of a constraint for the attribute s-name. Unique. Select table student, s-roll, care-eight, unique. Unique is a scale constraint to say that these values will be unique. Remember primary keys are always required to be unique. But candidate keys are unique but a scale doesn't know it. We know candidate keys. If we want to define to a scale engine that look this plus, this plus, this combination is a candidate key. Then I should be able to define it. Here you agree that s-roll is unique. But what if we go back to our thing that hostel and room number combination is unique? How will I use the word unique? Because I cannot say sh is unique. I cannot say s-room is unique. But the combination hostel and room is unique. We shall see that in a moment but that can be defined. Here is another way of specifying. Create table student s-roll, s-name, something, something. You define all the columns. And then you start defining data integrity constraint. You say unique s-roll. That is valid. Here is another check condition. The check constraint which is actually a data integrity constraint is a way to say that certain columns shall have a value only permitted by a query or a predicate. So we do not say select but we put a predicate there. This is called a check constraint. For example, in the faculty table I have a salary numeric 9,2. And we want to ensure a simple constraint that no faculty member can earn more than the president of India. President of India as you know recently, poor fellow salary has been increased to 1 lakh rupees. So the president of India, the honorable president of India is 1 lakh. No faculty member should get more than that is the rule. Now when I say check f salary less than 1 lakh. What this means? No matter what value you put in, if the value is greater than 1 lakh, the scale engine will shout at you. Sorry, the check constraint is violated. The way it is written is f salary numeric 9,2. Check f salary less than this. Please note that this comparison f salary less than 1 lakh will be performed every time. The value of f salary in any row of your table is either insulted or updated every time. And this is the responsibility that the square engine takes. You don't have to write any code for that. Do you understand how powerful this system is? What happens if insert or update operation violates the constraint? So what happens if let's say I make a mistake, go back to previous salary. f salary should be less than 1 lakh. And let's say I have a friend in the administration whom I said you should make 2 lakhs. So he writes db fatigue, salary 2 lakhs. What happens? Of course I said the scale engine will not permit it. But what it will say? It has so many constraints. And I might have 200 constraints 10 per table. So out of 200 constraints some constraint is violated. The scale engine can simply say some constraint is violated. Actually that is what it says. In fact it says in a very cryptic fashion. Ideally what I would like to know is out of the 200 constraints which constraint has been violated. So a square permits me to identify a constraint by a name just like a variable name. If I do that then a scale engine uses that name to tell me this constraint is violated. If I don't do that then I will get some funny message. Here is what happens. If an insert or update operation violates some constraint a scale engine will give an error message. It may look like constraint c underscore 156a29xx38p violated. What is this stupid number? This is some internal number that a scale engine has allocated to that constraint when it translated that constraint into its own internal representation of a scale. It has no other way of knowing this. It executes, it checks, the moment check fails it will refer to its number of this number. It says violated. Now you can spend rest of your life figuring out what this means. So a scale says this is troublesome because this error message does not tell us. So a scale permits us to write a constraint name and it helps us in debugging if constraint violation occurs. Consequently we as professional information system people shall never ever write a constraint without giving it a constraint name which is appropriate. Unfortunately even in the field IT professionals who come from old times where constraint definitions were not part of a scale engine and they used to write these constraints as procedures or something you still find code where there is no constraint. Now they have learnt to write constraints but they will not put a constraint name. You have to avoid that if you ever write it. Since you will generally master mind information systems and expect some professional programmers to write queries all that you need to do is look at their schema and find out if there is no constraint name. No matter whether it is done by TCS or Infosys or Wipro or IBM you can tell them boss this is not as per our standard of a scale. You go back write constraint names and come back. See the advantage of constraint names. Create table range. Grade, Cal to. Constraint. Grade underscore value underscore violation. Check. Grade in. A, A, B, B, B, etc. You see we have just seen this predicate but by putting it in the schema definition where I am defining the attribute grade. Now I am saying this is a check constraint. Anytime a grade value is inserted or updated it must be one of these two but I am also saying grade value violation this full thing is the name of this constraint. So the word constraint followed by any arbitrarily chosen name by you and you will not choose it arbitrarily you will choose it meaningfully. Consequently if somebody tries to put in a grade A, D or P, Q at that time the error that will come on the screen will not say salary greater or will not say P, Q, S, Z, Y, 2, 2. It will say grade value violation occurred. Can you not see that this is a very meaningful thing and therefore you can immediately correct the situation. This is the importance of these simple things and therefore help in actually running an operational information system. You can represent chosen candidate key as a primary key. There is a constraint called primary key. You can simply say S-roll integer primary key. When you say primary key it implies uniqueness. The word primary key implies uniqueness constraint. So it must be key. That means if you have inserted one row and if you try to insert the same row number in that table again it will say this constraint is violated. But if you have defined it like this again you will get some funny message X, X, P, Q, 9, 4, 3 violated. So you have to give even this thing a name. Here is a way of doing that. S-name character running 30, S-hostile integer, S-room integer. Remember I said that in a hypothetically beautiful situation where every student gets a single room the S-hostile and S-room can be unique characterized. So here is a constraint. Constraint student underscore PK primary key S-hostile comma S-room. This is not a check constraint. This is a primary key constraint. So remember the format the word constraint followed by any chosen name followed by the constraint itself. If it is check constraint you will like check something. It is a primary key constraint you will say primary key. And then define what it is. So here you are defining primary key as combination of S-hostile comma S-room. So what we are discussing here is what SQL permits as a part of schema definition and this is what it permits. So here is one case. Constraint student underscore PK primary key S-hostile comma S-room. What happens if the same combination is repeated again in any row? The SQL will say student underscore PK violated. Of course there is no violation of a student but that is the message. And in fact I think it is this what you had in mind that just this message to a innate operator may not mean much but to a programmer who is reported this is something interesting since he has started this chain of discussion let me describe something useful. Some information system has been developed that you are request for your organization. Your organization works from thousand branches in the country let's say insurance company a bank or something. Some clerk is sitting in one of the branches in Chharsukuda in Coimbatore somewhere we don't know and he gets this error. He cannot understand as he correctly points out not every time he can understand this violated means what. In all such situations where there are distributed users or staff members working on an application system there is usually a help desk comprising of four or five people who are sitting 24 hours if the application is critical. Their job is if anybody from any place calls saying I have a problem they are supposed to solve that problem. So the end user may not understand this but when he says I have a problem my grade is not going in or my hostel number is not going in. If you are sitting as a information system specialist or particularly as a programmer at the help desk you get a call you just ask him what is the message on the screen that you see and the moment he says something like student underscore pk is violated. You should then be able to make out this is that constraint from the list of constraints I have this is what has happened and then I will instruct you are not supposed to put this comment. This is the purpose in the larger context. Foreign key example. You remember we looked at some additional table definitions create table department dno integer primary key. You remember I gave a hypothetical case where all departments are in a single building each department is numbered and each department has a location each department has a name and faculty members are allocated to different departments. So let's say create table department dno integer primary key d number d location etc. d name whatever work. Consider another table create table faculty fno integer primary key and in that dno integer. So there is a faculty member each faculty member has a unique faculty number let's say there are 400 faculty members in IIT. So there are 400 entries here. These faculty members are allocated to aerotical civil whatever whatever whatever whatever. So against each of them there will be a unique dno. But these dno is not independent of the other table because he must be allocated a department number which is existing in that main table. So consequently you will agree that dno is a foreign key. Look at the way it is defined constraint faculty underscore fk foreign key dno this is referring to this dno. That is in this table I have defined dno which is a foreign key. References department in bracket dno. That means this references the dno attribute of department which is the other table clearly connecting the two. Here again the name faculty underscore fk is a crude approximation of saying that fk which is foreign key is being affected in the table called faculty. Since it is unlikely that a table has 20 foreign keys it will have one or two foreign keys a fellow should be able to figure out that this constraint is violated. So is this clear how you write the constraint constraint name is faculty fk. The constraint itself is foreign key that is a constraint just like check constraint primary key constraint foreign key. You write in bracket dno and you write references department dno. Incidentally if the name of the attribute in both the table containing the foreign key and the primary table is same you do not have to write this dno. Just say foreign key dno references department that is good enough if the department has dno as a primary key because that the sql can figure out from its own table definition. And if there is any violation during update insert delete what we discussed this violation will come up. So it will say faculty underscore fk is violated. This pk fk with underscore is typically is not a sql standard because you are free you can choose any name but typically this is a convention which many programmers follow because pk means primary key fk means foreign key and the name of the table so it is easy to understand all programmers and all designers have completely independent choice to decide what they want to do. What happens if the referential integrity constraint is violated on violation normal behavior is an error message but sql can update other tables if so prescribed this is the more powerful part of sql the update can be set default create table faculty dno integer default is zero default zero means the value is default by the way this is a prescription for department number there is nothing to with constraint with every attribute you can prescribe a default value that means if a value is not given at the time of insertion of a row the default value will be put in. So there is nothing to with constraint now look at constraint references department on delete set default you see constraint is becoming more interesting within the constraint definition you are also suggesting an action in case this constraint is violated on deletion set the value to default when will this constraint be violated please note dno is a foreign key for faculty table it is a primary key for the department table the problem will occur only when department is deleted and I am still assigned to that department I mean imagine tomorrow this is the assignment department number 1 2 3 4 5 means a row this that that imagine computer size means 7 I am a faculty member suddenly the issue decides to close down department of computer size so they will delete computer size so me and several of my colleagues will now become lavarese rows in that table because there is no there is no master now that is violation now the department must have authority to close the department without getting rid of me so maybe a decision is taken that in such cases that the institute closes down the department all faculty members allocated to that department will be temporarily allocated to a dummy department called 0 which is the dummy value in fact if a faculty member comes and I have not decided where to allocate that faculty member I will initially put that faculty member in 0 that is why I have a default value 0 now look at the power of this statement all delete default means that if there is a delete of a particular department number in the main table then all rows in this dependent table which have that value as the value replace that value and reset it to default you see how powerful it is you will also say something more set it to null on delete set null and more importantly cascade update cascade or delete cascade now this is very interesting meanwhile figure out what is delete cascade and update cascade delete cascade is very powerful delete cascade means if department number 7 is deleted then all rows having 7 in the faculty will also be deleted so basically if the director closes down the computer science department he is effectively saying fire all the faculty members from the department of computer science also this is not meaningful in case of faculty hopefully but this is meaningful in case a student is leaving the institute if a student is deleted all the registration records should be deleted so if I say delete cascade you can see beautiful effect the registration records have no meaning now they should be automatically deleted why should I have to write program for it it is a very powerful feature to say not only I define a constraint if a constraint is violated I can describe a logical action to be taken but whether the logical action is correct in terms of the functional requirement of system you have to decide for student to delete registration record perfectly fine for faculty to be fired when the department is closed at least from my perspective is not fine and hopefully from the institute perspective is not fine so for every such constraint you have to understand what is the requirement functionally and do things according there are many such features for describing the referential integrity constraints and violation what action to take I have described some in subsequent slides update cascade would mean something else if in the main table the number of the department is changed remember that corresponds to changing the primary key so basically it will mean delete that row and insert a new row but update implying the change in that number then ordinarily it would mean an administrative decision just like a roll number is changed of a student who changes the branch now obviously all the registration records of that student should still continue for the present semester so update cascade will mean that if that roll number changes then all the rows in the dependent table which has roll number should change that roll number to one which was updated there common sense so that that condition is available assertion is a more complex thing restrictions involving multiple tables this is more natural to express two table constraints separately see all the constraints that we have discussed refer to one table if something happens to this table or in referential integrity something happens to a related table what is to be done in this table is what you are described but suppose you have a constraint which must be satisfied by combination of values some value from one table some value from other table you can describe that constraint either in this table or in that table you have to describe it separately such constraints are called assertions assertion is a mathematical term which says the assertion must be true that is the meaning of assertion the word assertion comes from that and here is an example of assertion consider that in IIT for hypothetically the faculty student ratio should be always 1 is to 4 or beta it is ok 1 is to 9 you can put any number but suppose such constraint is to be implemented how will you describe such constraint and what is the implication the implication is if that constraint is to be maintained suppose you have a certain number of faculty members then you have to watch that the number of students who are admitted are not more than what the ratio permit of course it also implies that if a faculty member leaves the institute 4 students have to be thrown out if the ratio is very precarious these implications are funny but how do you state this obviously not such all implications can be stated in terms of actual action because a scale engine will blindly take those actions and when an assertion is violated it is more like a policy being violated where you still want to permit that action but want to be won so that you take corrective measure you don't want to file students because faculty leaves if a faculty member joins you don't want to in the middle of the semester admit new students you don't want to stop admitting student half way through just because the ratio is fulfilled so this is not something where you want database to take any corrective action but you want that violation to be reported so these assertions create assertion faculty student ratio this is the name of the assertion just like the name of a constraint check select 4 star count star from faculty greater than equal to select count star from students notice that now I am comparing the results of two SQL queries each one of which is guaranteed to give me a single number this gives me 4 times the number of faculty and it will select dynamically from whatever the faculty this gives me the total number of students it will again select dynamically at any point in time ok what does dynamic mean actually it has to recheck this constraint every time there is an insertion or deletion in either student table or faculty table and each of such occasions it should check and it should satisfy that this number is greater than or equal to this number if it is not it should suspend that action temporarily and inform me that this violation has occurred of course the corrective action as he pointed out that some corrective action will have to be taken here in this case just the name of assertion will not permit us to identify what is the corrective action because of policy issue so in policy issue the back end program are on the helpless will temporarily suspend this assertion and will say go ahead and do it but will generate an email or something to the director saying please watch out there are more students than what faculty we have or please remember there is a provision for admitting more students so if you are starting a new branch in MTech or some new program you have this leaving something of that sort the more practical example is that of the inventory value if you want to check that the inventory value is less than 10 crores and if you have 3 different stores in which different items are stored with different values then dynamically you can check if some additional purchase order has been placed and if the inventory value is increasing beyond 10 crores you have to indicate to your CIO or CMD or MD or chairman saying that look the inventory is going more than 10 crores please take some action so this is the way you can prescribe assertion this is by the way an important thing table concerns are required to be true if and only if there is data in the table otherwise the chicken and egg story consider you are describing a schema definition at that time there is no data you have decided the schema you have designed the schema and you are executing SQL statement create table student this create table this etc at that time is there any data no only after tables are created you will have insert delete update which we shall see later but the SQL engine will immediately say this constraint now imagine you are starting to input data okay if there is no data the constraint should have no meaning there is no violation there is a simple precaution that SQL states in a standard way that assertions or constraints have no meaning otherwise