 Now, let us switch over to SQL. This is a summary of what all we are going to cover. We are not going to finish SQL today. We are going to do it over 3 days and it is over 3 chapters in the book. So, this is the first chapter on SQL which is an introduction, covers the basic operations. Then there is another chapter which we call intermediate SQL which covers some more operations. There is no particular meaning to how we have broken it up just from arbitrary point we broke it. And finally, there is something we call advanced SQL which is actually not you know stuff which you would write as part of a basic course, but there are two parts to it actually. There is one part which is accessing SQL from application programs and then there is a second part which is advanced topics in SQL which many databases support which have been designed for decision support and online analytical processing. We are not going to have time in the initial part of the course to cover that, but I will mention it later on. So, again a quick history SQL started off with being called structured English query language of SQL and then people drop the English part and call it SQL. This came from IBM was part of the system R project at IBM San Jose lab which was a tremendously successful project in the sense that it laid all the foundations for modern relational databases. Cod came up with the theory these people built an actual system based on that theory and both were very very important. Now, if you go out in the industry many people still say SQL because that is what it was called initially and instead of SQL they say SQL the two mean exactly the same thing today. There are many versions of SQL started off as small language and grew and grew thousands of pages of manuals it is impossible to read all of it and not even going to ask you to try. So, we are going to look at a subset which is most useful and then depending on your needs you can get into specific parts of SQL. In fact, the language itself has been broken up this way. There is a core language which is smaller and then there are extensions for specific areas. Now, commercial systems offer most if not all aspects of SQL 92 it is a now 21 year old standard. The core of that is something which most database is support and then there are many extensions which were defined later and whether a particular database supports it or not depends. So, one thing I should point out is that our queries are written using standard SQL but every database has some slight variations from standard SQL. So, if you take a query written here and execute it exactly as shown on your database it may or may not work. It turns out PostgreSQL is nice in the sense it is pretty much standard. All the queries shown here will pretty much work as is on PostgreSQL. Oracle is not. So, some of the queries which you show here require small syntactic changes to work on Oracle. Now, as part of the resources we have tips on how to make queries run on Oracle. So, if you are using Oracle you can use those. Now, let us start with the data definition part of SQL. So, the data definition language let us specify schema which what attributes a relation has, what are the domain or types of each attribute, what are the integrity constraints. And SQL can also control some of the physical aspects such as what indices to create, who should have access to a particular relation, how should the relation be stored on disk. But this is not part of standard SQL. So, we would not really spend much time on this part. Each database does its own thing. Now, let us start with the type system in SQL. Any real language needs to deal with types. In relation algebra we ignore type. In the real world we have to deal with it. So, SQL has several basic types. There are many, many more I have just shown a few. There is care and then there is care over here and then there is where care. Where care is simply a variable length string. Now, many people get into trouble by mixing care and where care. And how, what happens if you compare a care with a where care, what is the difference between care and where care. So, I want to spend just a minute. In some database like Postgres it really does not matter. In some others it does and for the following reason care is fixed length. So, when I have strings which are actually exactly the same length it is not a problem. But supposing I have care 10 and then I store a string in there which has only 4 characters. What happens to the other 6 positions? They are going to be filled with blanks. Now, where care is variable length, which means not only if I store a string of length 4 in a where care 10 type, 4 characters are stored and the system records that only 4 characters are present. The other characters are not even present. It is not that they are blank, they are absent. Now, what happens if I have 2 things 1 care 10, 1 where care 10 and a store let us say 4 characters, Rama and both of them. Are they equal? Are they not? Turns out it is database dependent on oracle they are not because Rama in where in care 10 is really Rama followed by 6 spaces. Whereas Rama in where care 10 is just 4 characters and oracle will tell you they are not equal. If you really want to check if they are equal you have to do other stuff like trim and so forth which is okay. So, we have then the comparison is a problem. So, do not mix these 2 types of my recommendation is stick to where care if you are using oracle on some other database it does not matter they automatically take care of removing blanks when comparing. Then we have in small int which you may find useful rarely numeric which is interesting because this is important for commercial purposes. This brings up an incident which happened in IIT where we had earlier on CPI is the credit performed the grade average calculated using calculators. Then we built there was a Fox pro program for it which used a numeric type. Then we switched to Java JDBC and in Java we added up the credits and took the average in Java. So, we did this calculation in Java and in Java we used a floating point number for this and everything seemed to work fine. Then of course you store it back it is rounded off the CPI is shown to 2 decimal places after rounding off. So, all seemed well but then we had some very smart people in our academic office who actually compared the output of this with the old system and with software did using a calculator and said your system is wrong it is giving the wrong results. So, how can it be we looked at the program it is a very very simple program everything seemed right and it is a how is it possible. Then they gave us an example we said how is this possible we run it through and indeed we got a different result in Java and using a calculator. It turns out that there was some rounding error which cascaded up to the second decimal point in some rare case which somehow the academic office found and we tried changing floating point to double no use. So, finally, we used a begin package in Java which is equivalent of numeric it was a lot of work to work around but it is important for commercial purposes that you know even a few pisar mistake will result in accounts not matching and therefore, numeric type is important in SQL. Now, the create table construct. So, the create table construct which we saw an example of earlier is shown again here this time we have we will see it in the light of care and where care we have made ID care 5 name where care 20 why did we choose this because we decided in this university all ID should be 5 characters that is actually not realistic 5 is too small and is it even always fixed length well it is not in IIT it was at 1 point 8 characters for a long time now it has become 9 do not ask me why. So, where care might be more appropriate here also for many enterprises then there is department name salary numeric. So, we are insisting that nobody can get more than 6 digits I will be not counting the decimal plate 8 comma 2 means 2 after the decimal and 6 before this kind of restriction we can come back to haunt you in IIT Bombay we had limit of some number of characters and somebody had a very large consultancy project and over flowed that amount and the program had to be rewritten and the database schema had to be changed to accommodate that. Then we can insert things into a table using these constructs insert into instructor values note that character types are enclosed in single quotes while a numeric type should not be put in single quotes it is a number. So, it should be without quotes. So, we inserted 2 different values here well if you see both of them have the same IIT value one has a null name and the other has a non-null name. So, this illustrates 2 things first of all if we insert both of these there would be a conflict on IIT. So, we want to have a constraint which says that IIT must be unique. So, we are going to declare it as a primary key the second illustrates another point which is that name has a null value here this is legal with this table as shown, but I do not think it would be a good idea to insert an instructor with no name may be it is called NEMO, but that is still in him null is not something you would want to allow. So, we can add constraints. So, here is a better version of that same table ID care file that is the same, but we are declaring it as a primary key here primary key ID name where care 20 years before, but now it is declared as not null and the last part of the department name we had no constraint on it earlier. Now, we know that it must be a foreign key. So, foreign key department name references the department table which is separately defined. Now, note that if you declare something as primary key it is automatically not null. So, ID cannot be null which is declared as primary key. So, few more definitions I will not go over all the details student with ID as primary key foreign key department name reference with department as before here is a table takes this reflects what courses the student takes. So, ID the ID of the student the student has taken a course, but actually not a course, but a course offering. So, course offering is identified by course ID section ID semester and year. So, now this identifies that this student has taken this and the student gets a particular grade when they took that course. So, grade is where care 2. So, we have maybe later grades here. Now, note that we need a primary key for this table should grade be part of the primary key it if you have this whole thing ID course ID second ID semester year grade surely it will be unique. So, it is a super key the question is does it make sense to call it a primary key and the answer is no, because you cannot have somebody get 2 different grades for the same course. So, grade is part of a super key, but dropping it will still let the remaining attributes be a super key. Therefore, you cannot include grade in a candidate key. So, the primary key has only the remaining attributes not grade. Then you have foreign key as before where ID references section and then these 4 sorry ID references student and these 4 course ID section ID semester year references section. Then there are one more course table and then there are couple of quiz questions here. So, again I will let you work through these quiz questions and I will stop and take questions from you and after answering the questions I will come back to the quiz question. There is a question about can foreign key be null and why that is a good question I did not answer that earlier. So, if you see here course has a foreign key department name referencing department. Now, you would assume that a course I better have an associated department. Therefore, it makes sense for the foreign key to be not null. However, there are some cases where you do want a foreign key declaration, but null may be an option. Now, in this particular schema I do not think we have a good example. Let us go back here. So, in these things it is yeah. So, maybe take student this could be a meaningful situation student has a foreign key department name referencing department. So, in most colleges in India student is admitted to a department. In most colleges in the US students are not admitted to a department when they join the university. They are admitted to the university, but they are expected to select a department after couple of years. In that case department name could be null indicating they have not yet chosen a department, but when they do choose it it had better be a meaningful department. Actually life is little more complex they have majors, double majors, minors. So, real life is more complex, but if we just had one department name it might make sense to allow it to be null. So, foreign key could be null in certain situations and SQL allows it. So, if it is not null it had better be present. The next question is is rename operation considered in basic relation algebra. And here you can consider relation algebra which has no column names at all. Everything can be identified by position column 1, column 2 and so forth. For a machine to interpret such a relation algebra expression is perfectly feasible, but for a human to read it is really messy because you do not know what column 1 means, but if you give it a name you know what it means. So, in the relation algebra which does not have column names the rename operator is irrelevant. There is nothing to rename there are no names in the first place, but in the version of relation algebra which includes column names the rename operation is important and is considered part of the relation algebra. So, it depends on what flavor of relation algebra you consider. Somebody had a question about semi-structured versus structured data. This is something we will address in the last day, but SQL is an example of data which is structured. Each tuple has a particular structure it has a fixed set of attributes, they have types there are constraints and so on. But not all data in the real world necessarily matches this kind of structure. If you take web pages there are also data in some sense. They may even have HTML structuring of some form a title body and so forth. But not all pages have the same structure and so they could be considered unstructured or semi-structured data if they have partial structure. XML would be considered semi-structured because it has some structure, but it is very flexible in this structure. So, we will come back to this question later. Relational data is structured. The question about what is max length for back here that is database dependent. So, you have to look up the database manual to know what is the largest length. But I should note here that if you want to store really large strings, no database is going to allow you to do where care. But all databases in the SQL standard itself includes a type called C log or character large object C log. And if you want to store very large strings that is what you should use. There are some questions about default question of real double again I am not going to get into that. Look up the database manual. How to declare date format in SQL? I am going to answer that question later on. The next question says, does natural join match all the common attributes or only the key attributes? So, natural join is based on the attribute theme. It matches all common attributes. If you wanted to match only some attributes, relational algebra does not have notation for it. But you can always write it as a Cartesian product followed by selection or relational algebra has some notation which I have not shown. But let me show it. So, I had said R join S is natural join. But there is also what is called theta join where I can say R join on some condition R dot A equals S dot A of S. So, maybe R had attributes A B C, S had attributes A B D. The natural join of R and S would require both A and B to be the same. Whereas the second one is only requiring the A value to be the same. So, you can use this operation in relational algebra or you could write it in the usual way as select R dot A equals S dot A on R cross product S. So, these two are actually equivalent. The next question is where can we use wear care, wear care 2 with specific examples? I am not going to cover all aspects of data types. But wear care 2 or there are variants of it called N wear care for national wear care and so forth are things which were introduced to support character sets which are not 8 bit or 7 bit. So, the original SQL standard was focused on ASCII where every character is actually 7 bits or it can be stored in 8 bits. But when a database is needed to support other languages like Hindi or Chinese or whatever, people have moved to the Unicode standard which can be represented using UTF 8 which is 8 bit version of Unicode or there is a basic Unicode which may require multiple bytes to represent a single character. So, the SQL standard has something called N wear care or N care or national wear care N wear care which indicates that it should support a Unicode which is also called wear care 2 in Oracle I believe. The next question is what is the practical use of the division operation? Now, the division operation is something I did not cover here. What I will request is that we put off this question till a little later in the SQL chapter where I have a slide which shows how to do the equivalent of division in SQL. So, I am going to answer this question about division along with the SQL way of doing division. So, I will put that question off, but it is a good question what is the practical use of division? It is rare that you need division, but there are some very good examples, but since it was asked let me mention for those of you who are not familiar with division do not worry I will explain it, but if you are familiar with division a common use is to find say people who satisfy all of several different conditions and that condition could be say people who have taken all the core courses in the computer science department. So, that is the check which we need to make. So, there is a set of core courses if somebody has not taken one of the core courses they cannot graduate. So, how do I check this? So, there is a relational algebra operation called division which can check if a particular person has taken all of given set of courses and if they have not well they should be flagged. So, it is useful, but it is not very common method. There are some questions about JDBC Java I am not going to take that right now. So, we will do that later on we have a JDBC lab. The next question about what is the result of comparison of null equal to null I am going to cover this in more detail coming up shortly. And the last question which I have on this list says after the table is created how can we say succeed, but I mean I think you mean how can you change the length. So, if you have a tuple with wire care 10 and you have a name with 8. Now, supposing you have to correct the name to be 9 or 10 it is ok, but if you want to store a name of length 12 you cannot store it in wire care 12. So, then you can go change the schema. So, SQL DDL allows has syntax for modifying the type of an attribute that is coming up ok. I will stop there and go back to the slides. We had a quiz question there. The first question was the primary key constraint ensures the primary care values are repeated across tuples are not null. Of course, it does not ensure they are repeated ensure they are not repeated, but it does ensure they are not null. So, the answer is false and true over here that is a simple question. The second question is the foreign key constraint on course is stable here ensures all departments have associated courses no it does not. It just ensures that a course has an associated department or it may be null. Now, does it ensure department name is not null no it does not. As I said in SQL a foreign key value can be null unless we explicitly say not null. Here we did not say not null therefore, department can be null and department name occurs in the department relation well this should probably be modified if it is not null that is actually what it ensures. People had questions about changing the schema. So, you can drop a table you can delete from a table which deletes all the tuples, but keeps the table here delete from and you can also alter the table. So, add an attribute with a specified type or you can modify I am not shown the syntax for it, but you can also alter table alter attribute and then give a new type for it. So, the syntax for it is slightly database dependent SQL has a standard, but it is not very widely respected. So, read it up from your manual to see the exact syntax for the alter table construct. The alter table construct in SQL also lets you drop an attribute totally, but it turns out many database systems would not let you drop attributes. They let you drop a relation, but not an attribute do not ask me why they have done it that way it is for their implementation convenience there is no logical reason other than convenience. So, if you need to drop an attribute you have to first copy the table contains to another table without this attribute you want to drop then drop the table move the other table back to this and so forth. So, you have to work around this issue. Now we come to the basic query structure in SQL. In SQL every query has the form select list of attributes or it can even have expressions here from a list of relations where some predicate is true this is the basic form and the result of this is itself a relation. So, let us look at this overall structure in pieces by starting with the select loss. So, here is an example query focus your attention here the query says select name from instructor. So, we have an instructor table we are select in this case that is equivalent of relation algebra project of the name from the instructor table. Now, you should note that if two people have the same name this particular thing is going to show the same name multiple times it is not going to remove duplicates. So, this corresponds to the what we call the multi-site relation algebra I will come to that later on. Again syntactic details here SQL names are case insensitive. So, you can use name like this or that or that they are all exactly the same thing, but be careful that you cannot do this for actual strings which you store in the database. So, one of the problems which we find in the lab is you know in a sample data we have a name like physics with a capital P somebody types physics with a small p and it does not match because character equality is case sensitive, but in the SQL language names are not case sensitive that is all. Again formatting conventions we use bold font in our book and in the slides for keywords select from where, but many people instead of using bold font may be bold font is not available they use capital letters for select from and small letters for the relation names. So, as I said by default SQL does not remove duplicates if you wanted to remove duplicates you have to say select distinct department well the previous one was select. So, back here select name from instructor in our sample database no two instructors have the same name, but if you said select department name from instructor which all departments have instructors, department names would be repeated many times. So, we say select distinct department name from instructor now the keyword all specify the duplicates should not be removed and this is a default if you do not say anything select department name is the same as select all department name again some syntactic features select star from instructor says select all attributes star means all attributes again many databases support other variants like in say select R dot star which say select all attributes from the table R and this is the syntactic convenience relation algebra does not have this, but it is easy enough to look at the schema and replace the star by a list of actual attributes from all the tables here or from R dot star meaning all attributes of relation R. You can also have expressions for example in this case select ID name salary divided by 12 from instructor, salary is numeric divide by 12 is a number. Next quiz question says which of these clauses is optional in an SQL query select from where none of these from what you have seen so far which is optional the query just above shows that there is no where clause so that is clearly optional now is select optional no we any query has to return something the select says what it what are the attributes or what are the values that it returns so select is not optional from is also not optional in SQL it should be present. So, where is the answer so now the where clause what does the where clause do it specifies condition the results must satisfy this corresponds to the relation algebra select operation. So, here is a simple query find all instructors in the comm side department with salary greater than 80,000. So, select name from instructor where department name equal to comm side not be capitalization and the space and the dots all have to match exactly with the sample database otherwise this query will give you no output. So, in the labs when you have queries like this be careful about the sample database match exactly as is shown there with space dots cap everything should match where department name is from say and salary greater than 80,000. So, that is a simple query which selects instructors matching this condition. So, you have seen an and here you can have and or not and you can have arithmetic operations you can have string operation we will come to that in a little bit now let us come back to the from clause. So, far we had only a single relation in the from clause now let us look at a query which has two relations in the from clause. So, when I list over here select star from instructor comma teaches that is the same as saying find the Cartesian product of instructor comma teaches and then apply the where clause condition here there is no where clause condition. So, this final result is simply the Cartesian product of instructor and teaches we saw the Cartesian product in the context of relation algebra that is the result here as in the case of relation algebra the Cartesian product is not very useful by itself very rarely is it useful, but it is very useful once you have a where clause condition. So, if you have a SQL query without a where clause and multiple relations in the from clause it is probably an error most cases is an error. So, this is the Cartesian product of instructor teachers we saw this before it can be very big if this had 500 instructors and teachers had let us say 10000 tuples we are looking at 5 million rows in this table which is way too big, but what you really want is joint where there is a condition. So, here instead of taking a cross product of instructor teachers what we have done is we ensure that instructor dot id equal to teachers dot id. So, what is this query doing for all instructors who have taught some course find their names and the course id of the courses they have taught. Now, you will note that the English form of this query is sounds a bit weird we say for all instructors who have taught some course this is to make it unambiguous supposing we say for all instructors find their names and the course id of the courses they taught. So, what about instructors who have not taught a course should they be present or should they not be present it is unclear. So, how do you write an SQL query corresponding to an English specification which is unclear you cannot you will have multiple possible answers. In fact, this is a big problem when we set exam questions or even lab questions with SQL it is very easy to write something in English which is ambiguous which has multiple meanings in and different people may interpret it differently and write different SQL queries. So, you have to be careful with this but mistakes do happen even in my labs I do have many times when I am left something unclear and people use different interpretation. So, in the end we have to allow any of several different interpretations and give marks for each of those, but as far as possible we want to make it unambiguous. So, here we made it unambiguous by saying all instructors have taught some course. So, that is the query if you want instructors who have not taught any course also to appear well what does it mean to show their course ID what should it be how can you specify this. And so you can do this using an operation called outer join you can also do it using Cartesian product and so forth. So, we will come back to this in chapter 4 which we will cover tomorrow not today. Now, here is a more complex join this time between section and course. So, I have shown a piece of the schema diagram section has course ID and various other things course has course ID and various other attributes. What I want is find the course ID semester year and title of each course offered by the comp side department. To answer this question you have to look at this schema first course course is a relation I want courses offered by the comp side department how do I know which courses are offered by comp side. Luckily department name is an attribute. So, I can apply a selection condition right on the course, but now I need to find the semester year and so forth title is here, but what about semester year and so forth that is not in the course relation that is in the section relation which gives the offerings of that course. Therefore, I need to use the section and the course relations, but how do I link them typically this would be on a foreign key. So, this condition here ensures that section dot course ID equal to course dot course ID. So, the query says select something from section comma course where section course ID equal to course course ID and department name is comp side, because I only wanted it for comp side and finally, it says find the course ID semester year title those are listed here. Now, why did we say section dot course ID? So, course ID is there in section and in course if I just say course ID which one turns out that they have to have the same value, because you made sure that these are equal here, but that is something which the basic SQL interpreters may not be clever enough to figure out. So, they tell you you better give a name here. So, that means no unambiguously which one it is that is why we say section dot course ID. So, this is another good point to take a break answer some questions and also write these queries. So, what I will do is give you a few minutes to write let us say the first one or two of these queries and meanwhile I will see if there are earlier questions and answer those and then we will come back and write these queries and that might be a good point to wrap up today. There are lots and lots of questions I cannot possibly answer all of them, but let me pick a few. One question is how to perform image based search in SQL? This is completely out of the context here, because images are first of all what is the type of images? SQL allows you to store something called binary large object, but image search is a lot more complex you do not just match on exactly the same image one similar images that is totally out of the scope of what we cover in this course. Image databases are an important topic, but it is not. Another question is what is no SQL? Are there applications using no SQL? This is a good point to ask this question, because we are covering SQL, whereas there is a lot of buzz out there about no SQL systems. So, this is a good point to talk about no SQL systems. So, I told you a little bit earlier that why databases and not file systems there are many reasons and in the world of big data people wanted parallel systems to do various tasks. So, they actually went and built brand new systems which use file systems directly without a database and that has some benefits, but it has many drawbacks all the drawbacks which we saw earlier. So, then they realized that they actually need databases and now there is a new generation of systems which are massively parallel storage systems which provide many database features and they started with only a few features and they have been extending it to provide more and more database features that was one angle. The second angle is that these systems also were proud of the fact that they did not support SQL, that you had to write queries in their own API. They were very proud of it, but it turns out it was not necessarily something to be proud of for several reasons. One is if not SQL then what? You are going to define your own API. Well, there are at last count about 50 big data systems out there each with its own API. Which one do you choose? What if the one which you chose? So, if the one you chose is no longer supported, what do you do? You are stuck. So, people want a standard. They want a standard language and that is what SQL is. It is a standard language. Then the next thing is the no SQL systems were proud of the fact that you could integrate it directly into your programming language without the overhead of SQL. What they did not realize is that this means that you are now responsible for optimization. It is your headache. So, for all these reasons, no SQL is facing a new death. So, they started off by being proud of being not SQL, no SQL. Then they are now kind of retreating with their tail between their legs much like a dog which is scared and feeling ashamed and going away. So, now to hide the shame, they called a new SQL instead of no SQL. But all this is just terms. The underlying principles still hold. You do want all the nice things which SQL databases have been giving. Should it be the SQL language or some other language? That is a separate point. But the features which they provide, the declarative language, the automatic enforcement of constraints, language which makes it easy to write complex queries. All of these are very important and the no SQL databases are now eventually evolving back into full-fledged SQL databases. So, in the last year, in the last five years, I would say, they started with no SQL with files. Then they went to simple storage systems. Then they went to schemas which are not full-fledged database systems. But there are relations with schemas. Then now we have gone all the way back to SQL and Google's latest offering in this space called Spanner. It is an internal tool. But it is reflective of the history of many of these systems. Google pioneered many of these things. They did not open source any of these things. But they described them in enough detail that people went and built equivalent systems. So, the equivalent systems are following the same trend. First, no SQL systems were built. Now, eventually, we will end up with massively parallel SQL systems for big data. That is the current trend. Good question here says, in the university schema, how can time slot ID of the table time slot be referenced from section as it is part of a key? The time slot is not unique in itself. So, to answer this question, view this bottom of this slide. It is actually visible here. If you see, time slot has key, which is time slot ID, day start time. Whereas here in section, time slot ID is there, but there is no day start time. So, I have shown this as a foreign key. There is a missing arrow head. It should have been there. But this is not a foreign key. You can actually enforce in SQL return source. So, we could have done this design differently. We could have had a time slot table, which had time slot ID as primary key. And then, this thing would reference the time slot table, which no longer has day start time and time. And then, we could have had one more table, which included day start time and end time, where which is equivalent to this time slot table. So, we could have had one more intermediate table, which would have time slot ID as primary key. And in fact, it would have no other attribute. That is the only attribute it would have. Then you could actually have a foreign key from section to that intermediate table. I am going to come back to this when we talk of the ER model. Why we chose this design? It was, we could have chosen a different design, but I am going to come back to it later. So, I would not answer this question in any more detail now. But the bottom line is yes, you cannot declare time slot ID as a foreign key referencing the time slot relation, because time slot ID is not unique over here. SQL does not allow you to have a foreign key if the value is not unique. Now, should SQL allow it? I mean, I think it should. There are many cases where foreign key should be allowed even though it is not unique, but the SQL standard currently does not allow it. I do not know if any database actually supports that, but it would be nice to have it. Next question says, what about other query languages like HQL, Link and so on? Now, that is a good question. In this course, I am focusing on SQL as if no other language exists. In fact, if you see the history, SQL was the first language proposed, but very soon many people proposed many other languages. There was an explosion of languages and then most of those languages died out. For example, Postgres SQL, which we are using started off its life as Ingress, which had a language called Quel, which was actually used commercially quite a lot. In fact, there are a few installations still left behind. Ingress moved to Postgres still with Quel, but eventually Quel died because not enough people were using it. Some people took the Postgres code and made it Postgres SQL. That is why the SQL is the tail of Postgres. So, Ingress to Postgres to Postgres SQL, where Postgres SQL dumped Quel and implemented SQL instead. The bottom line is there were many languages. They all died out and became SQL. That does not mean there are no new languages. So, there are new languages. HQL is one such and then there is Link. So, there are many others. As of today, none of them has gained major traction. May be one of them will, but if history is any indicator, they are going to die. Although, there are many dialects of SQL. So, what happens is people implement pieces of SQL. So, for example, there is this big data system called Hive, which implement some subset of SQL. Let us call it Hive SQL or Hive SQL. So, everybody now basically ends up creating a language, which is as much like SQL as possible, simply for compatibility with may be programmer mind sets is nothing else. So, that is not to say other languages cannot survive, but as of today, the big daddy in town is SQL and it has thumped out all other opposition. So, as a practical matter, we focus on SQL. In the book, we do have other languages. There is Datalog, there is QBE, which are all perfectly nice languages in their own way, but there are no major implementations available today. So, we drop them for convenience. There are some more questions in depth, which we are going to answer later. So, if I do not answer your question now, assume that I will answer it later, if you are asking, specifics about SQL. Somebody has asked the question about the dual table in Oracle. There is really nothing deep about the dual table. It is simply a convenience, rotational convenience. It is a table with, I think, only one row and one column, which lets you do things like select 4 plus 5 from dual. You need a table name, and so the table dual in Oracle, simply a convenience, which lets you write your own expressions. Why would you say select 4 plus 5? Of course, we know it is equal to 9, but you could say select current time from dual, where current time is a function which displays the current time. So, that is useful to output the current time, without worrying about which table to select. I just want the current time. I do not really care about which table from which I should select it. It is not even associated with the table. Current time is a function. So, the dual table is simply a table of convenience, which lets you do this. If you are used to Oracle, dual table, and you want to do it in PostgreSQL, which we are using in this course, just create a table called dual with one attribute, let us say, call it whatever you want. Insert one row in that table with whatever value you want, and then you can say select something from dual, and say current time from dual, and get the equivalent of the Oracle dual. The question about dropping foreign key constraints, the answer is yes. It is possible to drop a foreign key constraint, without dropping the column. This is very useful sometimes. You want to move tables around, temporarily violate the constraint. You can drop the foreign key constraint. Do whatever you want, which violates the constraint. Restore the data to a form which satisfies the constraints, and then enforce the constraints again. Of course, while you are doing all this, you must make sure that nobody is updating the database directly. The last question is interesting. Aadhar is using what data model and what database. I wish I could answer that question, but I cannot, because I do not know what Aadhar is doing internally. But if you want this question to be answered, I could find out the answer from somebody who I kind of know who works on the Aadhar, who is heading the, I think the chief architect of the Aadhar project. So, it could be an interesting thing to find out what is going on there. So, for those of you who do not know about Aadhar, which is nobody, I am sure all of you know what Aadhar is. But the technical challenges in Aadhar are storing fingerprints. Then they have, I think, iris scan, and then they have height and some other, so these are the basic biometrics. In addition, they have name, address, and so on. So, storing all of this in a database, storing is not a big challenge. Fingerprint is simply another binary type as far as the database is concerned, iris scan is another binary type. The challenge is matching records over here. How do we know if a particular person who just registered is cheating because that person has already registered and is creating two Aadhar numbers? How do we detect this? We want detection of duplicates. That is a very hard problem. This is the second use of Aadhar, which is, here this person who registered and now somebody is coming to the bank account and claiming to be that person. Can we verify that it is the same person? This is the use of Aadhar, which will happen at every bank and so forth. It will happen in the future, if not already. That is for verification. Are you who you claim to be? Aadhar will be very useful because it is already stored the fingerprint and iris scan. Now, if somebody shows up, you scan the fingerprint and ask Aadhar, please verify if this fingerprint matches. What are the fingerprint match algorithms? That is actually not a concern in this context. Anything which works properly is good enough. We do not care. The hard problem is, can you detect if this person already applied for Aadhar card? That is very hard because you are doing approximate matching of fingerprints on a database with one billion records with one billion plus Indians. That is very hard. Frankly, I do not know how they do it. I do not know how good a job they can even do of this. It is very hard. It is a very massive problem. So, it will be interesting to find out how they do it. But it is outside the scope of what we cover in this course. We are focusing on the simpler problems where match means value equal to value. In the context of other match means value is approximately equal to this other value for some weird notion of approximately equal to that is very hard. So, we will not get into that. That is a good point to stop because we are already into lunchtime. I did want to spend a few minutes on writing these queries. So, let us maybe just do that and I will wrap up in 5 minutes. So, let us take the first query and not just give me a minute to find the value of instructor with ID 10101. This is actually an easy one. I want to even bother writing it. We have instructor which is kind of lost here. It is not shown in this particular diagram. But let us say this is student and total credits instead of instructor. Then it is simply a query on this one table. So, we can write it out. Let me select name from instructor where ID equals 10101. That is a very, very simple query. Now, let us come back to the second query. Find the titles of all courses in the COMSI department. So, we have a department here. We have a department name. Then we have the course table here. So, do we need to do a join of course and department? That is what in this case we do not because the department name is already present in the course table. So, all the information we want is in the course table. We do not have to look at the department table in this case. So, what we can do is select title from course where department name equal to COMSI. So, that is a very, very simple SQL query. Now, let us look at the third one. Find course ID year and semester of all courses taken by any student named Shankar. So, what are we doing here? First of all we need the name of a student. Where is that available in the student table? We also need to know what courses this person has taken which is not in the student table which is in the takes table. So, now, we need to take a join of takes with student and from takes we get ID year semester. So, all the information we need is there in takes and student put together. So, we just have to write a simple join. Select ID what is it year and semester I think year from takes comma student where first of all we have to have a join condition between takes and student. So, what is the join condition? Takes dot ID equal to student dot ID. Remember that was the foreign key referencing there. So, we say takes dot ID equal to student dot ID that is the join condition. Moreover, we wanted this for the student named Shankar. So, and name equal to Shankar. So, that is our query which combines things from these two relations takes and student and has a join condition and a selection condition with the name. What if there are two students named Shankar? What will happen? Well, you are going to see the union of courses taken by both of them. You may get duplicates even otherwise if Shankar repeated a course twice you will see that course appearing twice. Maybe it is not going to happen in the same year because nobody can take the same course twice in the same year semester. That is a constraint which we have not actually enforced here, but in the takes table if you see here do we prevent a student from taking the course twice in the same semester. In fact, we do not in this particular case because section ID is part of the key, but supposing we wanted to say that a student cannot take the same course twice in a given semester. So, then ID course ID semester year would be a key and section ID would not be part of that key. We could have chosen to do it that way that would enforce the constraint that a student cannot take the same course two different sections of the same course in the same year and semester. And the last query here as above, but additionally show the title of the course two. Now, where is the title of the course? Is it in takes? No. Is there in course? Now, how do we join takes and course? If you see takes is linked to section, section to course. So, you might think that we have to join section and course to get the data that we need, but in this case we can short circuit it because course ID is actually also reference through this to course directly. So, we can directly join course takes with course. So, what we will do is go back to the previous query and modify it to add course. And here we want title and we need an extra join condition here. So, we will say and takes dot course ID equal to course dot course ID that wraps up that particular query and it also wraps up our pre lunch session.