 I will wrap up the relational algebra section and move on to the SQL language. So, chapter 3 of the book has coverage of SQL. Now, there is a fair amount of content here and I do not plan to finish all of it today. I am going to cover part of it today and then continue on tomorrow. We have basically 3 days of lectures for SQL. So, we have plenty of time ahead of us. So, at points I will stop and allow you to ask questions through the A-view software. So, this chapter has several topics starting from an overview of SQL moving on to very brief introduction to data definition. We are going to come back to the full-fledged data definition language later on, but we are going to do just enough to get started now. And with that we will move quickly on to the basic query structure, simple queries, set operations, some of the complications due to null values. And we probably would not start on it in detail yet today, but aggregate functions and nested subqueries are the next part followed by modification of the database. So, a little bit of history of SQL when relational databases were first proposed by COD back in the late 60s and they became very popular in the early 70s. In fact, COD won the extremely prestigious ACM Turing Award for his work. That Turing Award if you are not familiar is the equivalent of in computer science of the Nobel prizes in other fields. So, COD won the award for the relational model. Now, he did not win it for just saying that, hey, store data in tables. He did a lot more. He introduced the notion of the relational algebra and calculus. So, there are different ways of querying the same data which he showed are all roughly equally expressive and he even worked on database design and normalization. So, he did a lot of the foundational work of the relational model, but with all that he did one more thing. He realized that people are not going to be programming using this Rayleigh Schaljebrand calculus, but you need a higher level language and SQL came out of that. So, it was inspired by COD who in fact, one of the selling points he had for the relational model was that managers could write the queries themselves. They did not have to depend on programmers. So, the original idea was SQL was supposed to be S-E-Q-U-E-L for structured English query language. So, you write kind of English's queries, but has to be a little more structured and the idea was managers could write their own queries. As it turned out current generation SQL even back then SQL really did not look like English and at some point people abandoned the full form of structured English query language and just said it is structured query language or SQL. In the industry many people still use the old term SQL. So, you will find people saying SQL or SQL they really mean the same thing. Today the original expansion is irrelevant, but people still say SQL. So, initially it was prototyped and later there were commercial systems which released it. Oracle and IBM strangely now Oracle was one of the first to release a commercial system. IBM had already done its laboratory prototype, but they waited a little more to make it more robust before releasing it. And there were various versions of the standard starting from 86. The last one mentioned on this slide is 2003, but there have been updates to parts of the standards later on 2006 and 2008. Now, the SQL language standards are enormous. They are thousands I think 5, 6,000 pages worth of standards and these standards do not even have examples. They are all very terse documentation which is very, very hard to read and understand what the hell they are saying. So, the people who wrote the standards also wrote books to explain intuitively what is going on. But even those cover a lot of features. SQL is a very, very big language today. And in fact it is so big that most implementations do not implement all of SQL. Implementation typically implement only some subset of SQL. In fact there is a trade off here. So, there have been times when the SQL committee decided it is too early to standardize because not everyone wants to implement it. And then people went and implemented the same feature in 5 different ways and then it was too late. So, then they said let us standardize ahead of anyone implementing it. So, that everyone will do the features in the same way. So, what happened is the features ran away a way ahead of actual implementations. So, you will see when you try to run SQL queries on any specific database system you will see problems on both these sides. You will see certain features which are there in the SQL language are not even implemented in the database. You will also see there are certain features which they probably implemented before it was standardized. So, they have their own syntax for it, their own quirks. Unfortunately this is a fact of life which we kind of have to deal with. There have been a few companies which have tried to build translators from the standard SQL to specific SQL dialects supported by different databases, but they are not commonly available. So, when you do the lab exercises you will find a little bit of difference between the SQL as presented here in these slides and what actually runs on your database system. Now, in this course lab we have standardized on PostgreSQL and as part of the lab in the afternoon you will be setting up PostgreSQL that is possible or at least going over the steps and seeing a demo in case you do not have access on individual machines. The goal is that you should be able to replicate this in your college when you go back you should be able to set it up for your students. Even if your students do not set it up themselves you need to set it up for them. So, by following SQL as a standard we are going to also mention what work in SQL in PostgreSQL and what does not. Luckily the differences are not too many, but many places I know use Oracle and in fact Oracle is written into the syllabus in many places. So, we will also mention SQL variants as they exist in Oracle. So, that if you use Oracle you will know what features to avoid or what to rewrite so that it will work in Oracle. Now, there are other dialects SQL server DB2, MySQL there are many other implementations each of which has its own variants. So, you sometimes you just have to do trial and error if particular syntax does not work you have to figure out what does on that database system. In the book we have tried to mention in a few places, but it is not exhausted. With that as an introduction to SQL let us look at the main features. So, any data definition language has to define types and as you know in any programming language if you want to declare a variable you have to say what is the type of the variable. Of course, there are untyped languages, but there are trade offs if you do not type you may end up storing wrong values where you wanted to store an integer you might end up storing a string. A string for a salary makes no sense and therefore we want it to be a number. So, types are very important SQL has its own set of types which vary a little bit from the type systems of typical languages. Since in business applications we often have to store strings and we often want to put limits on the string sizes for various reasons. In early days it was to make sure the data did not become too big. These days it is meant to ensure that when you print out a report it stays within the boundaries of page and does not overflow widely. Therefore, SQL has a notion of a character type and an associated length. So, there are two basic types care n and where care n, but I should mention it is not on this slide, but these are for ASCII. So, what if you want to store in Indian language? You want to store in the or Tamil or Canada or whatever in your database as a string what you do? You cannot store in ASCII. ASCII does not have a representation for our characters. Therefore, you would have to use Unicode that is the standard. You could get away with ISCII which is basically reinterpreting the ASCII characters to mean something else, but the standard today is Unicode. Now, how do you store Unicode in a database? So, this SQL standard is to use n care or n where care n standing for national. So, n where care and n care are able to store Unicode and they store 2 bytes per character by default although they can be compressed. So, that allows the system to store characters from all scripts which are existing in the world almost today. So, if you want to do local language work use that, but again databases implemented differently for example, in Oracle you would have to use where care 2 to store Unicode n where care or n care do not exist care 2 and where care 2 what you would use. Where would you choose between care n and where care n? This is a bit tricky it turns out that where care n cuts off the string up to the last character you used whereas, care n pads the rest of the string with blanks. This can cause trouble when you compare a where care with a care. So, you should be careful with this typing. As the rule of the thumb which probably perfectly to use where care everywhere except where you know the input has to be exactly the specified size. So, in our case the IDs we are assuming in our domain will be exactly 5 characters 5 is 2 small really IIT Bombay uses 8 characters. The Indian UID will use 12 characters believe 11 characters to uniquely identify plus a check digits. So, it is 12 always 12 12 numbers probably not even characters 12 numbers 12 digits. So, you could specify even a numeric with a fixed precision. So, the numeric type if you say numeric 12 comma 0 that is 12 digits with no decimal places after it. We could also use where care 12 which should allow characters other than numbers. Then there is a usual int real double precision float and so on and some of these take length specifier. So, you can store it to whatever accuracy you need for your application. The language allows you to specify the accuracy. Most other languages do not let you do this java C plus plus do not. If you say int, int means whatever the compiler chooses it int to mean it could be 32 bits it could be 64 bits depending on the architecture. Since, SQL is data which is stored which is shared between many different computers you need a representation which is very specific. So, that is what SQL allows. There are a few more types which we will see later, these are the important ones. So, now we have seen the usual the standard types we can create tables. So, how do you create a table? You just list the attributes along with their domain specification domain is a type basically plus you can have a little bit of extra information as we will see. And after you list the attributes you can specify any other constraints which are primary key foreign key constraints. So, we will see how to do a few of those. So, here is an example down here which creates a table called instructor with an ID which is care 5, a name which is wire care 20. And now the designer decided that we will not allow anybody without a name. Those of you who are familiar with this musician called prince he decided one day to call himself prince that is as I know not his real name. And he decided another day that he will no longer be called prince his name will be a symbol which does not have a pronunciation which is all a bit crazy. So, if you leave out crazy people everybody else should have a name. So, the schema designer said that name is not null. So, that is an extra constraint because SQL by default allows nulls in every single type. So, you can have null for integer you can have null for a string you can have null for a floating point number anything can be null. This is a little different from regular languages. So, now any student who is new to SQL may not realize this and when they do the design they forget that null is a possible value. Now, how does this have an impact? So, if you try to interface language like Java to a database the problem is that you may get a null value back. So, now how do you deal with a null value given the fact that an integer cannot be a null value in Java strings are a little bit easier. They are usually references and null is a valid value for references even in a language like C plus plus or Java. But if you have ints or floating points and so on how do you deal with null values. So, that is something which is handled using API functions I would not get into it here, but it is something to one students about. So, now moving on department name is where care 20 and salary is declared as numeric 8 2. So, the salary can have PISA value behind it with up to 6 digits of actual value. Now, 6 is probably too small salary for high paid people these days in companies, but it is good enough for government employees at least one of probably not for long with inflation being what it is. So, a real database would probably allocate more characters instead of 8 2 it may be 10 2 or so, because it does not tell us what currency it is SQL does not have any notion of currencies. So, the application designer basically chooses what currency it should be. So, now we have created a table we can insert records into that table how do you do that there are 2 statements here insert into instructor values 1 0 2 1 1 that is the ID note that this value is enclosed in single quotes. If you do not put it in single quotes it will be treated as a number whereas, the type of ID is where care we have decided to allow the characters in that it is not a number. So, you need the quotes similarly name is a where care. So, it also has to be in single quotes the name is Smith department is also where care it is biology in quotes and finally, salary is an integer. So, we have decided this person will be paid 66000 rupees. So, that is the value down there similarly we have one more with which tries to insert the same ID, but with a null name what will happen in this case because, we declared the name to be not null the SQL engine will say hey you are trying to insert something which is null where I expected something which is not null and that particular insert will be rejected it will be a constraint violation. In fact, supposing I inserted a new tuple instead of null I gave something it could be a Smith again or it could be John or whatever does not matter what it is. In this case it will allow the insertion and you are going to land up with two instructors with the same ID 1 0 2 1 1 and different names that is not good you do not want that to happen. So, what do you do we have to tell SQL that the ID attribute should be unique and we want to say that it is a primary key in this case. So, we can do that in SQL. So, primary key is one of several types of constraints which we can express using SQL the first one is not null which we saw in the previous slide primary key we are going to cover now and then we are also going to look at foreign key which is also there on this slide. So, if you go down to the example here. So, now what we have done is we have listed a constraint further down which says primary key ID. So, there can only be one row with a particular ID value and if you try to insert a second row the database system will reject it saying the primary key constraint is violated the line after that is the next integrity constraint and what is this constraint is the foreign key constraint. So, remember that instructors department names should occur in the department table otherwise it is an error if you do not do this when people do data entry they may type in arbitrary names they may want to type computer signs and you know they may make typing mistake and write a little bit wrong or they may spell it grossly wrong and your data will be a mess even if people thought they were doing it right they may make errors and worse still if people do not care they may put anything they want they may be too lazy to type anything and they may say department x y z dash anything and that happens I have seen this happen in many applications if you do not insist on data being correct users will put junk into it. So, in this case what we have said is the department name in instructor should be present in the department table. So, the constraint specifically says foreign key department name references department. Now, we are telling which table it references, but which attribute of that table is reference. So, what we are saying is a particular attribute in this table that value must appear in a particular attribute of another table, but note that a declaration does not give an attribute name in the department table that is because by default the reference attribute is the primary key of that table. So, if we declared the department table with primary key as department name then by default the department name here in this foreign key constraint will refer to the department name attribute of the department table. Note that it is the correspondence is not by name that primary key of department table could have been called anything it could have been called d i d for example, but even though here it is called department name there it is called d i d that does not matter. Once we have declared it as a foreign key the value of department here will have to be present in the d i d attribute there. So, that is how you declare a foreign key. Now, if you try to insert an instructor whose department name does not occur in the department table again the SQL system will reject it saying foreign key constraint violated. So, if you want an instructor in a new department what do you do? First you create the department then you insert the instructor in that order you should also note that the primary key declaration automatically ensures the value is not null because if you had a tuple identified by the value null that is not unique you may have two things with null you know that is not a good situation. So, the primary constraint also forces the value to be not null. So, in this case you cannot create a tuple with id value being null it will be rejected. By the way to make a small detour many of you may already know the stuff I am teaching here and you may have taught it multiple times. So, you may have you know questions or feedback on what are aspects of this that one should stress to students. I am trying to do that here trying to stress issues which could crop up when you teach it questions which people may raise. So, I hope I am able to answer some of the doubts which you may have had while you are teaching or some of the lessons which you learnt when you taught your students and later you realized they did not understand something and maybe you should have taught it a little bit differently. But I am sure there are things which you have realized which I am not covering here. So, part of this course is to teach the subject, but given that many of you probably already know this subject the second part of this course is to learn how to teach I am not I am not saying teach how to teach that is part of it, but I also want to learn from you through your feedback on specific issues or confusion which students had which you realized later after teaching it once and therefore, you want to tell people watch out for this. Now, how do you give this feedback? There is of course, the question time which we will have in the little while, but also through other means we will provide you an email address for you to send suggestions comments every day. So, coming back we have seen a few tables so far here are a few more tables. We have the student table and the takes table. So, let us start with the student table we have declared I D to be again a care 5 here we made it where care 5, but for consistency it probably should have been care 5 this is a typo in the slide and which is also declared as primary key. We have name as where care 20 not null that is similar to instructor. We have department name total credits note that total credits is numeric 3 0. So, up to 999 credits and we have declared that department name is a foreign key referencing department. So, very similar to instructor. So, that is easy. Now, the next table is the takes table which says which student has taken which section. Now, as a reminder a section is identified by what all which course it is for which year and which semester it is running in because the same course is repeated every year or may be in multiple semesters in a year and last of all a section I D because there may be two or more sections of a course within the same year and semester. So, there are four parts. So, we have listed out all these here. So, the first one is I D which is the I D of the student again for consistency it should be care 5 if the I D of student is care 5. Then, we have at the identifier of section which is course I D second I D semester in year each of which has a type. Note that we called year to be numeric 4 0 those of you who are old enough will remember the hoopla about the y 2 k problem back in the late 90s when people predicted all computer software is going to crash. The origin of that was year was declared to be two characters just the last two digits by programmers in the 70s and 80s who thought 2000 would never come or they would be retired by the time it came or those of you who have seen the movie 2012 assume that the world will end by 2012. So, it does not matter what happens after that, but realistically we do need at least these four. We have postponed the problem to A D 99 99 and pretty sure none of us will be alive at that point. So, we do not care beyond that SQL does not have a type called year per se although it has a type called date, but then that date also includes a month and a day of the month it is not just year. So, we decided to call it numeric 4 rather than use the date type in SQL. Then there is the grade attribute which we have declared to note that we are allowing this to be null. So, we have not declared it to be null and we have several integrity constraints here. The first one is the primary key wait there is a mistake here this primary key I will give you a minute to think about what is the mistake in this slide. Look at the primary key declaration and see what is wrong there the primary key has I D it has course I D semester year what is lost in the middle the section I D was omitted by mistake it is very much part of the primary key now why is or should it be let us think for a moment should it be part of the primary key or should it not. So, what is a primary key it is a minimal super key. So, first of all the takes says that this student takes this section. So, if you want to uniquely identify the section we certainly have to store the section I D in there. So, the question is should the section I D be part of the primary key or not. If you think about the particular domain does it make sense for a student to be registered for two sections of the same course in the same semester surely not that does not make very much sense the student will take one of the sections of the course, but he cannot concurrently take two sections of the course. So, although the section I D is required otherwise you would not know which section the student took it is not required to uniquely identify a row in this table because for a particular student year semester and course there can be only one section I D it cannot be two different section I D is within that. Therefore, we could have omitted section I D from this from the primary key declaration probably would not have hurt if we put section I D in the primary key declaration. In fact, that would be the default when we look at way we do relational database design that is often the way it would get generated by default, but here we have actually realize this constraint and removed it preventing thereby a student from registering for two different sections of the same course in the same semester which is probably a constraint which should be enforced we do not want allow students to multiple register for two sections of the same course. So, the next one is so actually that is no a mistake in this particular slide is actually correct I guess I need a little bit more caffeine to keep up with my slides. So, now, moving on to the next constraint it is a foreign key constraint the foreign key constraint the first one says that I D references student. So, the takes relation cannot have an I D value which is not present in student. So, it cannot have a junk I D value over there. So, that is obviously required and what attribute of student does it reference well it references the primary key attribute which is I D of student. So, in this case again the two attributes have the same name this is not a coincidence you would often give the same name for an attribute and the referencing attribute because it means the same thing the last foreign key constraint make sure that if you have a particular row in the takes relation it must belong it must correspond to a section which is actually running in that year semester otherwise you can say that a student registered for CS 101 in a semester in which it was not even being offered then that is junk data. So, the constraint to prevent it says that course I D sec I D semester year references section. So, that is a foreign key constraint and here is one more table which is a course declaration course I D is as expected the primary key we have decided to allow 8 characters for course I D. This is the title every course has to have a title and we have restricted it to 50 characters may be because we want to display it meaningfully on a screen, but probably we were too stingy here. These days it does not make sense to be too stingy you might as well make it longer. Department name which is a foreign key referencing department and the credits the number of credits that the course carries note that this is for a course the same course may be offered multiple times, but every time its credit will be the same its title will be the same it cannot change just like that. If it does change what do we do well that is a update to the course and to model that we will probably have to keep some kind of history with the course. We will say that in the period 1999 to 2008 the course had 3 credits after that the credits was increased to 4 may be these kinds of changes do happen. How to deal with it? Well we can make more complicated schemas to deal with it and we will actually see a little bit about how to model such temporal aspects later as part of the database design process, but again to keep our life simple we are only modeling the current state. Currently this is the credits that the course has we are not going to model history. So, that was a quick introduction to the DDL. Now here are a couple of quiz questions. So, in the interim please read the question and figure out the answer for both the questions question 1 and question 2 and think about it and you can and make sure you press this T key and then wait till we tell you go ahead and answer the question. So, now we have received the responses and as you have seen in Konbanega Karotpati the audience is generally right and as usual the audience has one here handsomely. You cannot see the results I think from your side it is not set up properly here for that, but the overwhelming majority chose the third option which is the right answer. A few people chose options 1, 2 or 4 and for those of you who did that was probably just the confusion from the way the question is phrased because A was repeated across tuples the primary key constraints ensures the key values are repeated of course not. So, that is false it ensures they are not repeated and the second part was a not null as we just discussed it ensures they are not null. So, that part is true. So, it is false true. So, now let us move on to the second question and when I tell you you can enter the options. Time is up for the question in minute or so we will have the less than a minute in 10, 20 seconds we will have the response, but let me explain the answer here. So, the foreign key constraint on course ensures what that all departments have associated courses no it does not what it does is it says that the course must have a valid department associated with it. The second thing says department name is not null that is all courses have departments. In reality the foreign key constraint in SQL does not ensure that a value is not null I did not tell you about this explicitly. So, perhaps some of you could not have answered the question you are not sure, but it does allow department name to be null in the referencing relation. In the referenced relation department name is a primary key. So, of course it is not null there, but in the course relation you can set the department name to null and it will be accepted. So, it does not ensure that all courses have departments. The third answer is that department name occurs in the department relation that is the right answer. Although it is slightly misphrased it should have read that department names either occur in the department relation or are null that would have been the correct phrasing. So, maybe some of you figured the right answer is none of the above. So, let us see the answers this time there has been more choices partly because I asked you a question on something I had not taught you. So, that is a good thing in a way because it helps me understand how much you know already before coming into this lecture that was something which I had intended to have a small quiz on. So, I will again revisit this may be tomorrow with a quiz on your background knowledge. So, what I have seen here is that the winner by almost the two thirds majority actually not even two thirds a slight majority is the correct answer C. So, audience again wins but this time a few people quite a few people chose one or two. Two I can understand one people have got confused because it is flipped. The foreign key constraint is on course it ensures that the course has a department. It does not ensure that the department have a course you can very well have a department with no course even though there is a foreign key constraint. If you wanted every department to have a course unfortunately it turns out you cannot express this in SQL using foreign key constraints because the course ID in the course relation course ID is the primary key department name is not a primary key you can have multiple courses in the same department. And SQL does not allow you to have a foreign key referencing anything which is not unique. So, there is in fact no simple way to ensure that every department has a course. Course has a department is easy to enforce. If you want to ensure that the department name is not null you would have to say department name not null and foreign key department name references department then we can be sure that 3 is exactly satisfied. So, now that was how to create a table. Now, if you made a mistake in creating a table you could either delete it and start again or you could modify it. Now, to remove a table completely you say drop table followed by the table name. Now, databases will not even ask you to confirm it they will just drop it that is a little dangerous if you drop a table it is gone you cannot get it back. Well not quite oracle from I think version 10 onwards has a feature which saves a backup copy of the table somewhere. So, it is possible to get it back, but on other databases typically if you drop it it is gone. So, be careful with it. You can also alter a table to add attributes add constraints into drop attributes and drop constraints. The support for this varies a little bit by databases. Many databases will not allow you to actually drop an attribute. So, but they will allow you to drop a table. Again dropping a table may be restricted in several ways. If you had a foreign key from course to department if you say drop table department it will say sorry there are foreign key referencing department I cannot drop it. Similarly, if you try to delete tuples from department if a particular department is being deleted you delete computer science department, but there is a student or an instructor in that department then the foreign key will be violated. Therefore, the system will say sorry I cannot delete this department. So, if you really want to get rid of the department what should you do? First you have to get rid of all the instructors, all the students, all the courses which refer that department only after that can you delete that department. Of course, in real life you do not go around deleting departments, but there are situations where this is required. Unless of course you want to just clean out the whole database in which case what it means is you should first drop the tables which are referencing namely instructor, student course and only then drop the department table. So, that was the quick introduction to the data definition language. Now, let us move on to queries and that is what we are going to focus on for the next several hours today and tomorrow. A typical SQL query which many of you would have seen already has the form select from where and then there are some other classes which are optional. In fact, the where class is optional, but most queries do have a where class. So, what are these things? We say select and we give a list of attributes. So, this corresponds to the projection operation we saw in relation algebra. It says which attributes of the tables do I want to be output in the result. So, it is selecting columns. If you want to select certain rows that is restrict which rows should be output that is going to be specified in the where class which is the last class. So, in this case p represents a predicate. The from class is a list of relations which we are going to use in the query. Now, if I just give a list of relations in the from class r 1, r 2, r 3, the from class actually takes a cross product of those relations. It does not do a join. It takes a cross or a Cartesian product. By the way, cross product and Cartesian product mean the same thing and different people use both these terms and I end up using both depending on nothing randomly. So, just remember that they mean the same thing. So, the from class basically lists the relations and if you want to think of it in relation algebra terms, it takes a Cartesian product. Then the where class selects certain combinations of the rows as specified in the query and finally, the that is a where class predicate. The select class outputs specified columns of the query. So, let us see a queries by example. It is easier to see the examples rather than to understand what I just told you. So, the let us take it class by class. The select class lists the attributes which you want to appear in the result and as I said it corresponds to the projection operation. So, if I want just the names of all the instructors, I can say select name from instructor. So, what are the other attributes? We have id, department name and salary. Those are all not going to appear in the output only the names are going to appear. It is worth mentioning here that SQL language the constructs in there the select the key words like select from where as well as the relation names and attribute names are all case insensitive. So, I can do select name with capital N alone. I can use all caps or I can use all small it does not matter they mean the same thing in SQL which is a little different from other languages. You cannot do this in C or Java those names are case sensitive. Now, in SQL there can be relations can have duplicates in the input that is you can have unless you declare a primary key SQL will not object if you have two copies of the same term. You insert twice the same term there will be two copies unless you have declared a primary key in which is the second one will be rejected. But even if the input does not have duplicates if I ran the previous query select name from instructor if two people have the same name there will be two copies. To show this even better consider the query select department name from instructor that is going to list all the department names in which there are instructors. Now, it should be clear that if a particular department does not have any instructor that department name will not appear. What is less clear is what happens if a department has five instructors and the default in SQL is the department name will be output five times which is probably not what you intended you want only one copy of it to come out. So, what you do is add the keyword distinct in there. So, if I say select distinct department name from instructor every department name will occur at most ones the no instructor it will not appear at all could also say select all department name which ensures duplicates are not eliminated but that is actually the default. So, you do not have to specify a few more aspects of the select clause. If I use a star in there it say select all the attributes. So, select star from instructor selects all attributes of instructor. So, there are four attributes in this case there are four attributes all will appear in the. You can also say select instructor dot star if there are multiple relations you can say all attributes of instructor all attributes of any of the other relations which appear in the from clause. The select clause need not just output an attribute as it can be an expression. I can say select salary times 12 if the salary was a monthly salary and I want to compute the annual salary. I will say select salary star 12 from instructor or if salary was supposed to be the annual salary I can say select salary divided by 12 to get the monthly salary any expression is possible. So, for example, if I say select one from instructor what is going to happen I am going to get the tuple containing just the value one as many times as there are instructors. If I say select distinct one from instructor I will get one tuple containing the value one assuming instructor has at least one instructor if there are no instructors nothing will come out. So, you can have any expression in there. So, now let us have another very simple quiz we set up. So, let me explain the question while we set up the quiz question. This question says which of these clauses is optional in an SQL query? The first clause is the select clause. The second option 1, 2, 3, 4 again correspond to A, B, C, D select from where on none of these. So, which of these clauses is optional? Good. So, now the number of responses has gone up we have had 140 responses out of 289 clickers up from about 90. So, quite a few have got it working. The answer of course is C as we already discussed and almost everybody has got it right. Now, let us move on to the where clause. This clause defines the predicate the condition which the result must satisfy. So, if you start with a single relation it is easiest to understand. So, if I want to find all instructors in computer science who earn more than 80,000 these are the highly paid guys. So, how do I do it? I can say select in this case the English query often does not say what attribute to select it says find all instructors. This has to be interpreted appropriately. In this case we interpret it as find the names. So, the query as shown here says select name from instructor. Remember the query said computer science instructors with salary greater than 80,000. So, the query where clause has where department name equal to comp psi and salary greater than 80,000. I should also mention that the strings in here are case sensitive on most database systems, but certain database system like MySQL and I think SQL server also allow the string comparison to be case insensitive. So, if I have comp psi with C and S capital here and in the database I have comp psi with C and S small in the SQL standard and in most database systems they will not be equal. However, on a few database systems which do not quite follow the standard they will be allowed to be equal and certain systems I think SQL server let you configure this whether you want it to be case sensitive or case insensitive. So, the default I think is case insensitive on some of these systems, but in the SQL standard it is case sensitive. So, we have seen a comparison here. You can combine these with and or not the usual way. Now, let us come to a from clause with more than one relation. So, this is required when you want to combine information from two relations. So far our queries were very simple, but in the introductory chapter we saw a query which combine information from two different relations. So, what do we do in that case? So, here is the from clause query which combines instructor and teachers. This query is not a very useful query, but it shows what you can do. It is a select star from instructor comma teachers with no where clause. Because we do not have a where clause and remember the from clause does a Cartesian product. I am going to get every pair of instructor teachers tuple whether that instructor taught the course or not. And if there are a lot of instructors and a lot of teachers records, every pair will come. If I have a thousand and a thousand I am going to get one million results here. Now, conceptually an SQL query computes the cross product in the from clause and then applies the where clause if it is present on the cross product. Practically it is a stupid idea to compute the cross product and then apply the predicate. For example, in this case what I no doubt wanted was the I D of the teachers rows to match the I D of the instructor rows. I want only matching I D's. I do not want rows where instructor I D does not match teachers I D. Remember that teachers links an instructor to a course section. So, both of these have I D and the I D value should be the same teachers and instructor I D. But that is not there in the where clause. In this case all pairs are going to appear in the result. But it is not although it is not useful once we add that extra condition it is going to be useful. So, here is a slide which shows the effect of Cartesian product if you do not apply any more condition in the where clause. So, instructor and teachers the number of results is going to be very large. So, in this slide we do not have space to show all the results. So, we just shown a sample of results. So, you will notice in this result that instructor I D 1 0 1 0 1 is linked with 1 0 1 0 1 here. But it is also linked with over here this instructor is also linked with 1 5 1 5 1 and with 2 2 2 2 2 and with every other record in the teachers table which we have do not have space to show. So, we have shown dot dot. Similarly 1 2 1 2 1 is linked with 1 0 1 0 1 and every other value including 1 2 1 2 1 which appears further down we have not shown it here. So, the Cartesian product is huge, but once we add this condition which is select in this case we want name and course I D from instructor teachers where instructor I D equal to teachers I D what are we going to get we will get only matching rows. So, now we are able to get name and the course I D earlier what did we have teachers had did not have the instructor name it did not teachers had the I D of the instructor. So, we could get the I D of the instructor and the course, but we could not get the name. Instructor had the name, but did not have the course. So, we had to use both of these relations and combine their information and the restriction was the I D value should be the same. So, that is what we have put in the where clause over here. Now, the number of rows is going to be how many for every teachers tuple it will match with 1 instructor tuple because I D is the primary key of instructor. So, you will get as many rows as there are teachers tuples in this we are not going to get a huge explosion of rows. So, obviously a database system given this query will not actually create a cross product and then remove all the extra rows that would be very very inefficient. We will see later how to efficiently evaluate a condition like this by matching only those rows which actually match and avoiding matching those which do not match. So, you can actually do this very fast instead of computing millions of intermediate junk results which gets thrown away. So, that was a very simple query. Now, here is another query little bit more work. This says find the course I D semester year and title of each course offered by the comp side department. So, now the first job in anything like this is to understand where all this information is to be here I have given the query here partly because a view does not support animation, but otherwise I would have just shown you the English specification without showing you the actual SQL query. So, in order to write the SQL query you should make sure your students understand this. The first step is to find out where all there is information required in the query. In this case the course I D semester year and title are available in two different relations. The section information tells us which courses are offered in which semester and year and that includes the course I D. Unfortunately, the section relation does not include the title. On the other hand the course relation has the course I D and the title, but it does not have any year or semester because that is not part of the course definition. The section is the one which has it. So, we have to link information from these two tables. Now, your schema diagram is going to be very useful to understand how to write the query which is why I have been saying you should have a copy of the schema diagram with you. So, please make sure you have a print out before the lab session. So, now if you look at the schema diagram I have just shown the part which is relevant. This is section and there is a course and there is a line from section to course which goes from course I D here to course I D there. What does that mean? It means that this course I D corresponds to this course I D here. So, we can enforce the condition that section dot course I D equal to course dot course I D. So, we take the cross product and then add the condition corresponding to this line here which is what we have done here. So, we said select whatever information we want from section course where section dot course I D equal to course dot course I D that will match every section with its corresponding course, but what we wanted is only for comm side that department name is where is it in section? No, the department name is in course, courses have an associated department. So, even though department name is not required in the result it is required in the where clause and it is taken from course. So, where department equal to comm side note something here I did not say course dot department name why? Because it is not ambiguous only course has the valued attribute department name section does not have department name. On the other hand course I D occurs both in section and in course it is repeated. If I just say course I D the system will not know which course I D I meant. Therefore, to make it unambiguous I have to say section dot course I D equals course dot course I D that makes it unambiguous. So, where it is unambiguous I can drop the relation name, but where it is ambiguous I have to include the relation name. So, we are almost to end at the end of the lecture. So, I am going to stop this session here in terms of slides, but I am going to do couple of things. I am going to take a little bit more of your time partly to just give a quick look ahead. We also have soft on natural join if you know it you can use it in today's lab, but if you do not what we have so far is enough. We also have soft on the rename operation again that is not required for today's lab, but I will cover it later. There is some soft on string operations. So, I do not think this is quite required for today's lab, but I will just introduce you to the like operation in case you find it useful. The like operations looks does string pattern matching with percentage characters. So, like percent D or percent will match any string with D or anywhere in there it will match Sudarshan it will match anything else with D or in there. I will cover the other things in there later on. Then there is an order by clause which you can tag on to an SQL query to get results in a specified order you may find this useful in your assignment. And there are a few other where clause predicates I am going to skip that duplicates again you do not need for today's lab. I think you do not need this for today's lab exercises, but just in case I will tell you this part. So, the set operations let you take the result of two queries and then union intersect or set difference. So, in this case it says find courses that ran in fall 2009 or in spring 2010. How do you do that? First of all how do you find what ran in fall 2009. So, select course ID from section where semesters fall and year is 2009. Similarly, for spring 2010 and now I can take the union of the two to find those which ran in either of those semesters. If I want to find those which ran in both the semesters what I do I take an intersection of the same two things. And lastly if I want to find those that ran in fall 2009, but did not run in spring 2010 what do I do? I say the first one except the second one except is the SQL notation for the set difference of minus operator. Again oracle let you do this by using the key word minus not except, but the functionality is there. One last quiz question for today. So, here is the question. The answer to that question was which one? So, as we just saw if you do not have a where clause it takes a cross product of all the tuples. So, the answer is 3 which returns all pairs of instructor teachers. This time again the majority as usual is right, but quite a few people chose answer one which says returns all matching instructor teachers pairs. It returns all pairs not just those that match even those that do not match. What does matching mean? In this case the matching is on the common attribute which is id. It returns even those which are not matching. So, one is not the right answer. It is not a syntax error. It is perfectly legal to remove to omit the where clause in an SQL query. So, it is not a syntax error and so 3 was the correct answer. Let me move on to the last item for the morning well there are two items last but one item for the morning session which is how the labs are going to be conducted in the afternoon what you will be doing in the lab. So, all of you hopefully have access to the Moodle your coordinators have had access for a while. The participants should also have access to Moodle maybe some people who came in late today you know on the spot registrations may not be registered on Moodle. You can share the thing with somebody else, but what you need to do is in the afternoon log in to the Moodle site and see the things to be done today afternoon. So, configure per SQL by setting up passwords adding users with passwords and so on. The last part of today's lab is to write a few simple SQL queries using the PgAdmin3 tool. So, first of all you have to get the tool running hopefully it has been set up for you if not your coordinators should tell you how to set it up. Setting up is not a big deal it is just a question of downloading it and executing it. So, once you have PgAdmin running you have to set up a connection to the database again those instructions are there on the Moodle site which you should be able to view. So, post SQL requires a lot of configuration including allowing other machines to access it. So, all of that is there in the instructions and you should do all of that then connect from PgAdmin3 and run a few SQL queries which are already provided to make sure everything is working. Once you have reached that you can start writing your own queries in SQL. So, we have a few maybe 5 or 6 fairly straightforward SQL queries nothing very complicated based on what we have covered so far today. So, please write those and execute those now when you are done with that we would like you to upload your solutions. So, that your center coordinators can actually correct them afterwards. So, now how do you upload on Moodle your center coordinators can show you a demo of how to log into Moodle and how to upload your answers. So, basically you will have to create a file as you execute the queries on PgAdmin3 make sure they are running you can copy paste them into a file containing the question one answer question two answer and so on. And when you are done with everything you can upload the solutions. Now, to help you in making sure your query is correct we will be providing us the expected answers on a sample schema why this is something I forgot to mention one of the steps you have to do today is to load a sample schema and sample data the university schema is provided in a file which is available through Moodle. So, you have to download that file and execute it on PostgreSQL to create the schema first and then another file which contains data. So, you have to execute that. So, both of these can be done by copying the whole file contents select all copy paste it into PgAdmin3 and execute then clear the PgAdmin3 screen then copy paste the data and execute it. So, that will have your sample schema and sample data set up we will provide a file which contains the expected results for each query on this sample data. So, if you made a mistake your answer will differ. So, you can check now if it does match that is not a proof that your answer is correct you might have missed out something, but it is so happened answer was still correct. But, hopefully simple errors will be caught with this amount of testing it is not really enough testing for a real database, but for this purpose it is good enough. So, you have to upload it. Now, we realize that you may have trouble finishing the exercise on time today because of problems with setting up PostgreSQL which may take a little more time for some of you. So, we have actually given more time the deadline for submitting the assignment is actually day after tomorrow morning. So, if you could not do it today during the lab session you can still do it in the evening and if today evening for some reason is not possible tomorrow evening is still available, but day after tomorrow morning is the deadline. So, that your local coordinators can correct it and give you feedback in the early years. So, coordinators please note this deadline. If you feel this is too much time and you want the submission to be earlier let us know we can you can tell your local participants your own deadline which may be earlier than what I am specifying. The deadline I am specifying will be enforced by Moodle local deadlines will be enforced manually by your coordinator. So, that is for the lab session today and finally, let us try out the question answering part. Now, anyone who has a question please use the A view software the coordinators have to do it. So, participants please tell your coordinators the coordinators can flag that question is to be asked. The question was how do you install PostgreSQL? So, to answer your specific question how do you set up the PostgreSQL server? There are two parts one is the Moodle site has detailed instructions when you log in in the afternoon you will see the instructions. The second part is every center has one coordinator who came to IITB earlier on in October and they have done the hands on installation here. So, they should be able to help you with it. Now, to give you a little more specific answer I hope all of you are running Linux Ubuntu or Fedora distributions. So, on these with the software package manager synaptics on Ubuntu and YAM on Fedora you should be able to just get PostgreSQL installed with a very simple specification saying please install it. If not you will have to download PostgreSQL binaries and execute them or PostgreSQL sources and compile them which will run for other operating systems also. Once you have got PostgreSQL up and running you have to configure it. You have to configure if you do not configure it PostgreSQL by default will not accept connections from any other machine except the machine it is running on. So, nobody else will be able to connect to it only you which may be ok for the lab, but when you run this in your university students would not be able to connect. So, you have to configure PostgreSQL to tell it to A accept connections and B which machines to accept connections from. So, there are instructions on how to do this. The next step is to create users. There is a default user called Postgres which you should not be using to actually load data and run queries that is used only as an administrator account. So, you have to create user accounts again the lab exercises tell you how to create a user account and how to connect to Postgres using pgadmin 3 from that user account and then you can execute the sample DDL the sample data and then the queries which we have provided. So, that is the sequence of steps for the lab. Does that answer your question? If you have a follow up question from the same place Baramati please ask. My question is in the rename operation you are given that the keyword as is optional. The keyword as is optional and maybe omitted how can we do that? Two parts to answer A I have not really covered the rename operation yet in detail I will be discussing it in tomorrow's lecture. But, since you ask the question omitting it means just write the same query without the as. So, instead of you know relation name as some new name just say relation name new name. So, but I will cover this tomorrow in more detail and in fact oracle requires you to omit the as clause it does not even support the as clause. But, anyway this is for tomorrow today's exercise should not require you to do that. Let us try again Anna University. Let us see if you have got through yes I can see you is there any question from Anna University. Sir, can you give us more examples on joints or different type of joints? Yeah. So, please switch off the mic when I am answering yes. So, I will be covering a lot more queries with joints today as morning session we have not had much time to cover that. So, you will see lab exercises which will let you play around with it. One thing I want to note here is any query which involves a join the first thing you will notice the specification will require attributes from multiple relations. So, first step is identify them in the schema diagram you have with you. Then the second step is to connect up those relations in most cases you will have a direct connection between those relations. But, sometimes you will have to add an extra relation in between I do not think you will need it in today's queries, but we will see more complicated queries later. And so, then you have to write a query with listing them in the from clause with the conditions connecting them in the where clause. Now, in tomorrow and day after lecture I will be covering a lot more queries with join auto join and so forth. So, if Anna is done I can move on to one more center. If we can achieve the same results using Cartesian products and with condition why make use of joints. So, I have not covered the connection between joints and Cartesian products in great detail, but there are two things I want you to note. In the syntax we have seen so far the natural join operation I have not covered in SQL. I mentioned it in the context of relational algebra, but I did not cover it yet in the context of SQL. Tomorrow we will see how to write natural joints in SQL, but for today's lab do not use it because you should be familiar with both ways of doing it. For today's lab as I told you when I covered relation algebra you can cover the you can express the natural join operation as a combination of Cartesian product followed by a selection and a projection as required. So, the example queries which I used today did exactly that I listed you know instructor teachers in the from clause which is conceptually a Cartesian product followed by a condition in the where clause which force the IDs to be the same and then I got projected out whatever attributes I wanted in the select clause of the SQL query. So, for today we are going to stick to this syntax and the alternative syntax we will see tomorrow. So, you can do it either way SQL supports both you know using the natural join operation straight away or expressing it in this two step fashion which you want to use we are going to actually discuss the pros and cons when we cover natural join does that answer your question. How can the referencing variables can have the null value if it is not in the master table same was the case with option 2 in the department name how can it be null if it refers to the master table where it cannot be null. That is a very good question. So, the question was how come you know we declare something as a foreign key and SQL allows the value to be null that seems to be wrong and that is a tradeoff which SQL had to make should it force the referencing value to be not null and to be a meaningful value every time or do we want to say look if you know the value it had better be in the table, but we still allow you to say that I do not know the value. SQL chose this option which said if you know the value it better be in the referenced table, but we will allow you to say I do not know the value that option is left on by default. The schema designer can force this choice by declaring the value to be not null. So, that is part of the schema design phase where you decide if you really want to enforce that it should not be null declare it to be not null, but SQL does not do that automatically. So, SQL leaves the choice to you gives you the control you meaning the schema designer has the control to do it either way. Does that answer your question? Thank you. Now, I will go to one last center I am switching away from Srinagar again. I wanted to ask you sir what do we mean actually by atomic attributes that is can you give an example of such attributes which are atomic and non atomic. That is a good question atomic attribute what is not. We actually will be covering that in a fair amount of detail when we come to database normalization because the first normal form really forces values to be atomic. So, when we discuss that I am going to cover in more detail what is atomic what is not, but for now to keep us going until we reach that an atomic attribute should not have a set of values. So, a set of four numbers is not atomic and what else if it is a single value is it always atomic we will discuss that later. There are cases where you can have a single value and still treat it as non atomic, but I do not want to confuse people right now because we are going to cover it in detail later. Does that answer your question. So, can we say that majority like say all single valued attributes are atomic. As I just said it is not the case that all single valued attributes are atomic you can have single valued attributes which are not atomic as an example in our schema itself. We have course code such as CS 101. Now, it depends on how you view this. Supposing you have an application program which strips out the first or an SQL query which strips out the first two characters of the course code and says it is CS therefore it is a course from the computer science department. That query or that program is treating the value as non atomic it is breaking it into parts. So, first normal form does not want to have values which are non atomic in this fashion and in fact it is a bad idea to be breaking it up like that. So, what we do instead is along with the course code we store the department of the course. So, instead of pulling out a few values from there we store it as a separate attribute which we will use in our query. So, we could have abused the same representation and treated it non atomically, but we do not. So, it is a question it is not just the values which are stored, but how it is used. So, atomic values are generally I mean single values are generally atomic unless they are abused and treated as non atomic. Now, any further question clarifications please wait for the normalization. So, I think we have reached one o clock you need at least an hour to get back for your lab. So, we will stop here. Thank you.