 But, in this particular case, what we wish to do is to relate to the syllabi that is used in the universities actually, relate to the question papers that are required to be solved by students in your exams and additionally also provide some glimpse of IIT style teaching and labs. So, that when you coordinate the workshop for the teachers, you can sort of encourage them to think both on what they are required to do at the ground level, additionally what better practices can be followed, that is the general idea. Needless to add, our ambition is to engage 700, 800, 1000 teachers at one time, not for just one workshop of two weeks, but continue to engage them later. As many of you would know, we now have email IDs and connectivity to about 1300 teachers who teach programming and about 200 of them have been in touch with us on one pretext or the other and when we launch the subject portal, that incidentally is the objective for every workshop that we conduct for a subject, that within six to eight months of the main workshop, we will launch a national subject portal which will contain all open source contents, including the question banks, including the lecture material, whatever is covered, including the coordinators workshop, recorded lectures, main workshop, recorded lectures, etcetera. So, that students and teachers all across the country can benefit and additionally, we want that portal to be a forum where people will continue to interact with each other. So, we will be permitting interactive discussion forums, discussion sessions, mechanism for people to continue to contribute additional examples, additional problems, etcetera, for which we will require a strong team of editorial board for each subject. So, needless to add, some of you would serve as editorial boards for that programming subject, some of you will serve for database subject and additionally, from amongst these 200 teachers for the first subject which is computer programming and hopefully, another 100, 200 teachers from the database community who are teaching elsewhere, we will have to pick up additional faculty members who will join us in becoming the editorial board to sort of manage and enhance the contents. So, it is a very large dream project. I was asked very sarcastically by somebody that are you suggesting by this national mission workshops, you will overnight change the quality of education across the country and I said I make no such claim because nothing of that sort is possible at all. But what we believe is not just the creation of open source contents, but engaging teachers in the long term will be beneficial because every teacher will perhaps go back with a little bit extra knowledge, extra mechanism, extra methodology and that should reflect in the students' quality. So, that is the general ambition. Unlike in the computer programming where although the basic syllabus used to be C programming in most places, some places C, C plus plus as we follow in IIT, but in spite of that the books followed the question paper style, etcetera, etcetera, was widely different as some of you will recall. Fortunately, in databases at least the syllabus appears to be very similar across the country. I was very pleasantly surprised to find that Sudarshan, God's Silber Shard, Sudarshan is a book which is mentioned invariably either as a text or as a reference in most places. So, that means that we have automatically some kind of a standardizing. And therefore, we have decided to follow this book completely. The slides for this book have been kept on the slides page. So, the slides are also accessible as it is. We had a worry whether we will be able to distribute these because there is a copyright. Just as I cannot distribute soft copy of the book, the book has to be purchased. Fortunately, the slides what Professor Sudarshan has talked to is co-authors and they said that as long as there is some mention that these slides are supporting material for this book, then they have no problem in distributing it in open source. So, we are fortunate and yet we will have to construct for those of you who have participated in the earlier workshops will know, but others, these teachers who come for the December workshop not only attend this two week workshop, but we make them into teams of three or four teachers and we ask them to do an additional assignment for two weeks after the conclusion of that workshop. The certificates for the ISD workshop attendance is given only after they complete that assignment. And that assignment will take various forms, but the most preferred form is that that team is required to set model questions and prepare model answers and contribute that. We have collected as many as 1800 questions and answers from the programming workshop. Unfortunately, we were not very careful in setting down the quality standards. So, when these questions came in as the submissions, we of course gave them certificate because people had work, but we found out when we examined in several questions the English language itself there were mistakes. In model answers, some of the programs would not run. Now, that is a quality check if we had advised the team that look, unless your programs work and unless your question English is good, you will not get the certificate or some such thing. Perhaps some people have been careless. We spent almost a semester full with about 30 teaching assistants to cleanse this material. Unfortunately, they have been able to clean only about 650 questions and answers, but that they have done, they have converted that question bank into an XML format. We will be standardizing that and we will be using that format now with additional checks and balances for ensuring that our colleagues who attend the workshop in December produce a better quality work because all of it has to go into the open source and it will carry their names as well as your names as well as our names and I think all of us should be careful about what the word sees with our names behind. So, with that let me just start with the other topics. Basically, SQL is the lingua franca for databases. All of you are familiar with the evolution of databases. Let me very briefly summarize that. If you recall the history of programming when the first programming language came into being which was Fortran in 1956. At that time itself people felt that there has to be a special purpose mechanism to handle data management jobs and that is how common business oriented language or COBOL was born in 1960. In fact, there was a committee, Kodasev, which actually designed or decided on design of the COBOL. COBOL did manage data in the sense that it permitted us to handle large volumes of data and files. It defined index files. It defined relative files. It defined sequential access files. So, data management on the disk was handled there. Data management inside the memory was also handled better because structures were defined for records. Records could be described and a whole lot of processing capabilities which are required for data processing were made available. Yet the data volumes grew so rapidly and the need to model complex situations yielded a very large number of files to be handled. So, what we call join of two files was actually felt as a dire necessity when you did not have join algorithms. So, people used to do joins using COBOL programs. And soon the need came about that you need something stronger for data management and that is how the Kodasev first model of network databases came about. You had a hierarchical database then, but everything changed when in early 1970s, 1972. In fact, when Professor Kort's paper on relational algebra led to the IBM putting in lot of effort in building actually a relational database. So, when DB2 was being built, which is world's first commercial database, simultaneously several universities were involved in building database management system based on the relational model. One of the earliest implementations in the university system was Ingress, as you will know. But what came out of all of this column term was that the structured query language got standardized as the language in which to express your data management needs. The underlying model was a relational model and therefore, relational databases since then have been ruling the rules. As far as converting your requirements for data processing and data management from the real world into the realm of relational databases is concerned. Various models have been used. ER model or entity relationship model was amongst the first ones. Later, as object oriented paradigm took over, people started using UML. But ER model by and large still continues to be the basic model through which we try to explain the fundamental concepts of modeling to students. And I find that ER model is a part of most of the syllabi. So, we have in this session, as Professor Sudarshan said, we are not going to teach, we are going to glance over the portions and how we are going to discuss that in a larger workshop later. So, we have sessions for ER design. That is one aspect which is often not conducted in great rigor. Similarly, SQL, while SQL is taught, as Professor Sudarshan said, we would like you to begin with and through you, the other teachers whom you coordinate in December to do a lot of rigorous SQL program. I will tell you what I have found. These I have found in the computer programming course also and this I am finding in database courses also. In fact, most of the computer science courses, the major difference between the IIT, NIT or some of the better colleges from which you come and the other colleges is that as Professor Sudarshan said, practical subjects are taught like theory subjects. So, SQL queries are written on board. Now, unless a student has executed a few SQL queries, unless a student has made mistakes, it is impossible for us to learn. All of us learn through mistakes. But there is no provision for making mistakes because there is no facility. Now, in places where database courses have some associated workshops, the assignments which are given, the lab assignments which are given are pretty mundane types which does not really teach people anything beyond the very preliminary. So, therefore, our emphasis will be to do as rigorous a lab as possible. And as a glimpse of that, we have tried to organize labs here in the afternoon. Of course, there is an additional thing because of the standardization. As you said, what do you standardize? Some people use SQL server. Some people use not even a database. Some people demonstrate that using XS or something. Some have Oracle. Some have Ingress. Some use MySQL. Some use Postgres SQL. Using open source as one advantage, there is no licensing cost. So, as we did in the computer programming, we decided on a Linux environment and we used GCC as the open source compiler. Similarly, we will be using Postgres SQL which has a PHP admin graphical interface. So, it is quite good for explaining to students the basics. And it runs on the Linux environment with which most of you are familiar. Most of the remote centers have Linux environment. So, it should not be difficult to do that. SQL is the main state. And therefore, we should be very thoroughly comfortable in SQL. And that is why we will very quickly review the structured query language right from the beginning. So, with that, let me go over to the introduction of SQL. As I said, this is more like a refresher. We presume, of course, the basic entity relationship model where we have an entity described by attributes. And although we will be covering the normalization and other aspects during your design, but the basic model we are all familiar with. You have an entity. You have attributes of that entity. And you typically model that entity in relational database by translating it into a table. So, you say create table such and such thing and all the attributes that you have, you say these are the attributes of that table. And of course, you want to know how exactly these are represented, whether character, numeric, etc., etc. And if you have such multiple tables, how do you extract information out of these multiple tables? So, that is essentially the structured query language. So, this is the introduction to SQL chapter. The slides for chapter 3 are there. You have an overview of SQL query language. The data definition language or DDL is the fundamental mechanism through which you define tables. Then you discuss, we discuss the basic query structure, sub-original operations, set operations, null values which hold a very important place because we have nothing like null values in conventional programming. So, if you say marks of a student, so if no marks are given because the student has not appeared for exam, in the conventional programming language, we still do not know how to represent that. And different programmers will choose different mechanisms of represent. Because if you say 0, that idiot might actually have got 0. So, that is not correct. So, some people say minus 5 means not there because it is a numeric quantity. So, you have to put a numeric value. All these nonsense is taken care of by defining a null value formally and by actually implementing that null value in the stored database by putting a flag across such fields saying that look there is no value there. We have aggregate functions, nested sub-queries and of course, the transactions on the databases by which the values are modified. So, here is essentially the history of SQL. As I said IBM San Jose research laboratory and professor or Dr. C Mohan continues in fact, to work actively in databases. He has been one of the earliest researchers. Currently, the dawn of databases we call them. There are incidentally only two Indian researchers I know who have a fellowship from IEEE and also a fellowship from ACM. Being a double fellow is a great honor. One is Dr. C Mohan. The other is professor Krithi Ramaband. Krithi works with us. Of course, we are very proud of that. Unfortunately, it is unwell seriously currently and Dr. C Mohan continues to work for IBM. Both of them incidentally are passouts of IIT Madras. And as an added attraction, Dr. C Mohan is a chemical engineer to begin. I am telling you this to emphasize that it just does not matter what our background is. If we decide to master something, we can do that. So, this is the advantage of what I say the human well to say I will do this and I can do this. Notice some few interesting things. The first SQL standard by the way was SQL 86. It was succeeded by SQL 89, which was considered a very major standard at that time because SQL 89 for the first time defined what you call constraints. So, primary key, foreign key, these became definition constraints rather than to be implemented through stored procedures, etcetera, for the first time in 1989. Very unfortunately, practitioners of users of databases continue to write code and some unfortunately, write code even today to check for foreign key violations, etcetera by stored procedures rather than defining these to be the constraints in the database itself. That is something that we have to emphatically tell all our teachers to ensure that they actually use constraints in the definition of database. SQL 92 again was a very major release. Subsequent releases became what is jokingly called Y2K compliant. So, the SQL 1999 standard is not called SQL 99 standard. It is called SQL 1999 standard because year has to be reprinted by four digits. Similarly, SQL 2003 standards. So, these are the current standard SQL 2003. However, most of the database products that you see will certainly implement practically everything which is there in SQL 92 because that is a major release and there are unfortunately some proprietary features which every database product introduces. Now, one thing that we must tell our teachers and who must in turn tell our students is to practice writing database code in a standard form only. Whatever may be the advantage of using this feature of Oracle or that feature of DB2 or that feature of Informix or whatever, there is no long term benefit in using any proprietary feature which is not standard because that means your application becomes non-portable. You cannot move out of this database and go to the other database. And increasingly in the world, it will be required that you keep shifting from technologies and you do not want to rewrite your entire application. So, that is why for example, there is a huge lot of applications which are built using such proprietary tools. Take for example, Oracle. Oracle has a tool called PLSQL. Now, PLSQL is actually, permits you to embed SQL in a very specific programming language which is part of Oracle. Large number of very good applications have been written using PLSQL. But what is the net result? You cannot run it on any database other than Oracle. So, people actually written PLSQL translators but they do not work as well as the native components. On the other hand, the modern technology says that you use JDBC and you use JDBC compliant code. So, there is no code that is written which is not compliant and therefore if you want you write in Java you write in C but don't write in any proprietary thing because such applications can be converted from one database to another. Just to give you a real-life example, you would have heard of NSDL, National Securities Depository Limited. They hold shares in dematerialized form and they hold about 85 percent of the nation's share. Huge this thing. Their application runs on IBM. It was implemented on DB2 on the IBM mainframe and it was written essentially in Kix and Kobol some 15 years ago. But it was DB2 and therefore it could not run anywhere else. Last two years, three years almost NSDL took a major step. They spent more than 25 crore rupees of getting that application rewritten. The original application was developed in Switzerland where a system called SEGA on which the TCS had based the current system. But last three years they spent so much money and now they have a revamped application. Very proud to say that several of us were associated with that redesign which NSDL did and today that application can run on DB2, can run on Oracle, can run on any JDBC compliant database. It has been tested. Currently they are implementing it on IBM mainframe but IBM mainframe not using the IBM mainframe tools. So DB2 is used only in so far as the standard SQL is concerned and all JDBC connectivity is through application which will run on a Unix machine which will run anywhere else. Consequently the hardware pricing which they were offered some four years ago by IBM mainframe has come down to less than half the price today because of the competition. This is what happens when you teach your students to build applications which are technology agnostics. It does not matter today I run here, tomorrow if you misbehave with me I will take my application lock, stock and barrel and run it somewhere else. That must be the approach that we must teach our teachers and our students. So here are the domain types you would all be familiar with them. Care and varkar are mostly used as we know as a fixed length versus a variable length. Of course the variable length will have to have internally some large length permitted. Int, small int, numeric, real, double precision and float n are the numeric types. One is the character type. These are most of the more important data types. Of course there would be additional types which we can which will get discussed in chapter 4. You can refer to those later. Here is how you create tables. So create table. You can define a attribute followed by domain. Attribute followed by domain. Attribute followed by domain. Then you can give integrity constraints 1, 2, 3, 4, 5, 6, etc. etc. And where r is the name of the relation. So each a i is an attribute name and each d i is the data type of values. Here is an example. Very simple. Create table instructor. So instructor is like a teacher like us. Teacher has an id which is character 5. A name which is a 20 character variable. We say not null, specifying that there has to be a non-null value for name. Department name, worker, salary, numeric 8 comma 2. Here is an example of an insert statement which is actually a transaction processing statement in database which manipulates the values. Please note that although structured query language is called so because it is a query language. So most of the query statements that we will discuss and are discussed in a textbook are about querying a database, extracting information of our database. The statements which modify values in the database are very few. So one is insert, the other is update. That is it. So the insert statement will insert values into the instructor table. We are just saying a quadruple of values. So we have id, name, department name and salary as inserted there. Just as an example. This is how you define integrity constraints. North null is one integrity constraint. Primary key is another integrity constraint and foreign key which references some other key in some other table is another constraint. And it is to be recommended that such constraint should be defined as part of the table. A very fundamental difference. For example, when you create table instructor and say primary key is id, it means that id will have a unique value. And it means also that it will be non-null because primary key has to be non-null. Similarly, when you say foreign key, department name, the references department. So here is a department name. There is a separate table. Obviously it is expected. There is a separate table called department and department name is a key there. Primary key typical. Now the fact that you mentioned that department name refers to department name, then I cannot insert a value of department name in this table unless that department exists there. See this constraint looks very simple now, but to check for it used to be a major affair. If you go back to the Koval file days, you could easily have a file containing teachers with department names, another file containing department information, but you might by mistake give Z, Z, Z, Z as department name. Koval doesn't care. The relational databases also before the introduction of these constraints did not care. You could actually put in a department name called Z, Z, Z, whereas there was no such department in the department table. So if you wanted the department name to exist in a department table, you had to check by writing separate procedures yourself, even in database. That is the importance of the SQL 89 and the subsequent SQL 92 standardization, where we said no at the database definition level we introduce this. In fact, the databases have come, have evolved over a very long time. Very early databases actually permitted a lot of direct access to your database tables which were essentially files of some kind by writing programs. So you could write a C program and directly insert something or read something etcetera. Around SQL 89, it was realized that this nonsense must stop. So whatever be the physical organization of the database, you cannot touch it except through an SQL query. Once that discipline was imposed, SQL became the master keeper of all the data. Consequently, once you define this constraint and suppose you try to insert a value as we saw there by giving a department name which was ZZZ, it is the SQL engine which will say nonsense. This constraint is violated. I will not permit this tuple to be inserted. I will not permit this value to be inserted. So the SQL became the keeper of the integrity of data and you don't need stored procedures, you don't need anything else to ensure that. The SQL language itself will guarantee that non-integral data will not get into the data. That's the advantage of the constraints. Here are a few more relations. Create table student with name, department name, total credits and again department name is a foreign key here. Here is a table text. So if you notice the ER model, let me... So this was student and this was teacher and you have of course the ID number as one of the attributes and several other attributes. This is the typical way you will draw this. And of course student, I forgot what is the student attribute. There is an ID there or what is it called? There is also an ID here. And then there is a subject. And the next slide on the printout you will see on page 4. There is a table called course. So this course has a course ID which is the key attribute and a student takes a course is shown by a relation. So this is a relation between student and a course. Similarly, a teacher teaches a course is again shown by a relation. And these are all many to many relations because a student can take several courses. A course can be taken by several students etc. etc. etc. This then in general is the mechanism through which we will ultimately derive what should be the tables in our database and then we will use those tables. So here is the course table where the primary key is course ID. There is a title, there is a department name, credits etc. Again there is a foreign key here. There are drop and alter commands. The drop command actually sort of deletes that table. Alter table changes the composition. So it might be used to add a additional column. For example, alter table R it says add another attribute A which is domain so and so. So add let us say salary numeric something something which was not there let us say. So whenever such columns are added a sort of new table is now created because old table had only five columns. Let us say now it has six columns. The sixth column will be created and all null values will be given unless you have given a default. Similarly you can drop a particular column if you want or more columns. It should be emphasized when our teachers come in December that the alter table drop table commands are not to be used at the drop of a hat. Database design is a serious matter and once you have designed barring a few alterations which you do in the initial days but typically you don't change the database design. In fact database design change even of altering a table is considered a major activity. I mean those days are gone when our database sizes used to be 200 megabytes or 500 megabytes. They are not even few gigabytes. They are not even few terabytes. We are already talking about petabytes of data. It is impossible if we say add one column. It is not as easy as it may take as much as seven hours to add a column in a data. So physically changing the database design is considered the last resort. And that is why we have to emphasize good design practice. That means the design that you get in the first time must be the right design. It's like almost saying that I have a house. Now I say some guests have come add one room. Those guests go demolish one room. It's not easy. It's almost like that. The database is like a house. It's a structure. It better be architected properly right from the beginning with anticipation of how many guests may come. And even if there are no guests I am saying alone I better live in all the rooms. I can't demolish rooms. That is what is the database. Here is the basic query structure. Again all of us are familiar with this. Typical SQL query select attribute comma attribute comma attribute comma attribute from relation comma relation comma relation where some condition P which is a predicate. So Rs represent a relation. P represent a predicate and A represent an attribute. And of course as we shall see in a short while all of these represent collectively Cartesian products of all these relations. Whenever I say R1 comma R2 comma R3 means R1 Cartesian product R2 Cartesian product R3 which is of course nonsense as we shall see and we have to therefore include some kind of meaningful joining of these tables either through joint conditions in the where predicate or by using an appropriate joint word in the where clause itself. So the select clause lists all the attribute and this is the in fact if you see selection projection Cartesian product as the relational algebra operators if you recall the select clause list will give you the selection list. So these many attributes are to be output. So getting the names of all instructors select name from instructor simple query all SQL queries incidentally have to be terminated by a semicolon when you actually write them and the names are case insensitive. So for example name is same as name is same as name this is fundamentally different from the notion of variable in a programming language. In a programming language I mean capital N small a small m small e is completely different from this and in fact we emphatically tell our students never to mess up with this kind of whereas SQL says it doesn't matter. Now to retain the sanity of our teachers and students it will be useful for us to tell them that look while this is so it is preferable to remain consistent with whatever names that you use so that there is no confusion in somebody's name. Since you may do a selection where based on the predicate you may select only sub portion for example you want to select all department names. Now if there are 200 teachers and there are only seven departments the department name will uniquely occur only seven times but actually there will be 200 entries. So if you just say select department name from instructor you'll get 200. You want only unique of those in the proper relational sense then you have to say select distinct. If you want all of them you can actually say select all department. But if you just don't say all the distinct is not default all is the default. When you say select star it means select all attributes and it can contain arithmetic expressions plus minus star and slash and operating on constants or attributes of top. So for example you can say select ID comma name comma salary by 12. So if the annual salary that is recorded by 12 will give you monthly salary. You can even select the constant. So select ID comma name comma salary by 12 comma five. Just say five, select expression. So what SQL does is it selects all these from the database table upends of five and puts that as output. Because any expression is permitted in fact. Obviously it doesn't make sense to have constant expressions normally. You'll have an expression involving some of the field values as in here. Where is perhaps the most important clause which permits us to write what we call a condition or a predicate. And this condition is written using normal conventional condition writing. So you have operators which are equal less than greater than all kinds of things which make comparison. And whenever a comparison is made in a complex condition that you write every row that is extracted from table one or more tables is examined for this condition. And only a row which satisfies that condition is thrown out as output other rows are forgotten. So that is the standard interpretation of this where clause. The comparison results can be conjugated by using and or not. We are all familiar with two value logic. When we introduce null value we have to tell our teachers the notion of a three value logic. Because ordinarily a predicate is either true or false. But if there is a value which is null somewhere then what is is five greater than null. The answer is neither yes nor no. Neither true nor false. So that that logical value is called unknown. And if you are unknown then you have to discuss a three value logic true false and unknown and have to redefine and or not in the context of these three values which is also covered somewhere later in this chapter. The from clause lists the relations or the tables from which data has to be extracted. Ordinarily there will be one one from clause say from instructor you select all star. But if you just select from star from instructor and teachers it generates every possible instructor teacher's pairs with all attributes. So effectively if there are two hundred instructions and if let's say each teacher teaches on an average two courses. So there are four hundred less which a teacher teaches these. But when you combine these two in this fashion a Cartesian product of two hundred into four hundred is made. And that of course creates information which is completely worthless. That table directly will not give us anything. And therefore we need to do something additional checking, additional extraction, additional conditions so that we extract meaningful information out of it. In fact when we teach our students we might want to advise our teachers in December that this notion of a join should evolve out of discussion of a Cartesian product and selecting something from the Cartesian product. So selecting something meaningful. And that meaningful selection can effectively be interpreted as a join. Although we might have covered the join operation in relational algebra R. So here is a Cartesian product. This is the instructor table and this is the teacher's table. So for example one zero one zero one Srinivasan teaches CS 101. Srinivasan also teaches CS 315 Srinivasan also teaches CS 347. 12121 who is Wu, a finance guy teaches finance 201 etc. So when you do a proper join of these two tables you will get the correct result of course. This slide tries to explain what happens when you do a Cartesian product. So Cartesian product will take one row from this table, combine it with one row of this. It will take the same row combined with this. It will take the same row combined with the third and it is supposed to produce that as a Cartesian product. It attempts to do that so you have Srinivasan CS 101, Srinivasan CS 315, Srinivasan CS 347, Srinivasan finance 201 because there is a row there. The mistake is rich Srinivasan now suddenly gets 95,000 rupees per year rather than 65,000. And of course he has been ported all the way from computer science to physics. So this is an error. There is no way physics can come here. Physics can come for the first time only with the great Einstein. It cannot come anywhere else. I was quite shocked and I called Sudarshan at about 10 o'clock in the night. I said, Sudarshan, isn't there an error in your book? So he said, oh you have found it out, is it? So he says that they are pretty messed up that figure. Unfortunately, he has not been able to prepare the errata because he himself became aware of it when last week somebody sent him an email saying that you have an error in that book. So you see even the great books by great authors and great publishers can contain errors. And I am illustrating this point is I would like to share with you the psychology of us teachers that I have seen over last 40 years at various places. In fact my colleague and friend, one-time colleague and friend Professor Prakash Vihare from Indore pointed this out to me. Precisely in the context of this book when this second edition or third edition was being used and Sudarshan had joined IIT Bombay and when Professor Vihare was doing his PhD here, Sudarshan was teaching databases. So he gave an example to me. He says that consider three different offerings of this course. He says Professor Sudarshan teaching database from the book written by Professor Sudarshan in IIT, one case. He says somebody like me who has studied under Professor Sudarshan, goes back to Indore and teaches a database course. He says somebody in a small place like Riva or something who has never seen Sudarshan, heard him, has not studied. He has studied some other course somewhere else but is required to teach from Sudarshan's book. He says the psychology of the teacher and the students will be different. When Sudarshan teaches this book and he comes across such a thing, he will non-challengedly say or this is mistake. Even while teaching, okay, he might discover that some problem the way it is solved in his book is not correct. Or a better method exists now. He will immediately agree, oh this is a problem, this is a mistake. He has no issue with that and therefore the students who study here will understand that teachers can make mistakes and it is okay for them to make mistakes. There is nothing wrong in it. As long as the mistake is understood and seen to be a mistake, it needs to be corrected. So neither the teacher nor the students are either afraid of making mistakes or admitting them. Does this happen at indoor? He says when I teach it, first of all I'll hesitate. Even if I notice a mistake, I'll say oh how can Sudarshan make a mistake? So I must be wrong. So I will relook at it, relook again and again and sheepishly maybe after 15 days I will send a mail to Sudarshan. Sir, is there a mistake there? So this is the difference. And my students will also take that whatever is written in the book to be correct because I will never point out an error. They will also hesitate. He says go to a third-turn college in Riva. There the students and teachers will believe that the book is gospel truth and they will in fact think that Cartesian product is like this. It converts computer science to physics. It increases people's salary. Do you understand this? And I think of the reason I'm elaborating this is I want you to make it a point that we will take this example only to tell all our thousand teachers who come there saying that such mistakes can occur. They will occur in fact and therefore it is important for us as individual teachers to examine each point and say it is wrong. Now look at the psychology. Even I instead of telling professor Sudarshan that Sudarshan what nonsense you got a mistake there. What did I say at 10 o'clock in the night also? Sudarshan isn't there a mistake on this page? You see the 10 o'clock. Because even I am perplexed unless I re-look at it. I am not very sure. Very obviously if this is the Cartesian product of these two tables is no way you can get this Cartesian. So I have actually I try to first correct it. Unfortunately this is not typed Cartesian product. It is a photograph of that page and in photograph I can't change physics to computer science. So I have done something else. So this is that slide. So what I have done is I have corrected it like this by hand and that is what you should do. So remove this physics, this should become computer science, remove this salary, this should become 65,000. The reason I want to do this is I am preserving this with the hand marks and I will get it included in the soft copy so that you can just carry it with you. But don't you think it's a good idea to tell our young teachers coming for the December workshop that look such things will happen and I may make a mistake while solving an example itself. One of the problems with the younger teachers these days is that they hesitate in admitting their mistakes and this is this is bad for us. As teachers the first thing we must tell our students is that look mistake is possible even at the level of God. So mistakes are okay. As long as we understand correct them and go forward. And this hesitation unless we as peers try to remove amongst our colleagues because suppose I am a teacher in a small colleges nobody would tell me that it is okay to make mistakes. In fact if I make mistakes my students I will feel they will laugh at me. My colleagues will not sympathize with me and my head will shout at me. So this is a typical environment in a small college where I live and that is why it is important for the community like ours to tell them it is perfectly fine. Okay and this is a good example to see the great professor Sudarshan also makes mistake no proof reading. So we have therefore the joints which will say select name course ID from instructor comma teachers where instructor dot ID is equal to teachers dot and this simple condition from that Cartesian product will actually examine and out of millions of rows I mean you should also emphasize the size expansion that happens when you do a Cartesian product. So m rows in one table and n rows in other tables the Cartesian product has m into n rows whereas in actual practice the useful information will be either m or n depending upon what is relevant. So this simple condition makes it perfectly fine and this is another you can write section dot course ID is course course dot course ID and department name equal to computer science. So you can actually put a variety of conditions for the joint and then we can okay so using this table these tables you are expected to write some queries and then you come to natural joint. So in the regular workshop when we do this regular workshop professor Sudarshan most probably will stop at this stage and will tell the people that okay now given these tables see luckily since everybody will have a book you don't have to make Xerox copies and Cycle style copies and circulate them even for class or lab they can use this book and of course there will be lab assignments given but during the class most rarely he will write some query and say okay write these escalate you will agree that that is what how we would like to teach students like unfortunately in many small places this does not happen it just is a monologue is it just a lecture so some queries will be written will of course share those with you and then we will ask the teachers to write those queries so even in a classroom which will be coordinating there you should actually watch out just like if you are a teacher you will go around know because from here professor Sudarshan cannot see these thousand people but you can see the thirty people who are with you and you should encourage people that don't just sit there don't wait for the actual query to come on the slide later but sit down and write because that is another thing you know unless we tell our teacher friends that look unless you act as students you will not be able to put yourself into the mind of the student and and and do a better continue so as I have told some of you earlier I will repeat this that our idea behind conducting that December workshop is not any kind of a knowledge transfer I mean that will happen but it is also a psychology transfer of what we feel about teaching how the teaching should be conducted what collectively we discover to be good practices and we would like every participating teacher to invite part of it so that that person becomes a better teacher than what he or she is when one goes back continuing with this the natural join you have to be careful that the attributes on which you join are related are meaningful otherwise I mean as somebody said it is possible to find a correlation between the number of trees in China and number of electricity poles in united states statistically you can always find a correlation but if you try to do a join of this kind of disparate things you will get something worse than Cartesian proof in fact the meaningless join invariably results in a Cartesian product so join has to be amongst the meaningful things but so here is an example is the names of instructors along with that titles of the courses that they teach so this is an incorrect joint it course dot department name with instructor dot department if you say select name title from instructor natural joint teaches natural joint course you're joining three tables okay the correct one is select name title from instructor natural join teaches Cartesian product course where teachers dot course ID is equal to course dot course you can examine what happens when you when you when you do this kind of selection what do you get and when you do this kind of thing of course any query in a scale can be written in a number of different ways and we should in fact encourage teachers to find out alternate ways of writing the same query so here is another way of writing the query the queries become more interesting and more useful when we consider nested queries or there is a rename operation so this you can actually rename relations and attributes using the as clause the keyword as as you know is not mandatory but it is often used to increase the meaning so for example from instructor as t instructor as s because I want to do a join of the same table with itself all to put conditions on one version of instance of one table and other instance of a table I have to give it different names TNS many students find this concept very difficult to understand and I have found many young teachers also do not have a complete clarity on this as clause so consequently at least one of the exercises in our lab should actually have this important because t dot salary greater than s dot salary so the natural question is that am I comparing my own salary how can it be greater than my salary because my salary here and my salary there is safe so there you have to tell them that look it's like imagine a sliding window on one table and another version of the table so you are actually taking one row from another instance this row from this table and you are trying to find out whether there is any tuple in that other instance of a table where this condition is met and therefore you want to extend the slightly more difficult concept to convey but with some understanding it is it is possible in fact I had thought of one animation here that you actually take a small table containing some 10 values create another instance of that table and show a moving thing as you compare you know this 10 this 10 and make a comparison and show what results you get strings are required in data processing and the like predicate permits us to do very extensive string processing again not many people appreciate the fact that regular expressions can be written here so in fact unless our teachers have actually participated or studied either courses like compilers for example they may not be completely familiar the notion of a regular expression so it might be useful because I think databases are still taught by teachers studied computer science or IT right so they would generally be familiar with regular expressions but this is not emphasized in the database class very often that you have a very rich and powerful construct here so you have for example concatenation upper to lower so all all the standard see like functions that you see in string processing are implicitly available in in a square then sorting so you can order by in descending order or ascending order all that uppers that you have selected this of course makes sense whenever you want to create an output for the benefit of people that you would like things to be in sorted orders sorted students list sorted by a roll number or the course marks sorted in descending or ascending orders in top performance or whatever usually you need to sort within subgroups of data and that requires a slightly different handling so first okay there are there are a variety of other features which are included in these slides so between is is nothing but a simplification of a compound condition you can write a compound condition instead you write this between you can compare tuples so it is not necessary to have a single value compared with a single value you can have a pair which being compared with another pair this is something which is not commonly taught in a course because many other books do not emphasize this this is a small theoretical discussion and depends upon how much of relational algebra is explicitly covered now this something I would like to have your feedback when you teach a database course relational algebra will be mentioned somewhere but to what extent is relational algebra covered means do our students develop a full understanding of the mathematical foundation they generally don't so which means relating more complex concepts in queries to the more complex corresponding concepts in relational algebra is perhaps not done to the same extent the same question that I have to ask which Sudarshan will ask later is that when you discuss normalization then normalization of course we talk about the first normal form, second normal form, third normal form etc but do we explicitly discuss the complete topic of functional dependencies and finding out closure of functional dependencies and teach them algorithms on how to do that that is done but relational algebra is not covered with the same rigor perhaps is it or is it not because that is something that we need to understand and appreciate what we feel is that in our main workshop we should cover relational algebra to the extent that is required to be understood and we will also very quickly discuss these topics such as these which they come up but we may not insist on a tutorial session on these topics because that we don't know as it is we have 10 days it's not too much of a time really that 10 days by the way I am digressing but it is important for you to understand if you recall we did a 15 day workshop last time for computer programming the first pilot was done for the 10 days on the other hand the feedback from the teachers invariably is 15 days is too long a period now in fact many people would prefer one week workshop but we know that one week nothing substantial can happen also when we are discussing with professor Shetty IST president so he says for that you can't conduct a workshop for five days and give a two-week certificate so as a compromise we have decided that 10 days is two weeks because five days and five days are the five working days in every week but physically we restrict it to 10 days I hope you agree that 10 days is a okay period so that is how all our subsequent workshops in other subjects like basic mechanical engineering etc etc will run for 10 days so that physically people don't have to be away from their place for too long but what it also means is that 10 days are power-packed 10 days so there is possibly one Sunday which is semi-off semi-off in the sense that we may not have lectures and practicers but we will have enough work for the teachers wherever they are staying to that's the compromise so if we said 10 days we will we can call it a two weeks workshop is what the IST interpretation what it means that in 10 days we will not have as much time to do as many labs exhaustively as we would have had in 15 because that time period is compressed and to that extent then we'll have to sort of do some compromises on where to emphasize where not to emphasize why I say this again is that you'll notice I you remember I said that we have some presentations that I want you to make so within today tomorrow whatever you I would like you to sort of make yourself into some teams it is not mandatory but it is preferable that if you make teams in the same region where the syllabi are likely to be similar okay so it's the commonality of the syllabi and the examination pattern which is what is important so maybe three or four people form a team each team will be given 10 minutes of presentation so only three to four slides where you have to combine and we'll have enough time to prepare these on your netbooks I think so I would like each team to make a presentation just one slide which will say the affiliations of the four members of five members were there and then one slide saying the common syllabus the other slide saying the teaching and evaluation pattern and the third slide showing some sample questions or something maybe a few additional slides as a next year during the presentation will present these three or four slides in 10 minutes additional slides I would encourage your teams to make because these additional slides from each team will be again made combined and will be distributed to all of you so we will also look at that will help us to decide what should be the syllabus of our final workshop because that final workshop syllabus has to comprehensively cover whatever is actually taught on the field and in addition whatever slight differences or additions that we would like to make on on special topics etc etc the relation algebra discussion will skip for the time being but there are queries which do a lot of set operations for example find courses that ran in one semester of 2009 and also another semester of 2010 okay incidentally in most of the teachings that I have seen in smaller colleges set operations are rarely covered so you cover select star from this where this group by order by end of man and nested queries with are also very preliminarily covered is very interesting the same book is followed but to what extent that book is covered and in what spirit that book is covered differs widely now you see the impact is on the students if the students are not challenged sufficiently they will not learn something on there so the issue is that we tell our teacher that no you have to cover these things you have to discuss these things are important thing and then hopefully change the style of the question paper rather than asking simple questions start asking hard questions teachers will become unpopular but students will benefit union intersect and accept and depends on how you want to handle this you will use one of these things appropriate I believe there is a lab assignment which will involve these figures we can experiment those on the machine null values as I already mentioned null values are important and null values are handled explicitly by the database management systems today you can in fact check for a column value being null by saying where salary is null but the null value as I said participates in logical operations rather differently so true and false which are the two-valued logic become a three-valued logic when something is null and therefore you have an unknown value there so null values and three-valued logic is important so all and and not are to be redefined for this unknown value so unknown or true is true unknown or false is unknown unknown or unknown is unknown in and true and unknown is unknown but false and unknown is false so here in all the true and in and the false are the conclusive results of this others are unknown similarly not unknown is unknown and P is unknown evaluates to true if predicate P evaluates to unknown so that's the final bottom line of decision making in in database so result of hair clause predicate is treated as false if it evaluates to unknown so the interpretation is important because this will decide whether a tuple or a row from a table is extracted as output or not extracted as output then these aggregate functions average minimum maximum sum and count usually you will put count star to indicate counting of all the column that star indicates all rows whereas you can also put a non-star thing there whether counting will be restricted or summing will be restricted to that particular summing will be done on a particular attribute that you mentioned it so here is the calculation of average salary and you can do that for individual department here is counting distinct number of courses which are taught for the semester spring and year 2010 so aggregate functions perform very useful functions in actual application and needless to add database management system is meant for data processing applications it is not to be considered as a puritanical programming language of some something like that and indeed that is the reason why the examples have to be from a domain which people can appreciate in fact the reason I like this edition of the book those of you have seen earlier edition they will find that the example database that was taken for explaining all those things was some banking thing and all the cities and other things are completely unknown to us so some period is something happening there and whatever also checking account and this account is not the terms that we use in India so I used to criticize I when I talked to professor Avi Silver says many a second I said that you are writing a book for global consumption and you have only the American terminology use is not good so he says well people are using the book professor Fatak in spite of that now that is unfortunate but in the last two editions actually it is professor Sudarshan who has who has taken extraordinary pace to modify that book complete so in fact this edition is almost Sudarshan edition and therefore you will see that you now have Einstein you have one senior and you have an example where we can relate the student community can relate that's a very useful fact but generally there is something we do here the first course on databases which is an application oriented course and not so much of a database internal course although we do teach some internals there is an advance course on databases I think many of you in computer science might have such a distinct so generally we have this course project in practically every computer science course which is application oriented the course project is done by a team of about three to four students and this course project runs for almost half the semester so just before the midterm exam the course projects will be decided in the teams take up that project and do and they actually build an application and believe me building an application where you write the code write some cc plus plus code connected through JDBC write some front end form and implement a year design even though it is a small application development that really teaches our students what is involved so writing meaningful queries first identifying what queries are required from a business perspective then translating them into a square is this done at most places what is your feedback on that this kind of group projects is it done at most places or only some exercises and lab value how is the teaching done in your opinion lab assignments are there you have you have many project but do you see the same thing in colleges around you see our problem unfortunately is with those four thousand colleges your colleges are are still special in the sense that you have most of your colleges have evolved have a stature have a standard and have a keenness to follow as much rigor as is possible but look at the let's say 50 colleges in your vicinity do you think that those 50 colleges implement this kind of project idea and not just in later but also in spirit that's the question many of you may not know the exact answer because we have perhaps not looked at our neighboring institutions from this perspective but this is one question which we should ask our teachers and we are making some modifications based on our experience we will be asking all the teachers to fill in a survey form before they register in that we would like to ask them these questions about the style of teaching the evaluation mechanism and one of the things that I would like the teams to do apart from presenting what is their syllabus and what is the kind of question paper pattern if each team could suggest a few questions that we could ask the participating teachers in December to answer before they come such that that will benefit us in correctly emphasizing the appropriate do you agree that will be useful yeah some preliminary question so this is one question that we would like to ask who knows may be following your examples that vicinity colleges are also doing something similar but then the difference is what is the rigor with which a mini project is done and what is the quantum of what that the mini project in at some process you know just create table create table create table insert five rows and write three queries is a mini project some groups will be doing that will happen even in IIT or in your colleges also some groups will do extraordinarily great work some groups will do small but are there examples of this group work available for students to appreciate that yes this is the kind of group project we do so for example suppose you have been teaching these let's say for past three or four years and different groups have done different projects or different groups have possibly done the same project suppose now you teach the new students in the next semester do you make available to them the projects which have been done by past students is that routinely done or that is not done one of the reasons why that is not done is I might ask the group to do exactly the same project again now if I tell them this is the project they will do cut and paste the approach we follow is we actually open everything and as you cut and paste you get zero marks because we know what exactly has been done so something new has to be done the students should be monitored that there and then they write they understand the query list the queries well then they write the queries the scope of the project is large enough to include large number of queries write the queries in algebra SQL calculus or various form of that and those things are monitored rather than the there is a risk of shifting my point is risk of shifting towards programming what we do in IIT we also have a very similar problem but because of the lab assignments every week we ensure that the other aspect that you mentioned are covered very rigorous yeah that is so they are required to run let's say required to write relational algebra expressions required to write SQL queries required to run those queries against the database lab 1 lab 2 lab 3 typically we have about 10 or 12 lab sessions throughout the semester in which students are required to do that so this mini project is a work which is over and above that and therefore we don't mind if people slightly freak out on building the front end and so on but the this point I mean the balance I think is important the point that he makes is important I mean people should not take this course to be a Java programming course we have in fact in IIT we have another lab CS 696 or something it's called it's a lab course and we we don't curiously we don't have a course separately a theory course in object-oriented program we have a first course in programming called CS 101 which is for all branches and which has a mix of C and C++ a very preliminary object-oriented concepts are taught in that course not much the computer science students have therefore this lab is called 296 so this is no lectures they are supposed to build on whatever has been taught in CS 101 by reading books on object-oriented programming and become experts programming in Java but every week for three hours they have to do a lab and in that lab we not only force them to write large Java program there are group projects and with their we teach them JDBC they have not been taught a course in databases but what we do is we make a set of sample ER model prepare table popular that table with data and say okay here is the data that you have to access through JDBC so build your front end using Java access the data and use that so this this is covered in IIT in a very peculiar style they don't know they have not done a database course yet but they are very comfortable it is like they know the files we tell them treat these as if their files only thing is instead of writing file open file read file compare these are simple queries that you can do we teach them simple queries we teach them in fact up to join teach them in the sense that we have one leg one hour lecture before that lab and in that one hour we explain to them and we say now experiment is like swimming by jumping into the ocean if you survive we'll give you some marks and grade otherwise you sing come back next year but this style is is hard style I mean we can't implement this style in every college there'll be chaos and of course there'll be more charge against vice chancellor you know outside syllabus etc but so you might you might want to consider that if we have enough emphasis on labs as regular lab sessions for each of the points that you mentioned rigorously then perhaps building an application if they spend a little more time on the front end that may be okay but not at the cost of understanding basic yes yeah sorry we can maybe introduce them to this front-end concept only towards the end of those lab sessions maybe lab 910 or when they already have built the back end of which is say solid normalized and everything is taken care of and then they go back and look at the front-end and then use the JDBC or something to connect as a matter of fact in the database course even here that is the approach that is so initially the basics of database are covered and the front-end portion is discussed only towards the end so JDBC itself is discussed towards the end not at the end