 SQL query is now clear, how simple and elegant it is. There is no harm in using either of the forms as I said. So, natural joint schema as I mentioned will not be all of this. This particular column will be removed and do not ask me which one this or this. It does not matter really because both are exactly identical. It only happens that one comes from one table another comes from another. Now, are you convinced that the relational algebra is powerful enough and a strong enough mathematical foundation to give you the basis for extracting information from either one or multiple tables. If you ask me the basic SQL is just this. There are whole lot of things which we shall now discuss more formally because there are lots of other features. But fundamentally the ability to extract information from one or more tables in a database based on a very sound mathematical elegant theory is what is so beautiful about relational databases. All the databases which come out of relational model are called relational database. They are called relational because they are based on relational algebra which incidentally is based on the relationship between multiple things. Theta joint we will just discuss it very briefly. It is not much relevant. Theta is any meaningful predicate. So, you put a natural joint operator here. This is the way you demonstrate theta joint. You put a natural joint operator here and put a theta as a predicate. You do not require a predicate to be defined for natural joint. But when you put a theta it means that this is something like natural joint but not exactly as strong. It is a diluted version where theta usually has a joint condition. So, for example, theta is student dot S roll equal to range dot S roll and SCPI greater than 9 and SH equal to 8. Now, this technically would be described as a theta joint. Although you could always say that this is a natural joint followed by a select. Select starts from student comma range where student dot S roll equal to range dot S roll and SCPI greater than 9 and SH equal to 8. If I had written this as select starts from student natural joint range where SCPI greater than 9 and SH equal to 8, I would have divided this query into a natural joint and a select predicate. Both would be equal. This is one question though. You remember we mentioned about efficiency. We said that SQL query does not require us to define in which order what should be done. It figures it out. Can you guess how exactly this operation would be carried out? Suppose I had said select starts from student natural joint range where SCPI greater than 9 and SH equal to 8 or alternately I would have said select starts from student comma range where student dot S roll equal to this and this equal to this. Will it do a cross product? Very obviously the answer must be no. If actual cross product has to be performed the machine would be dead. So it will do some elegant operation. Will it do an elegant operation to do this joint first and then apply this criteria or it will first do the selection. Remember if there are 5000 students and if there are 300 sorry 30,000 registrations then I will not get 30,000 into 5000 as a cross product. I will get 30,000 elements in that joint. Natural joint will have 30,000 elements. It is obvious that all 30,000 elements are not relevant. The number of students who stay in hostel 8 itself is 250. Those of them who have CPI greater than 9 is may be 5. 5 students on an average would have registered for 6 courses each, 30 courses. So not 30,000 courses only need to be done. Why not then I first select these 30 jokers from the student table not jokers nice people and then form a natural joint of these 30 people with their registration records. That would make lot more sense. There is an element in the modern database management systems called query optimizer. Professor Sudarshan will describe it later. In fact, Professor Sudarshan's real global expertise is words leading researcher in query optimization. A query optimization takes this query from you. The database engine finds out what are the possible alternative ways of answering this query. These are called query plans and then selects one which costs minimum. Cost is in terms of disk accesses, computational time etcetera. All that is taken care of. Imagine as cobalt programmers if we have to do it, we have to do the thinking as to what will be the most optimal way of doing it. So that is another important component of the database management. So is this clear? The query is natural joint. Sir, I wanted to myself. Sir, in natural joint it has to be based on primary key only. It has to be based on common attribute. Invariably as we shall see later in the database design, invariably the common attributes will be part of either the foreign key or the primary key as we shall see. What my doubt is if the attribute is same, the name differs in the other. That is what you have to take care of during the design. In fact, one of the reasons why you may still use this form, student dot s roll equal to raise dot s roll is because if this is p roll then you have to say student dot s roll equal to raise dot p roll. Natural joint will not recognize it. The common name should be common. In fact, that is the reason why you remember I mentioned data dictionary. The data dictionary should be having a unique set of attribute names across the entire enterprise. So that there is never any confusion. Given an attribute name it means exactly the same thing in whichever table it has been used. Unless there is a student name, teacher's name, but t name for teacher name if you put anywhere in your enterprise t name should be in teacher. So that is of course a design issue, but you are very right. So please remember natural joint I forgot to emphasize this. Natural joint depends upon the requirement that the schema contains the same name for the common attribute. If it does not then you can still prescribe natural joint, but you will have to give the expanded version of the natural this older version. It need not be primary. For example, in rage s roll is not the primary key. s roll plus c code is the primary key. In student s roll is primary. Invariably when you join multiple tables, one of them will be a primary key and the other will be an element of the primary key in the other table, but it is not necessary. In the other table it may be just an element. It need not be a part of the primary key also. In one table in the student table s roll is the primary key. In rage table s roll is not the primary key. In rage table s roll plus c code is the primary key. No, no, no. It does not pick up anything. This is what you are prescribing. This is the beauty is that you see. This is a good point. He is still thinking in the standard algorithm. That is a problem with all of us. When we write programs in conventional programming languages we think as the computer thinks. So we are thinking what the computer would be doing. When you write SQL queries you have to keep yourself miles away from the computer. Never bother about what the computer does. Mathematically none of them are required to be primary keys. Any common attribute name, if they mean the same thing, natural join can be done. However, in 99.9 percent of the cases you will not get any meaningful information. Other way round if you want to extract meaningful information out of tables, obviously those tables will be related in some way. The relationship is more than likely to be only on those fields one of which is the primary key in some table. But it is only likely to be from a semantic point of view. Mathematically the relations do not distinguish. It cannot be. More than one attribute in the same table cannot have same name. No, not in the same table. More than one attribute in two different tables are having the same name. Then how natural join is going to be mined? How can it be same name? Suppose S roll and S name I have defined in one table. S roll and S name and some other then the natural join means S roll here should be equal to S roll. S name here should be equal to that S name. It will take all the. Yeah, all the. That is the difference between natural join and theta join. Natural join insists on equality on all common attributes. Whereas theta join you can relax that condition. So now we begin a sort of formal study of a square. Already only 15 minutes left. Let me see how much I can complete. This is supposed to be only introductory part. Professor Sudarsher has promised that whatever I leave he will carry on from there. But the idea is to, I think with this information alone you should be able to play around with the machines today. Some schema has been prepared. Amit, the schema is there, right? And you will give them an introduction to net beans. So you may take some time to become familiar with net beans with the development environment. My suggestion is on first day don't spend too much time in investigating nitty gritties of net beans. Just understand that as an environment how to login and how to invoke a square through that net beans. Then they have defined some schema or populated some data there. So you will give them some print out of the schema or something. Yeah, yeah. So my PAs are their IIT students. So they presume that people will remember a whole lot of complicated things in their mind. We are all grown up people. You are kids. So you can remember things better than us. It would be nice if the schema is printed on a piece of paper and is circulated. Wonderful. So you will have that schema and you try to run these queries just to see what happens, what you get. Have fun today basically, but learn as much as possible. So here is a formal introduction to SQL now. For this we have given some relations. These are relations from Professor Sudarshan's book. So you can correlate directly to… Here is the account relation, which has an account number, a branch name and balance. So branch name is like our LIC branch. Account number is like our policy number. Of course, there is no equivalent of balance, but there is a balance amount in that account. This is one relation. Here he defines attribute types. So this is a formal definition. Each attribute of a relation as a name, the set of allowed values for each attribute is called the domain of the attribute. This is a formal mathematical name. Attribute values are normally required to be atomic. That is indivisible. So value of an attribute can be an account number, but cannot be a set of account numbers. We already discussed this. Domain is said to be atomic if all its members are atomic. It is only a mathematical clarification. The special value null is a member of every domain. So you say grades are only A, A, B, C, etc., etc. By mathematical definition, null is a possible value. Indicates that the value is unknown. The null value causes complications in the definition of many operations. We shall ignore the effect of null values in our main presentation, but we shall discuss these later. Relation instance. So relation schema is one thing. Relation instance is the current values of a relation specified by a table. So this is a table. Jones, Main, Harrison, Smith, North, Rye, Curry, North, Rye, Lindsay, Park, Pittsfield. This is a customer name, customer street, customer city. These are attributes or columns and these are tuples or rows. An element t of r, r is the relation schema. It is a tuple represented by a row in a tuple. It is only a mathematical definition of whatever we discussed. r is the relation schema and t is an element of the relation. So t and r are now being distinguished. We so far talked about relation, relation schema and the values in the same breadth. But relation schema is r. The relation itself is the relation instance. Anyone row in that relation is a t. Order of tuples is irrelevant. Very important. In Cobalt, we distinguish between sorted and unsorted phi. Mathematically, the mathematics of relation algebra does not give a dam on how the relation elements are stored. So this may be a101, a215, a102, a305. It does not matter. It just does not matter. This is a relation. In fact, in a database also, it does not matter how the rows are stored in a table because ultimately it is the expression for extraction that you give which will determine what is to be extracted. Mathematically, a relation does not require sorting. So account relation with unordered tuples. This is an example of that unordered tuples. A database consists of multiple relations. Common sense, information about an enterprise is broken up into parts with each relation storing one part of the information. Here we have account which stores information about accounts, depositor which stores information about which customer comes with, owns which account and customer stores information about customers. Very clearly, account and customer are two independent entities and depositor is some kind of a relationship between these two entities. Similarly, there could be a loan account or something like that. I could define other things. The normalization theory of how to get the best set of tables to describe a database. There is a theory called normalization theory. We shall have a hint at the normalization theory on fifth day, the last day of our course. I will again come back on the last day where we will discuss this and the performance issues as I mentioned and this will come from the seventh chapter of the book. Normalization theory is an exquisite mathematical theory. We need not understand the entire mathematical theory, but we need to understand what we call functional dependencies and the first, second and third normal forms which are fundamental in any database design. Here is a customer relation. So, it is a customer name, customer street and customer city, something like address and city. Here is a depositor relation. You can notice that depositor relationship is like that range, course code and student number, customer name and account number. So, this is the relation. Same customer may have multiple accounts, Johnson has two accounts, keys. I will omit this. This is nothing but a mathematical definition of keys. If you recall, we discussed super key, candidate key and primary key. Those of you are mathematically inclined, you can say let K be any subset of R. What is R? R is the set of attributes. It is the relation schema, not the relation. R is the set of all attributes, say 10, 20, 100, whatever. K is a subset of R, which is a super key if values for K are sufficient to identify a unique tuple of each possible relation R on R. R is a schema. Remember, if I change, add, subtract, I get a new relation. So, I will have a series of relations R, R1, R2, R3, R4, all defined on R. For any such possible relation, if these K values are sufficient to uniquely identify a tuple, I call it a super. In plain English, a super key is nothing but one which uniquely defines a row in any relation. By possible R, we mean a relation R that could exist in the enterprise we are modeling. Example, customer name, customer street and customer name are both super keys of customer. If no two customers can possibly have the same name, just an example. In real life, an attribute such as customer ID would be used instead of customer name to uniquely identify customer. You can see exactly the same reasoning, but we are mathematically defined. K is a candidate key if K is minimal. Since we are now talking in terms of set theory, K is nothing but a set of attributes. If K is minimal means, if some key is super key and no subset of it is super key, then that is a minimal super. Subset of super key means any subset element. Primary key is a candidate key chosen as the principle means of identifying tuples with a relation should choose an attribute whose value never or very rarely changes. That is the reason why combination of room number and hostel number is not a employee code, etcetera. So, email address is unique, but may change, never use it as a primary. Foreign keys, this is an important concept and a concept which will be very relevant in most of our database activity. A relation schema may have an attribute that corresponds to the primary key of another relation. This attribute is called a foreign key. Go back to our old relation. In the rage file, if you visualize the rage table, in the rage table, roll number plus course code is the primary key. But that roll number is also called a foreign key because that roll number refers to the roll number of the main student table. Similarly, the course code is a foreign key in the referring to course table. What is the importance of this foreign key? It represents an association and it also defines a very stringent condition which actual entries in my table must satisfy. Namely, unless a student is registered in the student table, the registration record should not appear in the registration table. Unless a course is approved by CNET and finds a place in my course table, no student can be permitted to register for it. Similarly, here if you see customer name and account number attributes of depositor are foreign keys to customer and account respective. Exactly the same situation, customer table and account table and the depositor table is nothing but the two common primary keys. So, I must refer to this. Here is a schema diagram. Here is a branch. So, let us look at just the customer and the branch. Customer is here and branch is here. Account has a depositor as customer name and account number. The account number refers to an account which has a branch name and balance and in here the branch name refers to this branch. Just look at these two relations, account and customer. The primary key of depositor is both of these. Account number is the foreign key within the depositor table which refers to the primary key of account. Customer name is the attribute of depositor which is a foreign key referring to customer name of the customer. So, is the notion of foreign key clear? Foreign key is that attribute of a table which refers to primary key of some other attribute of some other table. Now, this will answer your question. Since a natural joint is effectively the same thing and since most meaningful information out of cross product can only come because of the relationship and since the relationship is defined by foreign key which essentially maps into a primary key of something else there will be some primary key in the joint. Query languages is a generic term. Language in which user requests information from the database. Cobalt program is a query language but it has to be specially written. Ordinary user cannot write it. So, there are procedural languages or non-procedural or declarative languages. There are pure languages. Relational algebra is a pure language that means it is a complete mathematical structure. Tappal relational calculus is another equivalent but more powerful way of describing the relational operations. The SQL incidentally implements features of both as we shall see. Domain relational calculus is another form of calculus. Since algebra itself is sufficiently complex for us. After all we all left math not me but most of you left maths long time ago and calculus will be difficult thing. So, we will not get into that but those who are mathematically inclined there is discussion on both the relational algebra and the relational calculus. However, our main purpose is to acquaint ourselves with the power of databases and the practical use. So, it would not be required. The reason I am emphasizing this is the beauty of relational database is the fundamentally sound mathematical foundation on which the whole thing is based. Pure languages form underlying basis of query languages that people use. Here is the history of databases. I already mentioned IBM SQL language developed as part of the system R project at the IBM San Jose research laboratory. It was renamed structured query language. Instead of SQL, it was called SQL and C and ISO standard SQL came in these forms 1986, 89, 92. The next standard was not called 99 but 1999. Jokingly people said that the language name itself became Y2K compliant. So, four digit theorem and then SQL 2003 which is the latest standard. Commercial systems offer most if not all SQL 92 features plus varying features sets from later standards and special proprietary features. So, for example, when LIC decides to use a database to take let us say a corporate license, we should obviously insist on having as many as the latest features as possible. Although this is a relational algebra based system it is called a relational database in consonance with the demands of the time where object oriented philosophy and paradigm is used heavily in programming. Object relational features are being implemented in most databases and many of those are standard of SQL 2003. So, object features are possible in a conventional database line. The main part of the database management system which permits us to define our tables, attributes and schema is called the data definition language part DDL. DDL, DML, data manipulation language and DCL, data control language are the three components of SQL description. DDL permits you to describe your schema and other things. DML permits you to manipulate the data which includes querying the data. So, the SQL queries that we saw are all part of DML. DML obviously has other statements which will permit you to insert data, delete data, update data and DCL which is the controls thing which we shall study later which will permit you to define excess control. So, promo should be able to access some tables but I should not be it. SEMA should not be able to access some view but others should be whatever general control mechanism sorry SEMA it did not mean to prohibit you from accessing anything. These are the data types in SQL. You are all familiar with cobalt types picture 9, 4 picture x something whatever whatever whatever. Here are the data types let us quickly go through that you can look at it in greater details later. Int stands for integer it is a finite subset of integers which is machine dependent. So, the largest value etcetera will be dependent on the machine. Numeric p comma d is a fixed point number with user specified precision of p digits with n digits to the right of decimal point very much like picture 9, 9, v, 9, 9 kind of. Please remember that cobalt is very special in this respect from other procedural languages. There is no mechanism to represent exact decimal integers and fractions to be represented. In all other languages like FORTRAN and C and all you have you know conversion to binary system cobalt permits that the database also permits exactly the same. It permits CARE n and VAR CARE n. VAR CARE n is a character string with varying length maximum length is defined as a CARE is a fixed length string this is naturally understood. N CARE is another thing called national characters because CARE normally means one byte for one character, but you cannot represent many bytes many many characters like German language Japanese language characters will require two bytes represent. Unicode requires 8 bit representation, but it could be a 2 byte representation. If you want such thing then you have to say n CARE also n VAR CARE. This allows two byte unicode characters. So, all Hindi Devnagari Telugu Tamil which are represented in unicode now can be represented by a 2 byte. Some people ask whether I can make I can enter Hindi or Devnagari things into my present database. If I have defined that as CARE I cannot have a unicode represented. So, I will have to change it to n CARE. Small n is a small integer typically like 8 bit or 16 bit versus 32 bit or 64 bit that is a minor variation you can use that. The real and double precision are floating point numbers again you are familiar with those. Float n is another representation of this whether it is floating point where user specified precision of at least n digits is guaranteed. Theoretically n can be even 100 and the language compiler is supposed to implement a 100 digit precision floating point number for you, but that is stupid nobody does that. A create table statement and although the people will give you a schema you should try to create your own schema and see how you could insert data or something like that that should be possible through net means. So, the way you create table you say create table say branch, branch name CARE 15, branch city CARE 30, assets integer CARE 15 and then not null this not null is a constraint it is called an integrity constraint. It specifies that the value of branch name can never be null. So, typically any such attribute which cannot be having null value you have to say not null. The general form is create table followed by a relation name. So, create table branch branch is your own chosen name for the table like file name and this is nothing but your FD entry in details actually, but with many more thing prescribed. So, A1 is the attribute name whereas D1 is the data type of attribute that is what must be minimally prescribed like name and picture minimum, but there are so many additional things which you can prescribe either here or separately through what is known as integrity constraints. So, apart from all the attributes in the table you can define a large number of integrity constraints later. These integrity constraints we shall see in a moment. These are constraints which will guard against accidental damage to the database by ensuring that authorized changes to the database do not result in a loss of data consistency. For example, no two customer name same ID number branch name attribute of the account relation must contain a value corresponding to the actual branch in the branch relation. Student role number in the range must contain a value which is actual student role foreign key. So, foreign key is a constraint a grade must be either a a a b b c c d etcetera etcetera, but not z z that is a data integrity constraint. These constraints can be defined at the level of schema definition when you declare a table what is most important once you define these constraints no programming validation or check is required. The database engine the SQL engine will guarantee that these constraints will not be flouted. If any insertion attempt is made by any program through of course, SQL statement error will return. So, try to insert a student record in the registration where the student is not there in the master database the record will go. Try to insert hostel number 39 against a student the record will be thrown out. Try to update the value of the salary of a professor from whatever 20,000, 25,000 rupees to say 25 lakh rupees. If the maximum salary is defined to be salary between this range and this range you have a problem. So, this is something really really important from a programmer's perspective. How many times you have found that somebody has forgotten to check something and that has resulted in the bad data in the fight because human beings can forget here is a guarantee you define a constraint nothing can happen to your database. So, integrity constraints can be of various types the most important constraint is primary key. So, for example, create table branch branch name is so and so branch it is so and so assets integer primary key is branch name here is another table customer customer name customer street customer street primary key in bracket customer name. So, this is a primary key automatically means it is unique etcetera is customer name as primary key realistic no we already said it should be customer idea or something. Referential integrity we have already seen it ensures that the association or relationship. So, here is an example the foreign key clause list the attributes that comprise the foreign key and name of the relation referenced by the foreign key by default the foreign key references the primary key attributes of the reference table. So, here is create table account account number cal 10 branch name this balance this primary key account number foreign key branch name references branch is a deposited account is customer name and account number. Now here primary key is customer name comma account number foreign key account number references account foreign key customer name references customer. There are many variations for example if the name of the primary key here is different from there you can write references account in bracket that name theoretically you can define a foreign key which refers to a non primary key in which case you can say that name. So, all the details are available if you do not say anything then by default it refers to the primary key of that it. I am Mahesh this account in the earlier statement this any account number you have stated that not null. Yes. But when you say it is a primary key you are not stating that it is not null. Yes. So, that is that is that is that is. That is exactly means. Yeah. Exactly means if it is a primary then no null where you will be accepted. Correct. In fact the primary key subsumes the requirement that it should be not null. Yes. So, primary key implies that it is not null it is unique. These two things are implied. Yes. You may still have another column which is not null and which also is required to be unique for example a candidate key. But there is no statement in sql which says this is my candidate key. So, to prescribe candidate key you say not null and unique as that way. So, this is this is clear. There are other integrity constraints not null is one of them. Unique is another. You remember we said distinct actually in the select statement, but in attributes when you characterize you say unique. So, unique means these form a candidate. Check constraint is a very strong constraint appears after attributes are declared. For example, check balance greater than 0. Now, I have a balance of 100 rupees and my wife goes and tries to withdraw 120 rupees. Any update request there will reduce the balance to minus if this constraint is defined that transaction will be thrown off. The check constraints are very generic constraints and you can put any kind of check constraint actually can be written as a sql query also inside the bracket. Every time a transaction is done which affects that particular attribute the entire query will be run and if that query based on that condition if it is satisfied it is otherwise not. So, we will of course, discuss more details on these constraints and how they are implemented later. Sir. Yeah, sorry. Sir, I am Amit. Can we modify these constraints any number of times without changing the actual. Yes and no. It is a very ticklish issue. Generally modification of constraint reflects very poorly on the design, but yes design can always be changed and in fact if you notice this that is what I was discussing here drop and alter table. The drop table command deletes all information about the drop relation. The alter table command is used to add attributes to an existing relation alter table or add AD. The alter table command can also be used to drop attributes of a relation alter table or drop a. Dropping of attributes is not supported by many databases. So, you may want to define a new table copy the contents of this table into that table and drop the entire table. In general whether you can redefine those that you cannot redefine the integrity constraint checks what you can do is you can temporarily dilute them. You can say for this particular operation do not insist on this integrity constraint. You may regularly need to if some constraints is being expanded or something we may require to redefine that constraint or we may have to visualize that for long term we will have to. That is the reason why I said that if the total time spent in building a relation database based application is one and half years at least 6 months should be spent in analysis and design. Invariably people are very eager particularly people like us who are programmers you know we are itching to dirty our hands on the terminal. In any professional software development thing if without the complete documentation of the designed algorithm anybody writes a code that person would be fired during learning you experiment you learn the tricks of the trade etcetera. But formal system never ever without design just like without adequate testing will you release a version you might because you are forced because of the circumstances but you know the fun that awaits you there. When we design a system it is not that we are doing something hard coding like the values are not defined in the constraint again as you mentioned that this constraint might be coming again from another select query. Correct. So, some valid values as you see in the plan. So, we will define the relation of plan and there you just go on updating the values. Yes, yes in another relation that will be taken that is how you go on. It is not that every time you have to change the schema. Just like the same principle like you do not hard code things in your program. Nothing is there. Similarly, you try not to hard code some of the business related constraints like that. Those are not. That is a very cute way although purists will say that Pramod is cheating the system but that is the way going forward because you require flexibility but you still require those constraints. A curious question is you impose a constraint 6 months after the system is operational. The due values will be of course pertaining to that constraint. What happens to the old value? Think about it. Sleep over it and read the book to find out the answer. I will close this. I am sorry you are already late but this one is an important thing. In any procedural programming language including COBOL temporal attribute handling is very difficult. Temporal means time wise. Putting a time stamp, putting a date, putting a time, calculating date arithmetic. How much of Golagiri you have to do in your COBOL programs to find out what are the three days date after this date? Luckily SQL takes care of it. There is a type called date which contains a 4 digit year, month and date. This is the date example. There is a type called time and any attribute can be defined to be of the type date or of the type stamp which is the combination of date and time. You have an interval which is period of time. For example, interval one day and if you subtract a date, time, time stamp value from another gives an interval value. If you add or subtract interval into a date, you will get another date and that will be the exactly correct date. So half your maramari problems are resolved. You can extract values of individual fields from date, time or time stamp. Here is an extract verb in SQL. Extract year from r dot start time or you have given a string value expression. You can say cast that string value expression as date. Now these are some nice cities about SQL for practical point. There are large object types. One is called blob or binary large object and the other is called club, the character large object. Suppose you have a digital picture, 1.9 megabytes, but what are those? Those bits represent pixel values. A single pixel may be 6 bits, 8 bits, 20 bits, you do not know. How many pixels are there? You do not know. You cannot make sense out of those pixels. It is a blob. The interpretation of those pixels has to be left to a software which will interpret these pixels and show a nice picture on the screen or print a nice picture on the screen. SQL has nothing to do with it. So it is not a data type on which you can make comparisons or make relations or nothing like. It is a value. So blob and club permit this kind of a value. Whenever a query returns a large object of this type, usually a pointer is returned and the interpretation of this value is left to some other program to which you must connect the output of your blob. But you can store blobs and extract them.