 Assalamu alaikum dear students today we are in the lecture number 18 of the database management system course. Course code is CS403 I hope you are understanding well this course still if you have any problem write to us we will try to help you out. In the previous lecture that is the lecture number 17 we were discussing the data manipulation part of the relational data model. We studied that there are two major languages relational algebra and relational calculus. These two are theoretical languages they are not used for the manipulation of the relational databases. The language that is practically used for the manipulation of relational databases is SQL however these two languages that is the relational algebra and relational calculus they provide basis for the SQL. We are studying the relational algebra and we will study it in detail but we will also touch the relational calculus briefly and later we will study the SQL in very much detail. In the relational algebra we studied that there are two major types of operators one are one is unary operators and other type is binary operators. In the unary operators we studied project and select and in the binary operators we studied union, intersection, difference and lastly the cross product. One special form of cross product or one type of cross product is join and I told you in the previous lecture the join is one of the most important and most widely used operations in the relational algebra. So I suggest that you try to understand it very clearly. So let us start today's lecture. Join can be considered as a special form of the cross product there are different types like theta join, equi join, natural join, semi join and auto join we will study all these types in detail and let us start with the first one theta join. In the theta join we apply the condition through select on one relation and then only those rows are used in the cross product with the second relation. Dear students you must be remembering that in the cross product each row of one relation is joined is merged with each row of the other table and we studied in the previous lecture that in most of the situations as such there is no need for the joining every row with every row of the other relation. So now we are in the process of forming a special type of cross products that are most that are more useful for example first one is theta join. See the basic philosophy is same that each row from one table is joined is merged with the row each row in the second table. The thing is now the rows from the first table they are restricted they are reduced how by applying a select operator and select operator basically restricts by applying a condition. So first if you have got 300 rows in the first table and you apply a condition a select condition so it is possible that as a result of that condition those rows which reduce may be 5, 10, 15, 100, 200 even may be all of them then these selected rows only these selected rows will be merged with the every row of the other table. So this form of join is called theta join and the journal format of this statement is here. Here you see this symbol of the join and the theta here it represents a condition. Condition as you have discussed before it is a predicate a sentence that results as either true or false. So this condition is applied on the on one table and rather each row of the first table and the rows for which this condition becomes true that are selected and then these rows are merged with every row of the second table. In order to explain you the working of the theta join here are two example tables. In this example you see there are two tables one is faculty other is course and you see different rows in those two tables like there are four rows in faculty table and four rows in course table and with different number of attributes. Now if we apply simple cross product then as we have discussed before that each row of one table will be merged with each row of the other table. So since both these tables contain four rows each so in the cross product there would be 16 rows fine. But when we apply a theta join here you see the statement now we have applied a select operator you see the sigma there rank is equal to associate professor and on faculty. See see the brackets as we have discussed before as per the rule in the mathematics the innermost operation is performed first. So since the select operator is enclosed in the bracket in the braces. So what happens this operation is performed first so first of all select is applied on faculty and condition is rank is equal to associate professor. So when you apply this condition on the faculty table you can see that two rows fulfill this condition it means that we will get two rows out of four from the faculty table and after that next operation is cross product. It means that each of these two rows will be merged with each row of the course table. So now we have got two rows from the faculty table selected and four rows from the course table and as per the rule of the cross product we get eight rows in our output and as far as attributes are concerned the same all the attributes in both the table. So it means if we have got one two three four five attributes in the faculty table and three in the course table then we have got basically eight attributes in the final result. So I hope now it is clear what is to you what is the difference between the cross product and the theta join. Let us move forward this tables shows you the output. The output is that if you see on the left side the faculty part of the table it contains the first four rows contain the same value that is faculty ID F3456 name is Tahir department CSE salary 23,000 and associate professor. Now this row is merged with each row of the course table. So four rows of the course table are merged with one row of the faculty. So there are four rows in this output table and same is the case with the second row of the selected row from the faculty. So in this way we get the first four of the course and after that the second row of the faculty is merged with the course. So this is the output for a theta join operator, equi join. Rows are joined on the basis of values of a common attribute between the two relations. Rows having the same value in the common attribute are joined. Dear students, yesterday we discussed the importance of the role of join and from here the actual join that is basically used is being discussed and the first form is equi join. We will join the primary in the many side relation. Now here the logic of joining them is that we will see how the common attribute comes together and how the common attribute comes together generally as a result of a one to many relationship when we transform it in relation to the model. Now the common attribute when equi join, the basis of equi join is the same as the value of the common attribute. Now what will happen is that the rows in which the value of the common attribute is the same will merge with each other. If you look at the database point of view, we will say that the value of the primary and the foreign values will merge with each other. And if we talk to the point of view of a real world system, then those things, those instances of different entry types that are linked to each other will merge with each other. For example, there can be a lot of examples of this. As we said that the work of the employee and the project is linked to each other through the project ID, which is the foreign key in the employee and the primary key in the project. Similarly, if you look at the employer department, the department ID will be working as the foreign key in the employee table and it will be a primary key in the department table. So, this is linking to each other. So, in both the rows where their value will be the same, the department ID will merge with each other. So, basically, the equi-join links with each other that are related to each other. And obviously, this cross-product or your theta-join does not merge with each other but it does merge with each other that are related to each other. And this relation is established based on the value of primary and foreign key. Let us move on. Common attribute appears twice in the output. This is a small thing but it is also a little important. What is your common attribute? The common attribute is that the two relations that are linked to each other, the foreign primary key, the output of the equi-join contains both of them. So, obviously, the value of the common key is the same. This means that the same value will be repeated from both sides. From that table to the primary key and from that table to the foreign key. Now, dear students, I have already told you that this is not required for the foreign key that it has got the same name as of the primary key. So, the primary key and the corresponding foreign key, they can have different names but they must have the same domain. But first, you have defined, you have used the same name for both of them. The name of the primary key is the same as the foreign key. The output of your equi-join, we have discussed that both sides will have the same attributes. Since you have the common attribute, the name is the same in both the tables. So, how can you think, how it is possible that the same attribute name appears twice in an output? Because this output is also a relation. So, obviously, the same name attribute cannot come twice. So, in that case, we call it qualifier. That is, if that attribute name is the same as the primary foreign key, in that case, your attribute name will also come as the name of the table to be distributed as qualifier. So, for example, if you are discussing the employee and department and the department ID name is being used the same in both of the tables. So, when you take the equi-join, then when the department ID comes from the department, it will become department.department ID. And when it comes from the employee, it will become employee.department ID. And if you have used the same name, then obviously, there is no need for the qualifier. Again, you see that there are two tables in which we have two tables of faculty and course, but if we apply equi-join to it, then we will give the same command that faculty has used the symbol of the join. And here, we have given the condition that faculty.facid is equal to course.fid. Now, in this case, we have different names for the primary foreign key. So, specifically, we do not need to use the qualified names. We would have simply used as faculty ID is equal to fid and the system will automatically identify that they belong to two different tables. But in order to explain it, I have used the qualified names. If you look at the output of this, that is our statement. And look at this. In this, both of your tables, both of your relations, they are only merged in the same way in which the primary foreign key value is the same. So, faculty ID is the attribute of faculty and fid is the attribute of course. But it is acting as a foreign key as well. Now, both of these values are the same. If you look at the leftmost table in the first row, it is f2,3,4,5. Similarly, if you look at fid, which is the last attribute of our course and what is written there? f2,3,4,5. Similarly, if you see at the second row, the faculty ID value is f4,5,6,7. Similarly, the value we have in the course is fid, which is f4,5,6,7. So, you have seen that those attributes are merging in the same way in which the common attribute value is the same. What is its logic? The logic is that you can see that if we suppose that we have those courses which are being offered, which are being taught, and we have different faculty members in the faculty, then when we have these two days together, in which the faculty ID that we had in the course and the faculty ID that we had in the faculty were the same, this means that these courses were taught by these teachers. Now, here in Equi-Gine, you have seen that we have all the attributes of both the relations and both the tables. This is not necessary. We will read the SQL later and we will see that you have to know what is the reason for this. Tell us the name of each course and also tell us which faculty member is being taught. Now, in this situation, you need only two attributes as output. Who are there? Course name and faculty name. Although as an output, you need only two attributes. But the thing is, the theory underlying, the technique underlying will be you will merge only those rows that have the same value in the common attribute. That is, you will use Equi-Gine. You will use Equi-Gine, but also using projection, you will take only that attribute from both the tables, which is your requirement. And how and why do you do this? Always depending upon the user requirements. The user will tell you what is the requirement, that I need a flat attribute. According to that, you define different operators. I hope Equi-Gine is for you. Let us move on. Equi-Gine is in front of us again. But the only difference is that the common attribute has the same name. As a qualifier, the leftmost column is faculty.facid. And the rightmost column is the table-category. The course-dot-facid. Natural-Gine. Natural-Gine is again the most commonly used operation, even the joints. And it is so common that when we are simply joined by this, we mean natural-gine. The behavior of natural-gine is of Equi-Gine. The only difference is that when we started Equi-Gine, we said the common attribute appears twice from both the tables. Whereas in the natural-gine common attribute appears just once. And at this point, my excellent students ask which table will it come from? The thing is that it can come from any table. It does not have any significance to which table it will come from. The thing is, the value is same. So, either it comes from table A or table B, it does not have any significance. The thing is that the major difference between the two is that in Equi-Gine, the common attribute appears twice from both the sides. And in the natural-gine, it appears only once. Otherwise, the behavior of the two is exactly the same. In this, for the example of natural-gine we used the same table again. First, we used the faculty and the course. We applied it like this. In this, we named the common attribute because when we use this symbol without any specific addition, it is understood that it is being used as a natural-gine. As I said before. So, we have simply mentioned the names of the common attribute and it will always be equal to and what is the difference? The common attribute will appear once and here is the output. Look at this. The behavior of the Equi-Gine only appears twice in the output. They are in the Equi-Gine which is the same as the common attribute. The difference is that the common attribute appears once and that is why it does not need a qualifier. Dear students, these are some types that we have left outer-gine, right outer-gine, outer-gine and semi-gine. These are different forms of joints or different forms of natural joints but there is a little difference Let us see what this means. Left outer-gine. Keep all of the tuples from left relation join with the right relation pad the non-matching tuples with nulls. When you talk about the natural joints both the tuples of the common attribute have the same value. The left outer-gine will have the command on the left side of the tuple. All of the tuples will join. All of the tuples will join but after that they will join. Join means natural joint and in natural joint the common attribute has the same value in both the tuples will have the same value. The table A on your left side which is written on your left side will join with the table on the right side which has the same value. What about those rows matching row in the second table? Those rows will appear but the attributes on both the joints will not have matching rows in the other table. Those rows will appear in the output but their corresponding against the columns on the right side will be null or blank. The output will be seen on the next screen. Let's see. Here we have two tables, faculty table and course table. Faculty left joint course. Faculty on the left side of the operator faculty will be joined in rows. The first record which has the name of the faculty member and the second one is F4567 We have matching rows in the course. These two will be found and we will have three records in which two of them match with F2345 and one match with F4567. Now faculty table has two rows which have F3456 and F5678. These two matches are not in the course. These two rows will appear and the course against attributes i.e. course ID course title and faculty will not be in these three. This is the left auto joint operation. Right auto joint Right auto joint its behavior its working is exactly the same that of left auto joint but it is upside i.e. your right side table in which you will write your table on the right side all the rows will come which will match with your left table they will match with columns. But the rows of your right table which do not match with left table will appear with in the column of the left side table. As a result you can see on the screen. These two tables are faculty and course. Faculty right auto joint course. The three rows of the course which are first, third and fourth i.e. C3456 C5678 and C3425 their matching rows are present in the faculty which are first and third. But the second row of the course which is C4567 which is the financial management since it has null value in the faculty i.e. its matching rows not in the faculty. But all the other attributes which are not matching with the faculty this is how the right auto joint behaves. Let's go to the auto joint. Obviously you must have noticed that we have already studied left auto joint and right auto joint. Obviously you have seen that their behavior is applied on the left hand side table or it is applied on the table on the right side. When we simply say the auto joint it means the same the same attitude the same behavior is applied on the both of the tables. i.e. they only do left or right. Both will be applied on the left and right. The matching rows will come which have the same common attribute. After that the table on the left will be applied on the left but the columns of the right table will not be applied. After that the remaining rows of the right table will be applied and the columns of the left table will not be applied. This example will come on the next slide. The same table is applied on the faculty that have the same martial elements in the subject of a match. After that we have faculty on the F goedries also on F here but the attribute on the course side course title faculty id we have category 3 after that So, first take the natural joint of 2 tables and then take the projection on the attribute of first table. And here is the symbol for the semi-joint, i.e. the basic involvement is of the same natural joint. First of all, we have to take the natural joint, in which you will get the same common attribute of the two tables. The second step is that you have to do the output of the natural joint, the first table table, i.e. the first table on the left side is projected on the first table on the attribute of the first table, i.e. after the natural joint, you have to take the first table of the natural joint, which appeared on the first table. Now, the thing to think about is, you should think about the benefit of the semi-joint, what is the need for it. Obviously, you should immediately think about the needs of the semi-joint, what do we get from the semi-joint? We get the same table of the two tables that are involved in a relation with the rows of the second table. So, this will also mean that whatever is involved in your relation, you will get that, for example. If we have a link or relationship between student and book, you want to see which are the students who have issued a book. What will happen with that? If you take the natural joint of the two tables, both of them will come daily in the natural joint, in which the values are common. But this is showing that this student has issued this book. If you take the student out, you will get to know which students have issued the books. Similarly, if you see that the employee and the project. If you take the natural joint between the employee and the project, you will realize that this employee is working on this project. But if you take the semi-joint, that is, take the projection on the employee, you will get to know which employees are working on it. You can reverse this on the student and the book. If you take the book, semi-joint, student, what will happen with that? It will be that this book is with the student from the natural joint. But if you only take the semi-joint on the book, what will happen with that? You will get to know that these are the books which have been issued. So, you will realize what you can use for the semi-joint. So, all these are different types of operations that are available to you. And in that, you will be curious to see. Because the user does not have to tell you that you apply this statement or you apply that operator. So, the user will simply tell you the requirements. And as a designer, as a developer, this is your job to fulfill, to establish those requirements. Now, for that requirement, all these are different operators. These are different tools. And when you look at the data, when you look at the link, you will apply different forms of operators, different operators. And what is required output from that, you can give that scope. This is an example of the semi-joint in front of you. In this, we applied the semi-joint on the faculty and the course. In this, if you see that when we said faculty and semi-joint course, you have four faculty members in the faculty table. Let us say. Now, when you take this as a national joint, then you have a record in your row number one, i.e. the record of Usman and the record of Aisha. These two merged with the course for a few days because of the common value, the same in both. When you take the semi-joint on the faculty, then you will see the record of Usman and Aisha. All this will lead to Usman and Aisha, they are involved in some course. They are involved in teaching of some course. So, in this way, you have seen the semi-joint utility. Rational calculus. Dear students, as I told you earlier, that the relational calculus is also a theoretical language which we study on the relational data model which supports different operators. It is considered to be a non-procedure language. This means that it simply specifies what data should be retrieved, not how to retrieve. Whereas, we explain this in the relational algebra. We will briefly touch on the relational calculus. The relational calculus has two major forms. One is a tuple-oriented relational calculus and the other is a domain-oriented relational calculus. Let us see what tuple-oriented relational calculus is. Finding relational tuples S for which a predicate is true. A predicate is a statement which results in true or false. So, a tuple variable is a variable that takes on only the tuples of some relations or relations as its range of values. You defined a tuple variable and its range, its possible values will be relation or its values. Specify the range of tuple-revalable by some statement. For example, it is written here that range of S is student. If you take a variable, a tuple variable S, the range of that will be your student relation. S is the tuple variable and student is the range. S represents the tuple of student. Express as, look at this. Here, you have S. You have a tuple variable and you express S or find S where P S. P S is a predicate. This is an expression which results in true or false. Find the set of all tuples S such that P S is true where P implies the predicate condition. Again, you specified that range of R is student. For example, when you have a tuple-oriented relational calculus statement, you specified that R is student. So, R where R dot credits is greater than 50. Here, we are giving a condition that I want that student whose credit value is greater than 50. This is its explanation that we have given this statement. The second type of relational calculus is domain-oriented relational calculus. In this, we use variables that take values from domain instead of tuple of relations. In this form, we have said that if P x1, x2 up to xn stands for predicate, then variables x1 up to xn. This means that we will write that x1 up to xn where P is our predicate, x1 up to xn. This means that the set of all domain variables x1 up to xn for which predicate P or all values on it is true. So, we test for membership condition to determine whether values belong to a relation or not. Dear students, we have just briefly touched on relational calculus. Whatever we have studied from the relational algebra, that will provide us the basis to discuss the SQL. We will read the SQL a little later. Before that, we have two topics of normalization. In addition, we have designing the physical database. After reading these two topics, we will come to that situation when we have to define the tables and at that time, we will read the SQL. But whatever you have studied in the relational algebra, please remember that and the best way is that you will get the exercise, assignment, etc. Apart from that, you can take some tables from any book and as a practice, enter some data and practice of different relational operators. This way, you will get a clear relational algebra and you will understand the SQL based on it. Now, let us move on to our next topic which is normalization. Dear students, if we look at the stages of the database development process, after conceptual database design, we have covered the logical database stage as well. In which we said that we have transformed conceptual database design into logical database design which is in the relational data model. Next step is normalization. Normalization is that you can see that it will be at this stage after logical database design. This is an important process. You can see what it is and how it is performed. This step-by-step process to produce more efficient and accurate database design. The purpose is to produce an anomaly-free design. We have the logical database design of the design database but there can be some anomalies in it. What are the anomalies? We will see this later. But the normalization process as a result of that, on the logical database design, we do some processes which are more efficient. What does it mean by efficient? It means that on the basis of that design or on the basis of normalized database design, processing and manipulating becomes easier and more efficient. What does it mean by anomalies? It means that an inconsistent, incorrect state of database means that the database which is made for the real-world system does not represent or reflect properly. We will say that it is inconsistent, incomplete or incorrect. What do we mean by the state of the database? The state of the database means the extension of the database at a particular time. Whatever database data is present, the columns and rows and records represent the state of the database on a specific time and stage. So, what is required of the database is that it reflects the environment properly. It reflects the situation the database is displaying that it should be legal according to the real-world system and should be correct. For example, if we have a banking system, we say that our balance is balanced by a customer and a particular transaction amount is drawn from the account. The bank rule is that you cannot get more than your balance from your account. The data is reflected and it should never be represented in any transaction that your amount is drawn from the bank that is more than your balance. Unless there is no such condition that it is a different thing otherwise as a general rule. So, we say that incomplete or inconsistent state means that your data should be in a database and the real-world situation cannot be like that. So, there are four different types of anomalies that can be in the database and they are known when we read the anomalies then you will read them. These anomalies are redundancy, insertion, deletion and updation. What does it mean? It means that these are different situations and this situation can appear on a different stage that is redundancy means that the data is unnecessarily repeated and insertion, deletion and updation are based on anomalies that when you want to do this operation the error is given or it is not properly done. We will discuss its details when we read the anomalies. Nomalization is a strongly recommended step. Normalized design makes the maintenance of database easier or normalization applied on each table of a database design. Dear students, once again, normalization is not compulsory. Normalization is not a must. But this is a process that is strongly recommended. The experienced designers or the theory we have recommends that you should do normalization. Why? Because normalization makes it easier for you to maintain the consistency of the database. To keep the database in a correct state or the anomalies that we discussed earlier which are 4 types of anomalies will be less than normalization or will have less losses if you have a normalized database or you have a design that will be normalized. What is the normalization process? The normalization process is that you have a database design that has 4, 5, 10, 15, 20, 100 tables. So the normalization process is applied individually. Again, what is normalized database is that you have all the tables in that database design and it depends on the size of the database. Some can be just 1 some can be 20, 20, 20, 20 tables. So the normalization process is that you have all the tables in the database design individually by applying that process that we are going to study further. Performed after the logical database design that we have discussed informally also performed during the conceptual database design. This means that if you think properly about the stage of normalization what is the stage of our design process or rather the development process then its proper place is after the logical database design. But it happens that even if you are in the process of designing your conceptual database design after some experience your mind is tuned in such a format that whenever you are designing a table or an antitype so define in that your mind the normalization condition or the situation or the anomalies which can be in an unnormalized antitype your mind is doing the work. What happens is that unconsciously even when you are designing a conceptual database you are doing this process especially after some experience but otherwise when you have to do it specifically especially when you have to check whether you have done it during the conceptual database design or not but when you have to do it specifically it is that stage the conceptual database is done now you will practically normalize it different forms or levels of normalization different forms and different levels of normalization and we call them first number form second number form, third number form and obviously different stages so we call them different levels or different forms of normalization every form like we said different forms some are ahead every form has some conditions some requirements if a table fulfills the condition for a normal form then the table is in that normal form as you have seen on this side that every normal form has different requirements and conditions now when we said that all the tables involved in a database design we have to normalize each one what is the meaning of this process it means that we have to apply those conditions on every table i.e. first number form have different conditions second number form have their own and third number form have their own so we have to do this we have to hold one table we have to check it in the first number form if it fulfills if it fulfills then we see whether the table is in the second number form how? if it is fulfilling the conditions, the requirement of the second number form then we will say that it is in the second number form then third and fourth and like that now this process as we said that these conditions we have to apply it on every table on all the tables so the table which is fulfilling the requirements that table is considered in that particular normal form yes, process is applied on each table the minimum form in which all tables are in is called the normal form of the entire database now what we will say about our design since we have to apply the normalization process on every table so it will start like this first, second, third, fourth, like that so if we say that all the tables if we say that all the tables are in the second number form or the requirements of the second number form if all the tables are fulfilling the conditions this means that our database design is in the second number form or we will say that we have normalized our database to the second number form if first of all our database design is in the third number form then we will say that our database design is up to the third number form the normal forms as I have discussed they are 4, 5 and 6 5 and 6 which we see in literature they are defined but generally what you normalize is up to the third number form so initially the target for you is to be in the third number form or you can say that all the tables must be in the third number form and here I will repeat as I said normalization is strongly recommended but it is not compulsory this means that you may also have a database design which is not normalized which means that it is not normalized to the third number form and obviously it is not normalized to the level then it can be anomalies which are dangerous because it is not in the third number form but although your database design is not in the third number form even then your database can do the right thing don't get confused because I am just telling you to explain that normalization is not necessary you may have a database which is not normalized but it is still working why? how? I have said that your normalized database design helps you to keep the database efficient and consistent but if it is not normalized then what is the problem? the problem is that there are many chances that the inconsistency or inefficiency will come but if the developer knows that my database design is not normalized to the level then he will know that this error can come then he has to be more conscious more careful and he will have to take care of those conditions he will have to do extra steps on the basis of which those errors, mistakes cannot happen this means that if your database design is not normalized then to make it function properly the designer and the developer they will have to do more efforts otherwise it is not that your unnormalized database cannot work but it will need more efforts it will need more careful behavior on behalf of the developer designer I hope this did not confuse you but again I have explained the role of normalization objective is to place the database in the highest form of normalization as I have told you generally you keep it in the third normal form dear students we will end today's lecture in today's lecture we started about the important operator of relational algebra we have different forms including equi-join natural-join and further natural-join left outer-join right outer-join outer-join and semi-join and we have seen that basically we have a tool we have a operator in which we join the tables and then we can take the output from it as you have seen in semi-join, left outer-join, natural-join in natural and equi-join you can take the output of natural-join and as I have told you that you have tools and how do you use them to fulfill the requirements of your organization of your system after that we touched on rational calculus and we started it that we have enough knowledge of rational algebra and its basic terminology of rational calculus is enough to be familiar with which we have in database design is normalization we have touched on that and so far we have not started the normalization process but we have discussed the significance of the process and its role in our next lecture we will discuss the different forms of normalization and how to design and normalize the database so this is our discussion will continue in the next lecture and I would like to say goodbye