 Dear students, today we are going to discuss the lecture number 19 of the database management system course. The course code is CS403. In the previous lecture, we were discussing the relational algebra and especially we were discussing the join operator. We studied the importance and the way of working of the join operator. We also discussed different forms of the join operator that is equi-join, natural-join, semi-join, left-auto-join, right-auto-join. We also touched briefly the rational calculus. After that, we discussed normalization. Normalization is a very important process of the database development process. In this process, we transform our database design rather our logical database design into more accurate and more efficient, a smarter database design. Normalization is a process that you apply on each and every table of your database design and overall you think this is the normalization of your database design. A normalized database design helps you to maintain the consistency of the database, the correct state of the database. See, you can make a correct database system, you can develop a correct database system even with a not normalized database design. But I told you in the previous lecture that would make extra burden, that would put extra effort on behalf of developer, the designer. They will have to be extra conscious to maintain the consistency of the database. With an unnormalized database, there are more chances that your database may be in an incorrect state. A normalized database helps you. It makes it easier for you to keep the database consistent. Why? Because you provide such a design, you provide such form of tables to the DBMS that DBMS helps you to maintain their consistency. In that case, DBMS is more helpful to you to maintain the consistency of tables data in the database. How? You will see how a normalized database design keeps the data in a consistent form. It manipulates it more efficiently. Let us start today's lecture. Nomalization process, we have to understand, what do we mean by the functional dependency? Why? Because the normalization process is based on the concept of FBs, when you want to do the normalization, you can say the input to the normalization process is your logical database design, i.e. all the tables that you created during the logical database design and plus FDs. So this means that you want FDs for normalization. So an FD is a type of relationship between attributes of a relation, you will see. Functional dependency, how do we define it? If A and B are attributes of a relation, then B is functionally dependent on A. If each value of A in R is associated with exactly one value of B, it is written as A and this arrow are again B. Dear students, again I request that you make it very clear in your mind or what is its purpose, what is its role, because this will be used very frequently in the normalization process. So if the concept of FD is not clear, then your normalization will not be clear. The definition of FD is that we have an attribute, an attribute of any relation. If you know the value of attribute A, if you know the value of attribute A, then on its base, on its against, you can determine the precise unique value of attribute B. Whenever you know the value of attribute A, then you are sure that you will get the exact value of attribute B. You should be sure that there is a mistake. And what is the meaning of the mistake? The mistake is that there is no chance that you are going to get two values of B against one value of A. An example is that, for example, if you say that if we know the student ID or the registration number of the student, then it is certain that we will always get the name of one student. If I broad this example, then it will be very easy for you to understand that let's say if we have got the national identity card number of any person, this is sure that against that number, we are going to get the name of a unique person. We are going to get a unique name. This means that the national identity card number of two students cannot be the same. Hence, we will say that if we know the number of a person's national identity card number, then against that number, we will get the same name, we will get the same address, we will get the name of a person's father. Apart from this, if you look at one more thing, first of all, if in a class, we say that Imran Ahmad is the name of a student, tell us the name, the father name of the student. You cannot be sure that if we know the name of a student, name of any student, then we can determine precisely, uniquely the father name. Why? Because we can have multiple Imran Ahmad in a class. So, you cannot be sure. So, when we talk about FD, in FD, when you say that we know the value of FD, then based on that, if you get the precise value of the second attribute, you will get a unique value. This means that the B attribute is dependent on A. So, we will say that this is a functional dependency. And in this, your A attribute, the value of which you can determine the value of the second attribute, is called determinant. When we write FD, we write an attribute and then arrow. The arrow's tail, on the side where there is no arrow head, the attribute on its tail, whether it is an A attribute, and we will see in the examples that in some cases, we can have a multiple attribute on the left side of the arrow. So, if there is an A or a multiple attribute, they are called the determinants or determinants. And the right side of the attribute, the arrow head, the attribute on that side, they are called the dependents. So, this is an FD. We will read it like this. If we read it on the side of the determinant dependent, we will read that A functionally determines B. And if we read it on the side of B, we will say B is functionally dependent on A. I hope that you have clear this concept. Let us move forward. It does not mean that when we have written an FD, it does not mean that you can derive, you can calculate the value of B. In some cases, this can happen, but not always. Most of the time, this is sort of relationship, a mapping. If we know A, when we relate it, the link related to it is the same value of B. It means that if we know value of A, then we can precisely determine its unique value of B. Attribute or set of attributes on the left side are called determinant. And on the right side are called dependents. For example, if we have a relation, R, and there are attributes A, B, C, D or E. So, against this relation, we have got two FDs written there. And obviously, you might have noticed that we can have multiple FDs. We can have, not always, not necessary, but we can have multiple FDs against a single relation. As you can see here, it is written that A determines B, C, D. And we have another FD, D determines D and E. One more thing is that if you have different FDs, if you look at them, D is dependent on one FD and it is determinant on the other FD. What does this mean? This means that the attributes in this relation depend on one of these FDs. If you know the value of A, then you can get the value of D. One more thing is that if you know the value of D, then you can determine the value of E uniquely. For example, with proper attributes and relations, as you can see here, FD has a proper example where we have used proper relations and attributes names. If you look at this, we have defined a relation, student. The attributes are student ID, student name, student address, program name and credits. What this means is that how many credits are in that program. If you look at this, the first FD is written as student ID determines student name, student address, program name and credits. We also have another FD of this relation, program name determines credits. What does this mean? This means that if we know the ID of a student, the ID can be any number, it depends on the institution what ID they give, it could be even registration number. So, if we know the ID of a student, then we can uniquely determine what the student name is. And there will be no confusion whether the ID of S1015 is Shakur Ahmad or Hamid Ali Khan. It will never happen. There will always be a single name against this ID. And in this way, we will be able to determine a particular, a unique address against this ID. Likewise, program name. Which program is registered or read in which program. Similarly, talk about credits, which program is enrolled which program is read, how many credits that program contains. Or it consists of. The second FD is functional dependency. It says that if we know the program name, then we can tell how many credits this program consists of. So, this FD is obviously explaining it. So, if you pay attention to it, relate it with your real-world situation. This situation which is shown here, you can think of any relation or any table in your mind. You can write it. And if you look for FD from there, you will understand what is the role of FD. In this, the table in front of you which we have just defined, you can see its data. In this, you have the student ID which is S1020, for example. In this, you can see the name. Let's say Sohail Dar. And there is an address too. In addresses, we have written briefly. But obviously, you will store it in the database. It would be a complete address. So, here, because of the place, we have written it short. After that, look at program name. First of all, it is MCS. And after that, program credits. That is, how many credits are in this program. In that, you can see 64. Similarly, the second record we have, if you look at it, we have S1038 student. And in this, the name is Shoaib Ali. And in some sector, in Islamabad, it is enrolled in BCS. And its credits are 132. You can see that all the records we have, which have the same degree name, the program name is the same. For example, wherever MCS is, so, whenever you say MCS, it is certain that the credits against MCS, the total of the program, you will always get 64. It can never be that MCS is written somewhere and the credits there go up and down. Similarly, when you say BCS, so BCS against, you will always find 132 credits. So this means that you can understand that when you know the program name, you can for sure determine a unique value of the number of creditors on which this program is available. And likewise, if you apply this concept on the student ID, then you can determine all the attributes of the student relation. Dear students, another role of FDs which you will understand in this is that using the FDs, based on the FDs related to a table, you can determine the keys of that table. If you look at a table and look at FDs, they can help you, they can guide you to determine the keys of that table. How? Such a FD in which the determinant is determined by all the attributes. Think about it again. So the purpose of the determinant is that it uniquely identifies the dependent. So if you have a determinant like this, whether it is a single attribute or multiple attribute, if a determinant is like this, in which all the attributes are included in the table, this means that you use its value. If every attribute can determine the value, well, that is the role of the key which we have discussed earlier, that you identify a particular record based on the key value. How does the record become a record attribute? So when your determinant determines all the attributes, this means that it is helping you to identify the complete record. It means it is a key. So in this FD, the determinant determines all the attributes of that relation. It means the determinant is a key. Now the key type can be a super key, a candidate key, a primary key. Let's see how it will identify what type of key it is. A minimal super key is the candidate key. So if a determinant of an FD determines all attributes of that relation, then it is definitely a super key. This is certain that when you have a determinant that determines all the attributes, then at least it is a super key. Now the question is whether it is a candidate key or not. Let's see this. If there is no other FD where a subset of this determinant or the super key is a super key, then it is a candidate key. The simple thing is that if your determinant is a single attribute, one thing, and for that it is dependent, they are all of the attributes of the table. It means that definitely it is a super key and a candidate key. But the thing is if you have a situation where you have a super key of an attribute, then if you attach an attribute with that determinant and make an FD, then obviously they will determine all the attributes together and it will be a super key. We have already discussed this in the reference of super key. Now we are discussing the same thing in the frame of the FDs. This means that if you have a super key of an FD, then on the determinant side, whatever attribute you add from that table, from that relation, then it will still remain the super key. Now what about the candidate key? He says that whatever determinant you have in any FD and he is determining all the attributes, it is a super key. Now look at the other FDs. When there are remaining FDs related to that particular relation, if there is no such FD in which the subset of this determinant is also a super key. A proper subset. That is to say if you have two attributes in the determinant, in one FD, there are two attributes in the determinant and in its dependence, all other attributes are included. So it is a super key. Now look at the other FDs. If there is no such FD in which one of these two is either this one or this one, then he is determining all the other FDs. In that case, he will also call this as candidate key. But mind it, first of all, we have two attributes, A and B and we are determining C and D and E. A and B are also determining C, D and E. And your table has these five attributes A, B, C, D and E. Now, if we have another FD in which A is alone, A is determining two or one. The candidate key will not disturb him. Because key will be determined if the determinant determines all the attributes. This means you may have multiple FDs. But the key will be the one where all the attributes are dependent. Now, what will be the candidate key whose determinant has any proper subset which does not determine all the attributes. Then the candidate key will be dependent. And your super key consists of a single attribute where the determinant is a single attribute and it determines all the attributes. So FDs help to identify keys. Now, in this case we have an employee table. We have employee ID, employee name, employee address, employee department, project ID and project salary. We have an employee working on this project and the project salary means what is the salary on that project. And here there is an assumption underlying that an employee can work on multiple projects. And this assumption would have been much clear about it had we seen the year diagram of this design. Now, here we see that the project ID is not written because the employee can work on multiple projects. The second FD we have is if we know the employee ID and the project ID then we can determine the project salary. This means how much salary this employee has on this project. This means if you see that the employee ID alone is not working. But if you see the employee ID and the project ID together then all of these are determined because employee name, employee department and employee address are determined by the employee ID alone. The rest of your project ID is determined in the second FD and the project salary is determined by the employee ID and the project ID. This employee has a super key which is employee ID and project ID. This will determine the super key. Now is this candidate too? For that we have to see that it has a proper subset. It has two proper subsets possible. One is employee ID alone and the other is project ID alone. This project salary is determined. Similarly if you see project ID alone then project ID alone cannot determine employee ID employee name employee address. This means the employee ID and project ID both of you need to determine all attributes of this relationship to determine all of the candidate key is employee ID and project ID. And because this is the only candidate key then you will select this as primary key. This way you will know how you can identify the keys of a table. Here we have a table of students. Here we have attributes program name address NIC national identity card number and CGPA. These are the attributes of this table. Now we have two FDs related to this. Which ones? First student ID determines student name program name address NIC and CGPA. This means the student ID which determines all of them means this is a super key. Consist of the single attribute then for sure it is a candidate key. Why? Because it does not have a proper subset because it will be an empty set. So this means you have got a candidate key that is the student ID. Look at the other FD which is related to this table. What is that? NIC determines student name student ID program name address and CGPA. This determines all the attributes of the table. When I say all you will not get confused that the determinant itself is not included. We have some properties of FDs so you will know that the determinant is self-determined so it will automatically be included. So it determines all the attributes of the table. It means this is a super key and since it consists of a single attribute automatically it becomes a candidate key as well. This means in this table we have got two candidate keys and what is important in this case is you should understand how we can identify how we can establish the given set of FDs about a table. Let's move on. Inference rules we call inference axioms or Armstrong axioms and these are the rules that establish certain FDs from a given set of FDs or these rules are sound. Inference rules we have some FDs some FDs some FDs So these inference rules automatically deduce some FDs from them and remove them. If we have some FDs after applying inference rules certain FDs can be removed and assume. And the assumptions are proved to be sound. Meaning the way you have a given FD similarly those FDs are authentic accepted valid true that you remove from inference rules they are as good as your actual FDs that you have identified. These inference rules you need to understand FDs and their process. That helps you so let me discuss some inference rules. The first rule inference is reflexivity. And when I explain them they are very obvious meaning there is no problem you will understand it immediately. If B is a subset of A then we have two attributes A B A B A B B A B A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A automatically depends on your right side. So the requirement or definition is fulfilled that the key is to determine all the attributes of a table or a relation. Determinant is there automatically by the rule of reflexivity. For example, if we have got student name and student address, then it will determine the student name. Since we have talked about the subset, since an attribute is also an improper subset, it is written here that if we know the student name, then we know the student name. That is, the student name will determine that student name. Simple. Augmentation. If we have got an FD, A determines B. So the augmentation is that if we add an attribute on both sides, then that will also be a valid FD which is from this FD. As we have written that if we have A determines B, then it will also be valid that AC determines BC. For example, if we have got student ID determines student name. So if we take an attribute, first of all take the student address, add it on both sides, then this will also be a valid FD. Why? Through the augmentation and reference rule. I hope you have understood. Let us go ahead. Transitivity. This is also very simple. If we have got two FDs, in which we know that A determines B and the second FD is that B determines C. So this rule or FD is that A determines C. This is called transitivity. Here it is written that if the student ID determines the program name. If we know that the student ID is this, then we will get the program name. Which program is enrolled in. On the other hand, if we know the program name, then we will get to know the credits of this program. There are credits in MCS, in BCS, in BIT, in MB, etc. If these are FDs, then this also comes out of these FDs. And this will also be a valid FD that student ID determines the credits. How? Through the transitivity and reference rule. Let us go ahead. Additivity or union. This means that if A determines B and A determines C, then A determines BC. You can say that this is also a method of writing the FDs. Dear students, as you saw in this last inference rule, that if we have two FDs, A determines B and A determines C, then we can combine them. That A determines BC. Another thing is that if we have got an FD, in which multiple attributes are on your determinant side and on the dependent side. This FD means that the attributes on your determinant side you should think of as a joint thing, a single thing. They are determining each and every attribute on the right side. The dependent side can be considered separately that A determines C, D, E. This means that A and B together, together, not separately, you will not study this that A alone can determine C, D and E and B alone can determine C, D and E. This FD means that A and B together, both of which are your determinants, are determining C, D and E separately. The attributes on your dependent side can be considered separately, but the attributes on your determinant side will be considered together in the context of this FD. It may be that you have another FD that says A alone can determine all of them and B alone can determine all of them. That is another thing. But as far as this particular FD is concerned and any particular FD is concerned on the determinant side, if you have multiple attributes then this FD means that they are determining each and every attribute separately on the dependent side. Let's look at the rest of the inference rules. Projectivity and decomposition. As I have just talked to you that if we have A determines B, C then this means that we can separate it in such a way that A determines B and A determines C. And mind it, as I said earlier that if you have a determinant here then there are multiple attributes. I have shown an attribute here. If you have multiple attributes or our determinant is a composite form of an attribute, then you cannot separate it. For example, we have separated it in the decomposition form. If we break that FD into these two FDs it would be 100% legal, correct. But again, if you have multiple attributes on the determinant side then we cannot automatically do it in such a way. In inference rules you do not say this because it does not happen. For example, if you have an employee ID if you have employee name and qualification then if we break this FD into two and say that employee ID determines E name and employee ID determines qualification then that would be legal, that would be correct. Pseudo-transitivity. In this, if you have got two FDs how is it that A determines B and C determines B. Since we have the first FD it is called A determines B and the first FD's dependent B is determining an attribute C with another attribute D. So the place where your dependent is the place of that dependent on the determinant side the determinant of that dependent can come. For example, A determines B and C determines B. So the second FD in this FD, we can place A instead of B. This means that we can remove this FD from these two FDs that A determines C. For example, if you have got student ID determines student name and we have another FD that student name and let us say father name they determine student address. If we remove this FD from these two FDs as an inference student ID and father name they jointly determine the student address then this will also be a valid illegal FD. We have studied the FDs and as I said these FDs are used in the normalization process. So before going on normalization I will again ask you to be very clear about the FDs see in the book practice with yourself but clear the concept of FDs is based on normalization. Dear students before discussing the different forms of normalization I will tell you again briefly why do we do normalization to produce smarter, more efficient more accurate tables. We have a logical database design how? We transformed our conceptual database design following very well defined rules and then we got our logical database design and in this case it was in the relational data model. Now normalization has come to an end. Normalization has different forms different levels and every level every form has its own requirements. So our database design on every table individually we have to try all these forms and when we see that our table is fulfilling a particular form a particular form requirement this means that this table is at that level. And if all the tables are in a particular form in a particular level it means our database design is in that form on that level. So this means that the normalization process we will do is that we will check every table individually according to the requirements of those normal forms. Now let's start what is the first normal form. A relation is in first normal form if and only if every attribute in every table contains an atomic value. It is also defined in some places that there is no multi-valued attribute or we can also call it the repeating group in the relation. Dear student, if you remember when we were studying the relational data model there we read the 6 basic properties of the relational table and the first property was that a cell of a table should contain an atomic or a single value. So this means that the first normal form it basically matches it transforms it makes a table a proper table because we have read that a table cannot contain a multi-valued attribute or we cannot have multiple values in a cell. So obviously as we are saying that this is the first normal form when you take a single value in all the cells of a table of a relational table of all records actually you are completing the basic requirements of that table. So this thing is sort of repeating but anyway this is a part of the normalization process to make it sure that this is its part and obviously if it does this it is fulfilling the basic definition of that table as well. So multiple values create problems in performing different operations like select or join we will look at this example and you will remember when we were studying the mapping process from the ER database design to the conceptual database design we saw that the repeating attributes or the multi-valued attributes we applied a separate treatment a special treatment to those attributes and you will remember that we created a separate table for them. So let us first do as we said that if we skip or if we cannot identify that this is a multi-valued attribute or if you have an idea that this is basically a multi-valued attribute then you have to take care of it during the normalization process and convert it into a single-valued attribute Now what you will see here is that we have this table you have a student and it has student ID, student name student address, program name book ID means how many books this student will issue and obviously if we allow multi-books to be borrowed by a student in that case your book ID becomes a multi-valued attribute in this you have the third row in which we have student S1015 Tahira Ijaz then program name is MCS and here we have two books which means that this student has an issue Now the question is that whatever you have read in the last lectures of relational algebra and you remember one was select if you want to apply the select operator on the value of the attribute book ID so what should you give the value against third row either it is B08945 or B06352 or both together so from here basically you can see that there is confusion in it it makes it difficult when there are three days left there is no problem because if you say select and you say book ID is equal to B00129 and you select it forward it will match what about these things there are two values so you cannot easily find the proper value to be selected which means there are three possibilities first one is this, second one is this both are not the same basically there are two different values and here there are two if there are three the join operator you have to join if we have a book ID if it is a foreign key here and where the primary key is the remaining three rows first, second, fourth about them we are sure we have got a particular value of the foreign key and we will find a specific value in the primary key attribute in the book table but what about this should this row be 945 or should it be 06352 or should it be both how will we find out that there are two values are we separating common but no, comma can be a part of the value so if there are multiple values here then the operators of the rational algebra when we transform them then it is difficult to perform them and it is not as efficient as possible so if you have multiple values in any cell in an attribute then you do not have a proper table nor you can properly operate it you will transform this table into the first normal form first normal form is that every cell contains the single or atomic value and this rule is that any row that contains multiple values for a particular attribute you repeat that row in this case if you look at S15 on the next screen we have given a treatment that S1015 we have put it twice there because once it will come for B08945 and once it will come for B06352 now obviously what is the benefit of this the benefit is that all your attributes all your cells they contain atomic values they can mind the key because in this case S1015 has come twice and obviously because before this the student ID was called key or an assumption in this case it is not the key and generally when this type of situation happens in the practice session in that case this is the primary key and the value of your repeating attribute will be the primary key and you will remember the same behavior taking the mapping process from ER diagram to the relational data model so I hope that you will get the first normal form and mind it that in this you have definitions of normal forms that you should know their importance I will clear this importance in the second normal form the second normal form has a concept of functional dependency what is full functional dependency and attribute B is fully functionally dependent on A if the B can be determined by whole of A not by any proper subset of A although here in the definition we have taken the attributes B and A they seem to be single attributes in the case of single attribute we made sense to talk about the subset of A A and B both can be set of attributes B as I told you earlier that we have a FD we can have multiple attributes on the determinant side so what is full functional dependency if we have a FD and there is a determinant and there is a dependent so the dependent all those all those determinants are dependent on B if they are dependent on any subset of them then you will say they are not fully functional but they are partially dependent for example if we have a relation A, B, C, D and E we have these 5 attributes we have one FD that A, B determines E, D and E but there is another FD that B determines E if we look at these 2 FDs then C, D and E the first FD these 3 are dependent and A and B are determinant but C and D they are fully functional dependent why? that there is no other FD because C and D are dependent on any subset of A, B but E we have an FD which is the second FD which says E is dependent on B if you look at these 2 FDs then you will say C and D are fully functional dependent on A, B but E is partially dependent on A, B because we have another FD and obviously if you look at the other FD then B determines E because in this case the determinant you have is a single attribute so obviously in that case it is certain that its dependence will be fully functional dependent full functional dependency and this is based on second number form let's see what it is this is an example as discussed we have a course ID student ID, student name faculty ID room and grade if you look at this course ID and student ID these 2 are determinant student name, faculty ID room and grade but we also have 2 more FDs student ID determines student name and course ID determines faculty ID and room this means that if you look at all these FDs in the first FD you will say student name, faculty ID and room they are partially dependent on course ID and student ID because the other 2 FDs this attribute is dependent on the subset of this determinant however grade is fully functional dependent on course ID and student ID when you see an FD or all FDs in every FD you can see which dependent is fully functional dependent it is partially dependent on determinant dear students in today's lecture we have a very important topic which is the database design process which is normalization we have started the discussion on that today's discussion we have started on FDs what are functional dependencies functional dependency it is necessary to understand that it provides a base for normalization always you need your logical database design and also you need FDs this is my logical database design normalize it you cannot do that you will say what are the FDs because I cannot normalize the database without knowing the FDs they guide you they provide a base without which you normalize your design so we have done the discussion we have read the inference rules sound well defined and their objective their utility is that if you have some FDs where do they come from? they come from the system some FDs you have based on inference rules you can establish some more FDs and as I said the FDs you establish with inference rules they are as good as your FDs which you have defined you are not identifying with your system after the inference rules the normalization process the actual definitions the actual normalization forms their discussion started we discussed the first normal form and as I said the first normal form basically is a repetition as I said the basic property of the relation it repeats the first normal form then we are coming to the second normal form the base of the second normal form is full functional dependency at the last moment we discussed how to use it in the second normal form we will discuss it in the next lecture with another normal form now I would like to say goodbye