 Dear students, today we are going to discuss lecture number 20 of the database management system course. The course code is CS403. We were discussing in the previous lecture the normalization process. We discussed the first normal form and then we moved to the second normal form and for that we discussed the full functional dependency that provides the base for the second normal form. Today we will start our lecture with the discussion on the second normal form. So let's start today's lecture. You have got the idea of the full functional dependency that is used here in the definition. A relation is in the second normal form if and only if. It is in the first normal form and all non-key attributes are fully functionally dependent on key that is there is no partial dependency. As I told you in the previous lecture the definition of the normal forms is also important. And if you see the definition that we just discussed it has got two parts. One is that the table must be in the first normal form and mind it. If you do not mention this part in defining the second normal form then your definition will not be correct. So always say this thing first that the table has to be in the first normal form and then you say that there is no partial dependency. That is all the attributes of the table they depend upon whole of the key not any part of that key. Now in this slide you see a table class. Now in this slide you see a table class again this table we have discussed before this may have course ID, student ID, student name, faculty ID, room or grade. As far as the requirement of the keys are concerned the key is required to determine all the attributes of a table. So the key that is required to determine all the attributes of a table is the course ID and student ID. Because the grade is such that the course ID determines the faculty ID and the room and the student ID determines the name but the grade is determined by both the states. Similarly the course ID determines the student name and the student ID determines the faculty ID and the room. Now in this situation you see that in this table we have got the partial dependencies. If you see at the student name attribute this attribute is partially dependent on the key. Similarly if you see at the faculty ID it is also partially dependent on the key because it is dependent on the course ID that means it is partially dependent on the key. Likewise the room is also partially dependent on the key. The second normal form requirement is that along with being in the first normal form all the non key attributes should be fully dependent on the key. In this case you see that we have three attributes which are not fully dependent on the key and mind it. If you have 20 attributes in a table and the keys are composite and if one of the keys is not fully dependent on the key then it is the violation of the second normal form. If the rule, if the requirement is being violated by a single attribute means that form is being violated. If the rule, if the requirement is being violated by a single attribute means that form is being violated. Table is not in that form. In this case table is not in the second normal form. Now the question arises that if this second normal form is not there then what is our problem? We have already discussed that an un-normalised table, a table that is not in the proper normal form is more prone to error. It is more open to error. There are more chances that it can lead to an inconsistent state. Now obviously we have studied that when the table is not normalized it leads the anomalies. Anomalies are errors or wrong state of the database. We have studied that when the table is not normalized it leads the anomalies. Anomalies are errors or wrong state of the database. Now what are the four anomalies? First one is redundancy, insertion anomaly, addition anomaly and updation anomaly. Now this table that we have said is not in the second normal form. There are four anomalies in it. Let me explain. This table has been populated as per the form given in the previous slide. The table given in this table is in the previous slide. First thing is, the table is in the first normal form, because all cells contain atomic values, as is clear from the diagram from the table. Now, look at how anomalies we have in this table. The first thing is redundancy. How is that? It is that against a course ID, we have a faculty ID and the room is fixed. For example, if we have a course ID C3456, then the faculty ID assigned to it is F2345. The room being taken is 104. Similarly, if you look at C5678, the course ID is faculty ID F4567, and it is being taught in room number 106. Now, this information is being duplicated, this is being repeated. This leads to the redundancy. Student ID determines the student name. In this, one student enrolls in multiple courses. When there are multiple courses enrolled, obviously the data of the student will appear in this table, and that will appear repeatedly. Now, look at the student ID, the student name is fixed because the student ID determines the student name. We also have an FD. The number of times we are giving the student ID, we have to give the student name as many times as possible. So, the student name that we are writing multiple times here, that is redundancy. This means that if a student enrolls in six courses, then the student name has to be written six times. This is wastage, redundancy, unnecessary duplication. Similarly, if a course enrolls in 50 courses, then you are writing the faculty ID and room 50 times. However, this is sure that when you have given the course code, then you can tell what faculty ID and room are. But in this table, you are storing that information, that data repeatedly, that is redundancy, wastage of storage. This is one anomaly. Secondly, now you have insertion anomaly. This insertion anomaly means that if we want to insert something, then in certain situations, we will not be able to enter data into this table in this form. For example, first of all, we want to enter a course, and until now, no student has enrolled that course. We have defined a course. We have allocated to a faculty member and we have assigned a room to that course as well. And we want to enter that course in this table, but no student has yet enrolled that course. What will be the problem in that? We cannot enter that course into this table. Why? Because student ID is the part of the key of this table. And if we do not have any student who has yet enrolled this course, it means that there is a null in the student ID attribute. And because the student ID is the part of the primary key, and that we have already studied, that primary key or no part of that primary key can have a null value, we will not be able to insert this record into this table. Same is the course with student. Student has registered in your institution. And at a particular stage, he has not yet enrolled any course and role. It means we have got the student ID, we have got student name, but no course ID. Obviously, there is no course ID. So, no faculty ID, no room, no crate obviously. The problem here is that, we cannot enter the data of this student in this table. Why? Same reason. We do not have any course associated with this student. It means that we do not have the course ID part of its primary key. Because the primary key of this table is again that student ID or course ID. So, until a student enrolls in any course, we cannot enter the data of that student into this table. This is called insertion anomaly. A state, a situation that is incorrect, that does not reflect the real world situation properly. In your system, we have got a student that is not enrolled in any course. But where you cannot enter the data, when you cannot enter the data, it means you cannot represent that student here. It means you are not allowing that student to be a student, whereas in the real world situation, in the practical system, which you are implementing, it is legal, it is allowed. But the design that you have the database, it is not allowing that thing. This means this is called an insertion anomaly. An anomaly, an error, that arises in a situation where you want to insert something into this table. You have the two anomalies, namely redundancy or insertion anomaly. Similarly, in this table, there is a deletion anomaly. Deletion anomaly is like this. Let us say there is a course that is enrolled by only one student. And let us say, due to any reason, we decide to delete the record of this student from this table. When you will delete the record of the table, obviously, the course part of that student will also be deleted. Its loss is that you have lost the information about the course. Similarly, let us say that there is a course which is enrolled by only one student. And you want to remove that course. You say that if we do not offer this course, then we do not allocate the faculty ID. Obviously, there is no room. And if you delete this information about the course, the student that has enrolled only in this course, the data of that course will also be lost. Because you have the part of the primary key of your student ID and the course ID. So, if you want to finish the course ID, obviously, you will have to delete the student part of that record as well. So, if a student has enrolled only this course, which you are going to delete, then along with the information of the course, along with the data of the course, you have the student's data. A situation, a state that you do not want to represent, that is not reflecting the actual system. Because there is no such thing that if you have a course, then you let the student's data go. So, this is an deletion anomaly. Now see, there is also a deletion anomaly. How is that? First of all, we suppose that 50 students have enrolled a course. So, the information of this course is present there with the reference of 50 students. You have got course ID. You have got faculty ID. You have got room number. First of all, you want to allocate this course in another room. You say that it was in the flan room, it was in 106. You do it in 108. See, the problem now is with this table that you will have to update this data in 50 records. In all those records in which this course has been enrolled. Fine. If the designer thinks that he has to do 50. He will do so. But let's say there is a situation, he forgets or makes a mistake. That something happens and nothing happens. This is more a wrong situation. It is the same course. But some students are offered in one room. One faculty has an idea. Some students are offered in another room. This means that if you are handling it properly, let's say. In that case, at least you have to do the extra work. If you have done your table store, if you are changing the room number, then you have to change the room in 50 records. If you are doing it properly, then you have to do it 50 times. But if you don't do it properly, then even more wrong situation happens. An inconsistent state. A state that cannot happen in your system. Your system doesn't allow the two different rooms to be offered. If the rule is like that. This means that now your database is reflecting the situation. What is the situation? One course is being offered in different rooms to different students. That is an inconsistent state. This is the example of the updation anomaly. These four anomalies which can cause a situation. These four anomalies we have discussed individually. Now, the approach is, the solution is that we transform our table into the second normal form. When we do that, then these four anomalies will give us freedom. This will end. Let's see how we will bring the table to the second normal form. We have discussed the problem and how it will end. Relation is decomposed. We break it and break it. Based on the FDs. We have one relation. We have multiple relations. How do we get a clue from this? We will get a clue from FDs. The first three lines are showing us our original table and our original FDs. If you see the FD with the student name based on the original class table, you took the student ID and the student name and created a separate table called the student. After that, you made another table called the course. The third FD in which the course ID determines the student room. You made a table and declared the course ID as the primary key. Similarly, you made another table, class. Now the class table contains only the course ID, the student ID and grade. Now, if you look at these three tables and your FDs, these three tables are in the second normal form. Why? They were in the first normal form because the original table was in the first normal form. They are all in the first normal form. If you look at the first table for the student, you have the student name and the key is the student ID. This is obvious because the student ID is a simple key. There is no question of partial dependency so definitely it is in the second normal form. Look at the course. Again, the course ID has three attributes. The non key is the faculty ID and the room. The key consists of simple attribute and single attribute. There is no question of partial dependency so definitely it is in the second normal form. Look at the third table of the class. In this, we have the composite key course ID and student ID. But if you look at your FDs then you do not have any FD in those FDs in which you come to know that the grade can be determined by a subset of the course ID and student ID. This means that the grade is fully function dependent on the key. This means that this is in the third normal form. So this is in the second normal form. So how did we get rid of the anomalies that we were encountering, that we were facing in the original table. Let us see. On this slide, the previous slide which was in the same table, that data has been split into three tables. The student data is in the student table. It is in the course table. And the class which is in the enrollment is in the class. Look at the four anomalies we discussed. The first thing is the redundancy. We noted that if we have a course that has been enrolled 50 times then 50 times we are repeating the data of the course. For example, we were repeating every course. What is happening in this? The faculty ID and the room are only once. How? In the course table. And the enrollment of the 50 students enrolled that is being represented only by the course ID or the student ID. But the faculty data and the room number that is not being repeated is just once. And the interesting thing is that it is applicable in the 50s. How? Whenever you want a student to enroll in this course but which faculty is taking this course and which room is it? You simply refer to the course table and you will get information from there because the course ID is primary. So that is there just once. So you will get this. Records, 50 Situations, 50 places. Similarly, same happens with the student. If you do every student first then 5, 6 or 4 courses are being enrolled. So in the previous situation what was happening was the student's name was being repeated 4 times or 6 times or 5 times. That was redundancy. What is happening now? In this design your student name that appears just once. Student table. Now it is saying that when you want a student to enroll in 6 courses that is called a course class table. 6 times. And whenever you want to know the name of the student you simply refer to the student table. So if it is enrolled in 6 courses then the name of the 6 places will remain the same. So you will get that from the student table. In this way you will see how we got rid of the redundancy. In the next case insertion anomaly. First of all we have a student who did not enroll in any courses or we have a course which has not been enrolled by any student in the previous situation we were not able to enter that course or that student. But in this case put the course data in the course table whether any student enrolled or not enrolled. In the same way the student data is simply written in the student table whether any student enrolled or not enrolled. No problem. They are independent of each other. In this way the insertion anomaly is also over. Now come to the deletion anomaly. First of all we will remove a student. So what we will do we will go to the student table and simply delete that student's record. Keep in mind that when you delete the student's record from the student table all the courses enrolled were in the class table. When the student's record went from the student table all the courses enrolled went to the student table. But there is no effect on the course data. The course data is also there in the course table. So when the student's record went to the student table then its enrollment doesn't make any sense. This is quite logical. It will go anywhere. Mind it. This can never happen that a student's record enrollments are present in the class table. Why? Because the student ID in the class table is a part of the primary key. But it is also acting as a foreign key. Whose reference is this? Student ID in the student table. So it cannot be that you delete a record from the student table and it cannot be in the class table. There are different situations but Dbms itself Dbms itself will take care of that. And if you have told it that it is a foreign key then it will not let it happen or guide you. If you want to remove a course then you delete the course from the course table. Obviously all the students enroll in this course that data will also go away. But if first, as we talked about in the last session when it was not normalized then we said that if a student enrolls in just one course when we will finish this course then the student's data will also go away. Now if in this situation we have one student who enrolls in just one course then if we delete that course then the course data will say that enrollments which were in this course but there is no effect on the data of the students. It is their untouched safe and sound. This means that the deletion anomaly is not there. Now we come towards the updation anomaly. Earlier we said that if you want to change the faculty ID of the course or change the room you will have to do it 50 times if there is a course enrolled then in all the records you will have to change it. That was the updation anomaly. But in this case first you want to change the faculty ID of the course if you want to assign any other faculty you will do it just once. The course table you can go there and change the faculty ID and the beauty of this concept and now this change is applicable on all the tables on all the records which you have in the class. For example if you enrolled 100 students we will do this change in one place and this change automatically all the records which were enrolled will be applied automatically. You can see that a minor change in the design or by bringing the table into the second normal form how you got rid of all the anomalies that were there in the table that was not in the second normal form. Each of these tables that we have created by decomposing our original table is in the second normal form as you have discussed and obviously all those tables are free of all those anomalies that we faced in the original table. How do we get into the second normal form and what are the benefits of that and mind it, FDs that your normalization is based on FDs. Third normal form A table is in the third normal form if and only if it is in the second normal form and there is no transitive dependency that is no non key attribute is dependent on another non key attribute. Dear students while defining the second normal form that you must see the proper definition of the second normal form same applies here the third normal form it first condition is that table has to be in the second normal form and then it applies the second condition that means that until a table is not in the second normal form till then it cannot be in the third normal form that means that first you take care of the partial dependency and then you come towards the transitive dependency because this is how it is defined and as far as the transitive dependency is that we have already discussed when we were reading the sketch rules what is there that if A attribute A is determining B and B is determining C so this reference A is determining C and this is an easy way of transitive dependency as it is written here that whenever you see that in any table in any relation a non key attribute is determining any other non key attribute that means there is a transitive dependency now here we have the same student table student ID, student name student address program credits defined in this under the first FD your student ID attribute it determines all the attributes of this table it means this is a primary key this is a candidate key and we do not have any other attribute that determines the rest it means this is the only candidate key and because it is the only candidate key we select it as the primary key that is done this is definitely in the second normal form why? first normal form is that we assume that all attribute contains atomic values and second normal form is that there is no question of partial dependency because your primary key your determinant key it consists of a single attribute so for sure it is in the second normal form but the second FD it says program is the program credits this means if you know the program name you can for sure tell the credits of that program now this is a situation that is called the transitive dependency if we say student ID is A program name is B program credits is C so student ID is determining the program name and program name is determining the program credits so like this you are reaching the program credits but through program name see although through student ID the credits of the program can be told directly but this root also exists this is a form of transitive dependency or it can also be said that the non key is determining the non key so this is a violation of the third normal form the third normal form is that in the second normal form and there should be no transitive dependency or the non key is not determining the non key or the non key is not dependent on the non key now the program name and program credits are also non key so since this dependency is there is a transitive dependency that is the violation of the third normal form dear students in this only 4 anomalies exist when we are violating the third normal form or when the table is not in the third normal form then only 4 anomalies in the second normal form exist here but anomalies are there but this time they are not due to the partial functional dependency rather they are there due to the transitive dependency first of all if you see redundancy, duplication about this table the FDs that we have got one of the FDs says that the program name determines the program credits now when you enter the students in this table first they could be 100 or 1000 of them so you store the program name in which the student is enrolled and also the credits of that program now these 2 things they are being duplicated unnecessarily when you can determine the credits of a program by knowing the program name so you repeatedly store the credits it means that this reflects the redundancy unnecessary duplication first anomaly second insertion anomaly first you want to introduce a program and you have defined this is a program and these are the number of credits that this program consist of first when you start MBAIT now you have defined it you want to enter that program in your database you cannot unless a student is enrolled in that program why? because the student ID that is the primary key of this table until you have a student ID means until you have not got a student that is enrolled in this course you cannot enter the data into this table so this is an insertion anomaly that until a student is not enrolled in this course until then you cannot enter that program's data into this table this is insertion anomaly look at this as we have discussed in the second normal form there is an updation anomaly first we have a program we have 100 students enrolled let us say we say BCS in BCS we have got the program credits are 132 the BCS consist of two credit hours due to any reason you have changed the program's credits first you have made it 134 or 130 this change has to be reflected in all the records in which the students have enrolled in the BCS program if you do it right you have to do it repeatedly many times and if you cannot do it right due to any reason the BCS program in the same table those two program credits will be represented one record is of 132 credits and some of your records it consists of 130 or 134 credits this leads to an updation anomaly same is the case with the deletion anomaly let us say there is a course where just one student is enrolled or few students are enrolled if we delete one student's record along with this program with how many credits this data will be lost the data about your program information about the program how many credits are there they are related with the student enrollment and when you cancel a student's enrollment they will be unnecessarily deleted so these are the 4 types of anomalies which we said if a table is not normalized then it can happen which we have studied and discussed and mind it anomalies types are the same but at that place when we discussed in the second normal form that was due to the partial functional dependency and here it is due to transitive dependency what was the transform then we removed the partial dependencies and now when we transform it in the third normal form then we will remove the transitive dependency let us see we have shown you the table and you can see this is our original table and it has two FDs again based on the FDs we will decompose our original table how? that you have another FD so the determinant and the dependent we will remove all of them from the original table and we will create a new table as in this case we have made an electable table in which we have added program name and program credits the primary key is program name and at the same time we kept the program name as the foreign key in the student table that will establish a link between the student table and the program table now what will happen is whenever you want to know the program credits for a particular student you will get the program name from the student table and then you will refer the program table and from there you will know the credits of that program so this is how you will work how you will use it how will you get rid of all those anomalies which you have already read from those anomalies your profit with this design with this new design in which you have decomposed one table in two tables how you will get that profit the screen in front of you you can see the first thing you have there is no redundancy how is that? how many credits of a program we have stored just once in the program table now first thing if we have a program we have a sacro student so every student that has stored just once but they are available to all of them in the program table so your redundancy will be over second thing your insertion anomaly if you want to introduce a new program in which no student has yet been enrolled simply put the data of that program in the program table give it a program name it will come to your program table no student has enrolled when they are there the data will come to the student table and the link will be with this program and secondly if your program name is in the student table obviously you cannot put a student with any such program that is not there in the program table because they are linked through the referential integrity constraint so this program has been solved another thing that we have mentioned is the addition anomaly for example if you want to change several students are enrolled but you will make that change in just one record and that change will be applied to all those students who are enrolled in that program this way you got rid of the the addition anomaly as well and similarly we have said that our addition anomaly is that if a student was enrolled in one program or you say that in one program there was only one student enrolled so in the previous case we had the data of the program which was information about the program but in this case the data of the program that is there in the program table so now you have that data in the same way no difference so similarly you saw that when you decomposed a table and made it more efficient there were 4 anomalies how did you get the result from those 4 I hope that this whole process has been cleared let's move on what we have decomposed are the relations in the third normal form and all anomalies are not present in them Dear students now we are going to read Boy Scout Normal Form and we briefly call it BCNF BCNF we also call it the fourth normal form but it cannot be called the fourth normal form it is a general form of the third normal form and it has a lot of similarities and some differences so now let's see what are the similarities and differences first of all let's see what is the definition of BCNF every relation in BCNF is in 3NF vice versa is not always true see BCNF and third normal form both are almost the same I can say like for more than 90% cases when you transformed the table in third normal form then it is also in BCNF but if we look at the table in BCNF that is 100% sure that it is in third normal form again what is in BCNF that it is in third normal form in third normal form in most of the cases it will be in BCNF more than 90% cases it will be in BCNF there are very few very specialized situations in which the table is in third normal form so you should identify those situations and you should take special care to make sure that it is not in BCNF one more thing the third normal form it is checked in steps there is a certain order that it will be in first normal form then you will get it in second normal form and then you will get it in third normal form but in BCNF you will see its definition BCNF you can do directly how? in second normal form there is a prerequisite that it should be in first normal form in third normal form there should be two prerequisites indirectly that it should be in second normal form in third normal form you will check straight away without any prerequisite in BCNF it is in third normal form so obviously if you are familiar with it if you are friendly or convenient with it then instead of doing it in first or second instead of doing it in third you can directly do the BCNF when you do it in BCNF obviously it is in third normal form a table is in BCNF if every determinant is in a candidate key then it is your normalization based on the FDs you will see the table you will see the FDs if all the FDs are given all the determinants if all the candidate keys are there then this means that the table is in BCNF and if it is in BCNF then it is certain that it is in third normal form when it is in third normal form obviously it is in second and if it is in second then it is in first so this means that when you want to see the BCNF directly simply see that the table is in FDs all the determinants are in candidate key fine this means that it is not in BCNF you transform it into BCNF and when you transform it into BCNF then it is automatically in third normal form obviously you are familiar with in third normal form it is generally in BCNF it is certain that it is in third normal form this means that you should especially know about those situations that when you have such a possibility that the table is in third normal form but it is not in BCNF see when you have only one candidate key and that is your primary key then it is certain that it is in third normal form so it is in BCNF when you have multiple candidate keys but all those multiple candidate keys are either used on single attribute again that is multiple candidate keys two or three they are all used on one attribute so this means that it is also the same if it is in third normal form then you can say that it is in BCNF in the same way you have multiple candidate keys and they are composite they are also used on multiple attributes but they are totally disjoint they do not have any common attribute in this case you can say for sure that if it is in third normal form then it is also in BCNF in which situation you should be aware in which situation you should be aware that is when you have multiple composite keys one thing multiple composite candidate keys you have multiple candidate keys and they are composite attributes and the third thing is that they are overlapping overlapping means that there is one or more attributes that are common between these different candidate keys this is the situation when the table is in the third normal form but it is not in the BCNF in that case you have to specifically check whether the table is in BCNF if you want to do it so now this situation is written here that a non key term is a part of the composite primary key here is an example of faculty in this the primary key is faculty ID or department considering that a faculty is in different departments that different ranks can be data hide so faculty ID and department are the primary key but the second FD it is given that office determines department now this is a table that is in the third normal form but not in the BCNF how? that your second FD that due to the inference rules of pseudo-transitivity is generating a new candidate because your office determines department according to the second FD so if you can place the office in the first FD the reference is pseudo-transitivity that means you have got another candidate and what is that? faculty ID and office they are determining department, rank and data hide so what is this situation? multiple candidate keys one thing component key third thing is overlapping this is the situation that reflects that particular situation and there is another thing that your non key attribute is determining a part of the primary key this is not the violation of the third normal form because in third normal form the sign of transitive dependency is that non key determines the non key so we will break this here we will break the faculty faculty ID and office rank and data hide and we will make a table and we will make a table of office and department will come both tables are in BCNF and in BCNF in third normal form dear students there are normal forms and normally what is recommended is that if you do it in third normal form or in BCNF then it is sufficient to do it in 4th and 5th normal form we don't need this here briefly we have written about 4th or 5th or 6th normal form interested students can see the literature it is written in books especially in the record book for you the third normal form or BCNF is enough for an efficient design in today's lecture we have completed the discussion on the normalization process we have seen how those steps which we will apply individually on every table and we will check whether it is in first normal form or second or third or we can take another route that we can directly check the BCNF which we have done in third normal form obviously there is a very minor difference between the BCNF and third normal form and I will tell you the situations in which this can happen that your table is in third normal form but it is not in BCNF because you know that there is no other round which is in BCNF it is certain that it will be in third normal form in today's lecture we have completely discussed the normalization we will get the environment we will give you its practice and you can practice it yourself in the next lecture we will do a complete example of it I will wish you more Allah Hafiz