 Assalamu alaikum. Dear students, today we are in the lecture number 21 of the database management system course. The course code is CS403. In the previous lecture, we concluded our discussion on the normalization which is a very important stage in the database development process. Today we will start our lecture with a brief summary of the normalization process and then we will continue with our discussion. Let us start today's lecture. As you have been listening during last two lectures, normalization is a step-by-step process to make database design more efficient and more accurate. And now you have got in your mind how does it make more efficient and more accurate your database design because it reduces the chances of different form of anomalies that may occur in an unnormalized database design. It strongly recommended activity performed after the logical database design phase. Unnormalized relations are more prone to errors or inconsistencies. We have read that the database design that you do not have will have more chances of error while the normalized database design will have less chances of errors. Normalization is based on the FDs and from here you have different types of FDs and their characteristics and inference rules. FDs are not created, rather identified by the designer or analyst. As a designer, you do not create, you do not establish the FDs within an environment, rather you identify them. And how do you identify them? You identify them by the analysis of the existing system, by interviewing the users of the system, by saying the requirements of the of the users, by analyzing the requirements. So from there you identify them, you do not create them. They are already existing there. Unless there is a relatively less situation that when you are recommending them something or if they agree to do your recommendations and they apply some sort of changes as per your recommendations, then you can say that you have created an FD, but still that FD will be implemented in the system and then you will implement it in your database design. So it will be written in the mind that FDs you have to identify. And the example we will discuss today about normalization will make you aware of that. Normalization forms exist up to 6NF. However, for most of the situations, 3NF is sufficient. These 4, 5th or 6th form, these are more of a theoretical purposes. In real life applications, they are not so much used. So if you are very much clear up to 3rd normal form, or plus if you are clear about the BCNF, or if you bring your database design up to 3NF or BCNF, that would be sufficient. Your design will do the right thing. Perform through analysis or synthesis process. Let's not discuss this point. Right now I am going to present an example that will explain you the whole normalization process. This example is present in Ricardo's book and if you want to see it as a reference, you will get it there, you will get it in your slides and you will get a reference on the lecture notes. If you understand this, you will get an idea of how the normalization process is and what are the steps. If you look at this, you will see a table that you can see on the screen. This table's name is work. In this, you are looking at different attributes. For example, this has project name, project manager, employee ID, hours, employee name, budget, start date, salary, employee manager, employee department, or rating. This is a table in which all the attributes which were required in a system were being used in the system. They were being referred, they were named during the requirements analysis by the users. Mughalif users group said that we need this attribute, we need this information in our output. During that process, the designer, the analyst collected these attributes and formed this one relation. You can say that he created this list of attributes and he named it as a relation. After this, the different facts are given here. You can see that from here, you can say that how would you identify the FDs. Mind it. In the real systems, no user will tell you these are the FDs. You will not get that attribute name, arrow, or attribute name. These are the FDs that are applicable within our environment. No. In that, you will have to identify them. How? I have said that during the interviews and the questions that you distribute, the system that you are going to use, the requirements that you are working on, they explain to you. The requirements that you have, whether they are written in a paragraph form, whether they are in different points form, whether you are collecting them by interviewing, the tools that you have used, like the DFT, the cross after matrix, you have to refer all these things from where you have to identify the FDs. Even within the exam, when we talk about the exam, you would be lucky if you find FDs defined as such. Most of the time, students are not that lucky. In that, if you see the point of view of the exam, you will see that in a paragraph form, the environment, the rules, the business rules, will be given to you. Or in a different point of view, as we have this example, in a different point of view, you have given the different business rules. From there, you will have to establish the FDs. And then, on the basis of those FDs, you will normalize it. If you see the table that you have seen, in this form, this table is full of anomalies, all sorts of anomalies. The number of anomalies that you have read during normalization, whether they were due to partial dependency, whether they were due to the transitive dependency, all sorts of transitive dependency anomalies are present here. You can observe the data, but not. When you identify the FDs, you will be able to see that this table is full of anomalies. Let us see how we identify the FDs in this example. Here, you have different rules in this example. In the different business rules, it is written that each project has a unique name, but names of employees and managers are not unique. You are told that each project has a unique name. This means that the same project name will not duplicate. It is unique. When it does not duplicate, it is unique. This means that the project name can be used as identifier. Because we have studied in our discussion in the keys and FDs, the identifier or the keys has to be unique. Since the project name will always be unique and not duplicate, this means that we can use the project name as identifier. Another thing is that the name of employees and managers will not be unique. This means that we cannot use the employee's name or manager's name as the keys or identifiers. Each project has one manager whose name is stored in Proj Manager. We have said that each project has a project manager. This was told in this case. Another thing is that the name of the project manager will not be unique. The first condition is that the name of the project will be unique. When each project has a project manager and the name of the project is unique, this means that if we have got a project name, then we can determine the project manager associated with that project. This means that your FD is made like this. The FD that has been made is that the project name determines the project manager. I hope that if you link these two things, you will understand how do we establish this particular FD. The third business rule is that many employees may be assigned to work on each project. Many employees will work on one project. And an employee may be assigned to work on more than one project. In this basic sentence, the relationship between project and employee is being determined. As I told you earlier, the real-world system, the system that you are developing, no one will tell you in this form that there is a man-to-man relationship between the project and employee. They will tell you this way. As the sentence is written here, many employees can work on one project and one employee can work on most of the projects. From where you will apply your own thinking. You will describe this when in your own terminology, when in the language of the ER data model, in that case, you will say that there is a man-to-man relationship between project and employee. You will see how we identify the relationship and their cardinality. It is written here that hours tells the number of hours per week that a particular employee is assigned to work on a particular project. You can see here that hours means the number of hours that an employee works. The question is, does the attribute of hours attach to the employee because the employee works in hours or we assign it to the project because a certain number of hours is word on a project. This is what we think. I told you that to determine and mention we need the reference of both the things both the participant and the types. The attribute that will not be associated with the participating entity types, that will be an attribute of the relationship. Here it is said that hours stores the number of hours that an employee works on a particular project. This means that to know the hours, we will need to tell both the things on which employee has worked on which project. From here it will be noted that it has worked for so many hours. This is a part of the relationship and to determine it, we will need both. This means that the FD we will create here will be employee ID, the project ID they determine the hours. If we properly mention this we will have project name because project name is the identifier is the key of the project which will be our table. project name, employee ID they determine the hours. We need both to know the hours. How did you identify an FD? The next point is that budget stores the amount budgeted for a project and start it gives the starting date for a project. Again these two attributes they are associated with the project. We have already established that project name is going to be the identifier and the key. This means that the determinant is the project name. The project name will determine the start date and the budget. Again the project name is unique for every project. Identifier. Every project has a manager because the project name is unique so if you know the project name you will get the project manager. Project name determines the project manager. The fourth rule is that we have a budget and a start date for every project. Since we have a unique project name the project name will determine both of them. The FD we have made is project name determines project manager, budget and start date. The fifth rule is that salary gives the annual salary of an employee. See there was that hours. An employee works for hours on a project. The salary is the annual salary of an employee. Again this means if we want to know the salary of an employee we simply need to know the ID of that employee. If we want to know the ID because the ID of the employee is attached to the ID we generally say that is a unique thing because it is used as an ID so the ID becomes a part of the ID. If we have another identifier like the project name then you cannot assume on your own that it should be unique. Until you have a specific story where it is said that the project name will be unique then what are the names that you cannot automatically assume that it will be unique. But about the ID like we have been using department ID, student ID, course ID, program ID these are the things that we assume that it will be unique. Here it is said that salary gives the annual salary of an employee. This is an attribute of the employee and we are using it implicitly as an identifier that is an attribute of employee ID. We will calculate this as employee ID determines salary. Employee manager gives the name of the employee's manager who is not the same as the project manager. Here we have used the project name FD we have already included an attribute project manager. Now here in this particular rule we are talking about the employee manager. It is the same thing that every employee has an employee manager. How will this FD be? FD will be like this because employee ID is unique and every employee has a manager which means employee ID determines employee manager. Employee department gives the employee's department department names are unique. The employee's manager is the manager of the employee's department. Now from here you can see that the employee department is unique when the department name is unique which means that we can use it as an identifier. We can use it. The employee department means that the department where the employee works is the same thing because we have an employee ID unique so if we know the employee ID then we can determine the employee department. This means that the employee department attribute will also be part of this FD. One more thing that you are going to add is that department names are unique. The attribute is employee department. The employee department works in this department and the employee manager is the manager of the department. The manager of that department works in this department. This means that if we know the employee department we can determine employee manager because the employee manager works in this department. So if we know the employee department which is unique then we can determine the employee manager. This is another FD of that department. The eighth point is that rating gives the employee's rating for a particular project. You will come to your mind that rating is such an attribute that to know two things. One is project name and one is employee ID. What does this mean? What is the rating on this project? Obviously, if you try to associate with the employee then you cannot. The rating of the employee is multiple with the project. You cannot rate only the employee's name but you have to specify the project as well. What is the rating attribute with the project? Because many employees work on one project. Obviously, you cannot determine the rating by just knowing the project name. We will say that for that we need to know both the employee ID and the project name. What will this be? Employer ID, project name determine rating. You have established these FDs regarding the business. You are coming to your mind. The first is project name determines the project manager. How did we identify this? We identified this from Rule 2. Project name also determines the budget. We identified this from Rule 4. Start date from Rule 4. The second FD is employee ID determines employee name. This is such a thing that it is implicit. It is possible that it has not been mentioned explicitly but this is something that you can assume. Employer ID determines employee name. You have a list of attributes in the table. You have employee ID and employee name. You can assume from this and this is not a big assumption. This would be a valid assumption. What is the project name? You have assumed salary. This is given to you in the fifth when the salary determines employee annual salary. Employee manager determines from sixth and employee department can reduce this from sixth and seventh. After that project name and employee ID determines hours and rating. You also get another FD from seventh and that is Employee department determines employee manager. Dear students, from these 8 business rules you have seen how you have created different FDs and when you have different inference rules you have merged them like Union's and when you have combined them then you finally end up with these 4 FDs. Now you have FDs and the main relation you have is given to you. One question is created here that now we enter into the normalization process. We have already entered and now we try to normalize them properly. In this you have a relation where all the attributes are combined. On number 2 you also have FDs. Now we start normalization. First of all we have this in the first normal form that you can see the data in front of you second is that you have attributes or you have established that none of the attribute is multiple attribute. This means that your table is in the first normal form. Now we move to the second normal form. And you will remember the definition of the second normal form that table has to be in the first normal form and secondly all non key attributes should be fully dependent on the key. The first thing is in this table what is the key? Now if you look at FDs then you do not have any FD based on which you can establish that the determinant of this is determining all attributes. This means that no attribute or FD is establishing your primary key. So if you pay attention then some attributes are determining the way you have the second FD. Some attributes are determining the project name when you have the first FD. In the third FD some attributes are determining both attributes together. And as far as the fourth FD is the determinant and the dependent both are included in the top FDs in the dependents. There is no role in the determinant of the FD. So if you establish the employee ID and project name then you can determine all attributes. How? If you have both employee ID and project name then you can determine the project name from the project name from the project name from the salary, employee manager employee department from the employee ID from the hours and ratings from both. What is apparent from this is that our key in this relation is a composite key which is made with the employee ID and project name. So this is our key. Now if we look at the normalization process again then we have the first normal form. The key requirement is that all non key attributes should be fully function dependent on the key. Key has been ratified. You have immediately noticed that your table is not in the second normal form. There are so many attributes that are partially dependent on the key. For example, your first FD in which you have those attributes which are dependent on the employee manager in this key. Similarly, all those attributes have a full FD with some attributes dependent on the employee ID. This means that those attributes are partially dependent on the key. What are the employee ID and project name? The hours and ratings are two such attributes that are fully function dependent on the key. So the first decomposition rule we will decompose this table. With the help of the first FD we will take all the attributes that are in the first FD and what are the project name, project manager, budget and start date. We create a new relation called the project. We include the four attributes in this question and underline the project name as the primary key. After that, the second FD which is based on the employee ID we take out all the attributes and create a table called Employee. In Employee, because of this FD the employee ID will be the primary key. We underline it. Like employee name, salary, employee manager and employee department we place them there. These are two relations. The original relation if you see there are two attributes which are hours and ratings. But in this you have project name and employee ID. These two will come with two references. How? The hours and ratings are determined by both. These two can stay there with that reference. The second thing is the work table which is reduced to establish the link with the employee table and with the project table. You will need the keys immediately because the primary key is the employee ID and the project table is the primary key project name. Together the work key will be the primary key and at the same time these two attributes of the primary key they are functioning as the foreign key as well. So they are playing the primary key and the other two tables are establishing their link. This means the work table we have is in the second number form. What are the work tables? Employee ID, project name, hours and ratings. Now we have three tables. If you look at these three tables these three tables are in the second number form. I hope you have understood. You can check these three tables individually whether they are in the second number form or not. Look at the first table which is your project table. It was in the second number form from the beginning and because the primary key consists of the single attribute there is no question of violation of the second number form requirement because it does not have the question of partial dependency. If you see the second table the employee table definitely second number form because the key is of single attribute. The third table which we have is the work table. The third table is FD and it is written that the hours attribute and the ratings are fully function dependent on the two attributes. This means that there is no partial dependency. In this way, the three tables that we have created from our one main table are in the second number form. Now we have to check the third number form. The third number form if you remember that the table has to be in the second number form and the second number form there exists no transitive dependency or no non key attribute which means another non key attribute. This is definition third number form. According to this if you see the first table which we have project table it is in the second number form and the second thing is there is no transitive dependency which means given FDs there is no FD in which the non key attribute of this table determines the non key attribute this table is in the third number form. Now this is on the employee table. If you see here the first requirement of the third number form second number form it is the second number form as we have seen but the fourth FD where we said employee department determines employee manager if you see here this table there are two attributes employee department and employee manager what is the problem if you keep the table in this form then we will face all those anomalies which we have discussed before if the table is not in the third number form so the rule of decomposition as we have applied before to bring the table in the second number form we will apply the same rule and we will decompose this table we will do the same FD in which the employee department determines employee manager we will remove the source from there and we create a fourth table name the department this table is made but to establish the link between the employee and the department so the department name we will keep it we will leave it in the employee table as a link as a foreign key so that the department table is linked to it of an employee so what we will do you will get the employee table the name of the department from there you will come on this table which table? the department table and from there you will see the name of the department so in this way the link will be established and all the anomalies of the third number form will end which are hours and ratings no one of them would do the same you do not have any FD this means there is no transitive dependency so if you see this is your table in the third number form so in this way finally the table that we got are these four tables now in these tables each one is in the third number form which means the entire design is in the third number form or you can say that the database design is in the third number form now if you want to check the BCNF of this design as well then again you will apply this rule the condition of the BCNF on each and every table now you have four FDs you keep those FDs in front of you and keep these four tables in front of you according to them if you see the first table then what is the requirement of the BCNF BCNF says every determinant is a candidate key so if you see the first table the project table so for the project the FD related to this which is the FD in which the project name, term is so in that the FD related to this the determinant of this relation is the candidate key this means that this is fulfilling the requirement of BCNF so your first table is in the BCNF let us move on if you see the employee related to the employee you have only one FD that the employee ID determines all these things so what happened that the employee ID is your determinant and the candidate key so this table is in BCNF now here you may be a little shocked that this employee table has an attribute employee department that is a determinant of in an FD but and that candidate key is not here but mind it if here the employee department is here but the employee manager is not here this means that this FD does not relate to this because FD will only relate to one table all of it or at least of its determinant and the dependent there should be something here in the dependent the employee manager is not present this means that this FD it is not touching, it is not violating it is not having any effect on the employee relation so the employee relation is also in BCNF now here is the third table which is your work table if you see related to the work table then the employee ID and project name both are determining now this is an FD which is related to the work but this is also a candidate key this means that your work is in BCNF and again the same thing if you see that your work table has two attributes which are your employee ID and project name these are two determinants the first two FDs these are the determinants but the thing is that the dependent of those FDs is not present in the work so the effect of those two FDs will not be on it they will not be considered related to this particular relation so the FD related to this is only this FD in which the project name and employee ID determine hours or ratings so this is the only FD which is the candidate key this means this relation is also in BCNF now see the work table which you made in the name of the department the related FD is the same in which the employee department determines the employee manager in this case it is only one FD and its determinant is also in BCNF so you saw from one table the business rules you identified the FDs and you transformed one table step by step in BCNF or third normal form here I will tell you that the root we adopted the main relation the work relation first normal form second normal form the final relation which we had BCNF also as a practice I will recommend that the main relation apply it directly BCNF means BCNF you try to transform it directly in BCNF instead of taking this root first, second, third so from that you will be able to practice how it works and the basic thing is that the FDs you identified will provide you to break a relation to decompose a relation and when you decompose it on the FDs base it is fulfilling that requirement it is satisfying that the normalization level comes into normalization form so this example which we have discussed how to carry out the normalization process and how to carry out that activity you will get assignment you will get practice further and in the final exam I will tell you that you should refer to different books and do as many examples of normalization as possible so that the concept is clear physical database design Dear students the steps of the database are that we have covered are the analysis phase the conceptual database design the physical database design the logical database design after logical database design we had implementation physical database design there is one thing which is not explicitly determined obviously it is a very important thing that is normalization and we discussed it in the last lecture and I hope that you will be clear now we are coming to the physical database design what is the objective of this the objective is that the logical database design that you had looking at the business rules looking at the requirements then you have normalized it a general idea was that your design should be more accurate more efficient and smart now you are going to that stage when you have to implement this design and mind it the logical database design that you have is independent of the DBMS in which you have to implement it the logical database design and its normalization that you did in the relational data model there is no mention of the DBMS there is no role in the DBMS in which you have to ultimately implement it now we have started to consider that which tool you have to use and our physical database design if specifically it is said that the creation process that you have created that is not included in it but the related decisions the related properties their role is here so your physical database design now you are moving towards the implementation of your logical database design rather your normalized logical database design now let us see basic goal is data processing efficiency keep this in mind that you will make different decisions different steps what is your goal the goal is that you ultimately implement when the application is started when it starts to be put into the database so what is your processing any activity related to the data whether you are storing it and mainly when you are retrieving it you are reading it some changes some deletion all the steps are called data processing so the objective here is that when you implement and process it you will get more efficient data the logical database design our objective there is to produce a more accurate smart design the anomalies we discussed they can encounter but here you will see that sometimes we ignore those considerations the logical database design consideration there we need the perfection theoretically what we call a good design here we validate it why? because in some places your organization says there is a slow response or the level of activity the data processing is so huge so much that if you go by the rule you get a slow response you validate the rules so you get better efficiency so here what I want to say the objective of the physical database design is that how you implement the ultimate data is that your efficiency should be that the data processing should be more efficient faster waiting time should be less one more thing that the recommendations and especially those recommendations the previous rules we read these are the options those are not the rules these are the options that means you have to adopt them or not that is totally up to you no one will tell you this is your decision that you have to adopt them or not so we will discuss this topic please don't get confused that we have read something else you are saying something else they are strongly recommended here it is that these are the options if you feel like that you have these requirements based on the discussion or on the performance requirement otherwise don't do it and not all of them and the options I will point them out that these are the options transforms the logical database design into technical specifications for storing and retrieving data now you are going into technical specification does not include practically implementing the design however tool specific decisions are involved in this there are some things that are involved that are based on the tool concept that is why at this time or at this stage you should have finalized the particular tool that you are going to use in the implementation of your design now what is the input as I said when you come to your conceptual database design what you have input are the things that you did during initial study like your DFD your cross-structural matrix the input to that process so from there you established the ER diagram when you came to logical database design what was the input it was your design now when you have reached normalization input is your logical database design and your FDs based on these two you produced normalized logical database design now when you have reached physical database design input is normalization definitions of each attribute the purpose of each attribute what is the objective description of data usage how much data you have how many tables you have how much data you have requirements for response time data security backup etc again how you will get it by interviewing the environment the users and what tool you want to use especially your DVMS you should have an idea what decisions are involved in logical database design choosing data types that you have to store different attributes because your relations consist on attributes now logical database design you have to name the attributes after that you have to attach data type or you have to specify domain grouping attribute which in our logical database design belongs to different relations sometimes you group them deciding file organizations practically you have to store them in which organization and selecting structures that you have to ultimately choose which structure so that you can make your data processing and finally preparing strategies for efficient access curie processing is included in this Dear students we will conclude in today's lecture a very important activity is that we have discussed the normalization process on one side it is clear how we take a table step by step first normal form, second normal form and third normal form and then we have seen how you check your tables but another very important point how do you get FDs how do you establish the FDs because I have told you that you will never get FDs on a fixed plate because the users of the environment they do not know what are the FDs they have no idea about the FDs they simply tell you the business rules from there you have to establish and identify yourself that there is an FD it is its determinant so you have to identify FDs and then you have seen how FDs provide base for normalization as I have told you you cannot perform normalization without the FDs because normalization is based on FDs in this way I have also recommended that the example we have is in the book of Catherine Ricardo take the same example and directly apply BCNF after all this discussion finally we touched the physical database design we have discussed the objective of it now we will discuss the stages of physical database design in the next lecture Allah Hafiz