 So, the solution if you insist on dependency preservation for efficiency is to use third normal form which allows some redundancy. But, all functional dependencies can be checked well to be specific not all three NF relations are decompose or dependency preserving, but we can get for any schema any set of functional dependencies we can get a three NF decomposition which is dependency that is the main point that is always a lossless joint dependency preserving decomposition into three NF. So, just to remind you what is the difference between BCNF and three NF the difference is each attribute A in beta minus alpha is contained in a candidate key for R that is the extra condition. So, if you here if I take this condition J k goes to L it is a super key no problem if I take L goes to K it is not a super key it violates BCNF, but hey there is a third condition for three NF which says every attribute on the right hand side is contained in A candidate key there is at least one candidate key it turns out K is the only attribute on the right hand side and there is a candidate key J K which contains K. So, K is part of some minimal key and therefore, this schema J K L does not violate three NF if you apply the first test it passes directly if you apply the second test the third clause the escape clause that the attributes on the right hand side are part of a candidate key says that this one is it is in three NF. Therefore, we will not decompose it and we will let it stay access to and that ensures dependency preservation in this case. So, that was the idea of three NF and now let us see a realistic example corresponding to the artificial example we had. Let us say that we have a relation called department advisor which basically says that look a student may be taking courses in multiple departments may be they are taking multiple degrees multiple departments. So, there is a notion of a dual degree in some universities where you get let us say BE in two different branches or BE and an MSC or whatever combinations are there. So, in each department you may have an advisor and let us say there is a department advisor student ID department name and instructor ID. Now, may be the constraint is that in a department a student can have at most one advisor. Therefore, student ID department name functionally determines instructor ID that is one constraint. Furthermore, let us say that instructors are in only one department you do not want instructors to belong to two departments. Therefore, instructor ID determines department name this is exactly the same structure as what we saw before j k goes to L, L goes to K that is identical to what we saw. So, for the same reason if you take this particular one and decompose it you will not preserve dependencies if you keep it as is dependencies of preserve. In fact, it is kind of natural to say if I want to track you know who is the instructor advisor for a student in a department it is very natural to keep this relation. But of course, it is redundant because instructor ID functionally determines department name. So, there is some apparent redundancy here. So, if I decompose I will land up with two relations one is instructor department the other is student instructor which is fine I will have a student with two instructors. But to ensure that a student has only one advisor in a department I have to join it back. So, I join the student instructor relation with the instructor department relation on this joint result I have to make sure that for each student each department that is at most one advisor if it is violated then the functional dependency is violated and this relation as is is in 3NF. So, this again shows an example of redundancy in 3NF. So, what we have here is this as I said if you look at the L goes to K J K goes to L L goes to K. So, if I have a relation here L goes to K if L 1 appears 3 times K must be the same each of those times that is the redundancy. And there is one more issue if you do not decompose if I want the relation of L to K if I want to store it here I will end up with a null value what does that mean what that means is if I have that instructor is associate with the department if I do not store it separately what do I get if I have to use this relation to store which department instructor is in. But perhaps that instructor does not advise any student then the only way to represent it is to add a null here and. So, I will store L 2 K 2 and J which is the student is null. So, that is a drawback also if I did not. So, I did not decompose this because it is in 3NF, but on the other hand because I did not decompose it I am forced to use null values to store the instructor department mapping. So, that is the redundancy that is a drawback of 3NF. So, it has benefits has drawbacks. Now, how do you test if a particular schema satisfies 3NF it turns out testing it itself expensive it is NP hard well that is not a big deal by itself anything you do with functional dependency very often turns out to be NP hard exponential number of combinations. But it turns out you can ensure 3NF in one of two ways one is you take a relation check dependencies and decompose decompose repeatedly till you achieve 3NF. But there is no guarantee if you do this that the resultant decomposition is dependency preserving and the whole point of 3NF was to ensure dependency preservation. So, this is a bad idea I have seen a lot of students think that to do 3NF you simply start with a relation and start decomposing and that is the right way to achieve 3NF that is a useless way to achieve 3NF in general because you may not achieve dependency preservation. What we want is a dependency preserving decomposition and there is an algorithm for this which we are going to see coming up and it is based on a notion of canonical cover and what is interesting is this whole algorithm runs in polynomial time even though even just checking if something is in 3NF is NP hard. This guarantees the result is in 3NF but takes only polynomial time. So, now to understand what is a canonical cover basically as we saw given a set of functional dependencies you can infer others. Now, I start with f I infer some more I add it to f it is not minimal if I remove those the closure is still the same. So, canonical cover basically is based on idea of somehow minimizing unnecessary functional dependencies. One way of minimizing is to take a functional dependency and see if the remaining ones imply this then throw it out completely because it is redundant it is implied by the other ones but it turns out you need to do more because you may have a dependency which you cannot throw it as a whole but a part of it can be thrown out in effect and this idea of throwing out a part of a functional dependency is made explicit by the notion of extraneous attributes. So, as an example here given a b a goes to b b goes to c a goes to c I can throw out a c no problem but here is a more complex one a goes to b b goes to c a goes to c d can I throw out a goes to c d I cannot because d is not anywhere here I cannot infer a goes to d but a goes to c d can actually be broken up into a goes to c a goes to d. So, I can break it up and then I can throw out a goes to c that is one thing I can do but it is actually trickier than that here is another one which is a goes to b b goes to c a c goes to d now a c goes to d cannot be broken up anymore. So, I may say well tough nothing is redundant here I cannot delete anything but actually I can and the idea is a c goes to d if I remove c it is actually ok what I am doing is when I remove c from here and replace it by a goes to d I am doing something quite the opposite of what I did before earlier I threw out a functional dependency here I am taking a functional dependency a c goes to d and replacing it by something which is stronger a goes to d is making a stronger statement then a c goes to d there are cases where a goes to d may be violated but a c goes to d may be satisfied. So, this is stronger but if a goes to d certainly a c will go to d. So, a c goes to d is weaker. So, what I am doing here is replacing a weaker one by a stronger one now I cannot do this arbitrarily I may actually change the closure. So, I can do this only if the closure of the 2 is the same. So, what I have done is I have simplified it by removing attributes in this case it is making it stronger, but in this case it is actually ok because a goes to d was actually implied by the remaining dependencies by this dependency. So, from here I can go get a goes to c therefore given a what is the closure using this the closure includes a b c and then using a c to d includes d the closure of a is includes a b c d as a result a goes to d is implied by these. So, I can actually replace a weaker one by a stronger one, but that reduces the number of attributes. So, the idea is I want to minimize the number of attributes in the functional dependencies. So, the canonical cover is a minimal set of functional dependencies having no redundant dependencies or redundant parts of the dependencies ok. So, to continue we have 2 parts to the definition of extrinity an attribute a is extraneous in alpha given a functional dependency alpha goes to beta in f a is extraneous in alpha that is on the left hand side. So, if f logically implies I remove alpha goes to beta and replace it by alpha minus a goes to beta. So, I am taking something weaker replacing it by something stronger I can do this if this thing is actually implied by f. So, it is not really stronger it appears stronger, but anyway I can infer it using the entire set of dependencies f ok. So, that is the definition of extraneous on the left hand side of a functional dependency on the right hand side extrinity is symmetric. So, what I will say is a is extraneous in beta that is the right hand side if f minus alpha minus beta. So, what I am doing is I am replacing something stronger by something weaker this is ok if the weakened set. So, what is the weakened set the weakened one is I remove alpha goes to beta from f and add back alpha goes to beta minus a this is the weakened set. If the weakened set logically implies the original set f then I have not actually weakened it is still as strong. So, what I am doing is in either cases one set is obviously weaker than the other if I can show the weaker one implies the stronger one then I am not actually weakened it is equivalent and therefore, I can drop that attribute that is the idea. So, here is a coming back to our example a goes to c a b goes to c is b extraneous in this one. Yes, because this original set logically implies the result of dropping b which is a goes to c a goes to c can easily be derived from it is already there in fact. So, I can drop b from here now the other one next example is a goes to c and a b goes to c b those are the functional dependencies. Now, I want to see if c is extraneous in the right hand side here. So, supposing I drop c from here the weakened one is a goes to c a b goes to d now from that can I infer a b goes to c yes trivially because I already have a goes to c on this side. So, from a goes to c I can infer a b goes to c. So, I can infer back a b goes to c d. So, c is extraneous here. So, that is the idea. So now, how do I do this test I can do it each time by computing f plus and so forth that is little painful. So, again using attribute closure that is a simpler test. So, what I am going to do is I take any particular dependency alpha goes to beta to check if a particular attribute a and alpha is extraneous what I will do is I will compute alpha minus a remove a from alpha compute its closure using the original given set of dependencies f and check if this plus contains beta. That means the original set of dependencies already implied the stronger condition that alpha minus a functionally determines beta in which case a can be deleted and you retain you delete a from this and replace it by alpha minus a goes to beta. Now, for the other side it is similar, but slightly different now I am removing from the right hand side. So, what I am left hand side is not touched. So, I am going to compute alpha plus, but not using the original dependencies instead I am going to use the weekend dependencies which is I am going to remove alpha goes to beta from f at back alpha goes to beta minus a and then I compute alpha plus using this weekend set and if this alpha plus contains a then I know I can anyway derive a from the remaining ones. So, a is extraneous and beta. So, this is actually a very simple test. In fact, it takes polynomial time attribute closure is polynomial, polynomial number of attributes to be tested. So, I can very quickly apply this test blindly and keep deleting attributes from functional dependencies and quickly compute the canonical closure. So, your lab exercises today include several examples of computing the canonical closure at least one example. So, here is another example of doing it this is the same one as before, but this time I am going to use the alpha plus computation instead of trying to infer functional dependencies in general. So, can I delete a from a c I cannot can I delete c from a c I cannot can I delete b from a b goes to c well I will use this test. So, if I delete b I have a left now I have to compute a plus using the original dependency that is this one. So, using that I can see easily that a plus contains c which is the right hand side here. Therefore, b is redundant it is extraneous I will delete b what I am left with is a c a c. So, that was one now for the other side the other example last time we found that c is extraneous now let us do the same thing with attribute closure what do I do I am going to remove a b goes to c d and add back a b goes to d. So, the dependencies I have a c a goes to c and a b goes to d these are the two ones now the left hand side is a b I compute a b plus a b plus on this is what a goes to c. Therefore, a b plus includes a b c now a b goes to d therefore, it also includes d. So, a b plus is a b c d and a b plus now certainly consider contains the attribute which I just deleted which was c. So, this test succeeds therefore, c is extraneous I can remove c. So, what do I get I get a goes to c and a b goes to d that is the I cannot delete anything more that is the canonical cover for this one. So, there is one more step in canonical cover I remove all extraneous attributes and then the last step is each left hand side of a functional dependency in f c is unique this is kind of trivial what I do is after removing all extraneous attributes if any two dependencies have the same left hand side I merge them that is the left hand side is the same the right hand side is a union. So, that is a very simple union step. So, to compute the union canonical cover I remove extraneous attributes apply union and keep repeating this until nothing can be deleted and nothing can be union that is it. So, there is another example of canonical cover. So, we are kind of running out of time, but let me go over this very quickly. So, first of all given this set of dependencies a goes to b c b goes to c a goes to b a b goes to c can I apply union rule yes the first and the third have the same LHS a I union them to get a goes to b c. So, this is what I have now now can I find anything extraneous I am going to test in this one a b goes to c I am going to check if a is extraneous. So, if I how do I check it well I have to see if the stronger one b goes to c can be inferred, but in fact it is already there. So, it is kind of trivial to delete a from here and then b goes to c is duplicate. So, I am going to get a goes to b c b goes to c. Now, what else can I do on this is anything extraneous here let us see if b is extraneous in this one or c is extraneous actually b is not let us check if c is extraneous on this one a goes to b c. So, how do I do that I will delete it and see if a goes to c is logically implied by the remaining ones. So, if I delete c from here and compute a plus I will get a b and then from there c. So, I can infer c back. So, c is extraneous. So, what I get here is a b and b c that is the canonical cover. So, what is the point of all this the canonical cover is minimal it has unique left hand sides and its closure is exactly the same as the original relation original functional dependencies closure. So, now, which a once I have achieved the canonical cover the trick is I am going to use the canonical cover to construct a set of relations canonical cover can be computed very efficiently polynomial time once I have it I will construct a three n of decomposition. So, how do I construct that decomposition I am not going to start and decompose one at a time I am actually going to I took a initial set of functional dependencies on the initial schema R assume it is just one relation and using those dependencies I am just going to construct a new set of relations. So, I will construct the canonical cover now for each functional dependency in the canonical cover by the way f sub c denotes the canonical cover. So, for each functional dependency in it I will check if the attributes in that one alpha beta are not already contain in one of the existing schemas and if they are not then I will add it to the schema. So, I will add I 1 to I and make R I equal to alpha beta. So, in other words for each functional dependency in the canonical cover I am creating a corresponding relation one extra step is I may have two functional dependencies one of which is completely contained in another. So, here I will not bother creating a separate relation for that I will throw it out that is the idea. So, that is the redundant relation removal, but here is one more thing it is not enough to construct a relation per functional dependency. As an example let us say I have a schema a b c d and d does not participate in any functional dependency whatsoever. So, in the canonical cover d will not appear at all, but d is there in the schema what do I do about it that is one possible situation. So, in general what I will do is make sure that one of these schemas at least contains a candidate key for R. How do I ensure that it contains a candidate key simple I for each of the schemas I will do a closure and check if all attributes are contain in the closure of the attributes of that schema. For at least one schema the closure should include all attributes if not I have a problem what do I do then I will pick any candidate key for R and add that candidate key itself as a new relation that is one extra relation. So, once I have done that removed relation which are contain in others I am done. So, that is the 3NF decomposition this is also called 3NF synthesis algorithm because if you notice I just ignored whatever schemas were given and just took the dependency and came up with a schema from the dependencies. So, what does this algorithm ensure we will not prove it formally here, but it can be shown that each R i is in 3NF and furthermore the decomposition is dependency preserving and lossless which is what we want to this is why we went to 3NF from BCNF. So, here is an example of BCNF sorry 3NF synthesis using a new schema called CUST banker branch which is kind of like the department advisor slightly different. So, here I have customer ID employee ID branch name and type type is one extra attribute and what we are going to say is that a customer will have an advisor or whatever in each branch that they are associated with. So, if I have accounts in two different branches one of the employees of each branch is assigned as my personal banker let us say who will keep track of things connected to me. So, now what are the functional dependencies in this schema. So, first of all given particular employee ID each employee is in only one branch. So, employee ID goes to branch name then customer ID branch name function determines employee ID that is given a customer in a particular branch they can have only one associated employee. And finally, I am also going to say that customer ID employee ID determines branch name and type how did this come about this kind of trivial because employee ID determines branch name and type is something may be you can decide whatever you want it to be, but let us say that a particular customer ID employee pair that advisor may you know maybe there is some extra field which says that this person is this type of advisor financial advisor some other advisor, but you can only have one advisor in a branch that is a little artificial, but let us assume that. So, now this is the set of dependencies let us see if anything is extraneous. Now, you should have noticed that branch name here we know we can kind of infer it is extraneous because we already know employee ID goes to branch name. So, why include customer ID also here you can do this more formally using the procedure we have, but it is easy to show that branch name is redundant it is extraneous we are going to delete it and in fact this is what we land up with this schema nothing more is extraneous I will skip the details for lack of time and then generate three relations corresponding to these three functional dependencies again I am going very fast do not worry if you are not keeping up please read it afterwards. So, what we need to do is check if at least one contains a candidate key so we do the closure which one of these has a closure this one customer ID employee ID type its closure includes branch name therefore, it contains a super key it contains a super key it will also contain a candidate key. So, we are done and then finally delete any redundant schema. So, if you take the second and the third relation you can see that this is contained in this therefore, we will drop the second relation. So, the final three NF has two relations and one of them has a candidate key. So, that is it that is the three NF decomposition. So, to summarize we would like to get a dependency preserving lossless join decomposition to BCNF we cannot always do that. So, then we have a choice thick with BCNF or go with 3 NF practically speaking small note here SQL does not actually support the ability to enforce functional dependencies. So, it is fine to say dependency preserving, but when you go into the real world with SQL you cannot enforce the dependency the only dependencies you can enforce are key value dependency you cannot enforce anything else. Therefore, to say that you know I have a dependency preserving decomposition where one of the dependencies is not a primary key dependency is useless it cannot be enforced in SQL unless you put in triggers or whatever. Therefore, the moral in the real world may be that dependency preservation is not that important it is may be better anyway you cannot enforce it without doing more work. So, in many situations may be it is fine to just go with BCNF and forget about dependency preservation on the other hand sometimes you do want to store you know like the student department advisor it is very natural to create a table to say this student this department this is the advisor even though we know that this table is not in BCNF we may still keep it because it makes sense to store it even though it is redundant. So, that is intuitive it is not a theoretical reason it is an intuitive reason why sometimes you may prefer to go with 3 NF. So, that is this slide summarizes what I just told you and to wrap up for now we have a notion of multi-valued dependencies and to understand that here is an example I have a instructor who has children one or more children and I am recording those children I have a instructor who has phone numbers so there are potentially multiple phone numbers. Now, supposing I take these two relations which I am given and I do a join and for some reason I decide this is my schema not the original relations the combined relation is my schema. So, the combined relation is ID, child name and phone number on this relation are there any functional dependencies does ID function determine child name it does not instructor can have multiple children may be in China with the one child policy in force may be it does, but even in China there are multiple children certainly not in India or in most other places. So, ID is not going to determine child name is ID going to determine phone number no I mean most of us these days I have multiple phones a mobile phone landline at home phone in the office so it does not. So, the child name determine phone number or phone number determine child or anything of course not in fact there is no functional dependency on this relation if there is no functional dependency is it in BCNF yes trivially let me rephrase that there are functional dependency, but they are all trivial there is no non-trivial functional dependency on this relation and the trivial ones do not matter for BCNF. So, what we have just shown is this combined relation is in BCNF, but is it not redundant is that fine now you will notice that there is a lot of redundancy in this relation it is very clear that you know for the same instructor I am storing the children twice and the phone numbers twice that is pretty idiotic. So, intuitively it is clear there is redundancy here, but if I just apply BCNF it says everything is ok. So, moral of the story is BCNF may not be the end of the story there is more to life than just BCNF and this particular example shows what is called a multivalued dependency intuitively even though phone number is not uniquely determined by ID the phone numbers for an ID are completely orthogonal to the other other attributes in this relation. So, the phone numbers for an ID are completely independent of the child name. In fact, conversely the children of an employee are completely independent of the phone numbers. So, even though ID does not functionally determine phone number it multivalued determines phone number. So, this is a new concept of multivalued depending not uniquely determining it, but multivalued determine it. So, that results in the notion of a multivalued dependency and we will write it in our notation as saying ID double arrow phone number that is saying we say multivalued determines phone number. In other words there may be many phone numbers with an ID, but the relationship of the two is independent of anything else in that relation. So, once we have written these MVDs multivalued dependencies there is actually a whole theory with multivalued dependency we do not have time for that most courses probably would not have time to really get deep into it, but at least intuitively if you recognize a multivalued dependency like this ID goes to phone number it is very intuitive. And what you do about it you can decompose just like we decomposed using functional dependencies we can take this multivalued dependency ID goes to phone number and decompose. So, in our previous example if somehow you started with this schema and you realize that ID multivalued determines phone number we decompose what is the decomposition ID phone number ID child name exactly like BCNF decomposition. The question is is it lossless join it was not a functional dependency it was a multivalued dependency, but is the decomposition lossless join. And the answer we are not going to get into the details, but the answer is yes it would be lossless join and as a result it is ok to decompose. So, we use functional and multivalued dependencies and then decompose using both of these to get normal form which is called 4NF which are not shown here I have not got into the details, but practically speaking you most of the time even without getting into all the theoretical aspects of inferencing using multivalued dependencies you just identify the multivalued dependencies decompose and you are probably fine with that. So, I am going to stop here for the morning session. So, some of these questions were asked before the break. So, I am going to skip them. So, I think skip the only question which were already answered I will take maybe a few questions. Let us see if anybody has their hands raised. My question is regarding ER data. When we are trying to build your assignment like a tutorial at the time we are trying to model by using a terminalization shift that we it involves the entities like train, station and schedule as a terminalization shift. So, is it possible that in a terminalization shift one of the entities is a weak entity for both the entities. It is hard for me to answer that question without seeing the ER diagram and so on. So, what I suggest is since this question is not concerned with today's lecture please send that question by email you should have the email address. So, send it by email and I will answer it offline. If that question is of general interest maybe we can even discuss it during the lab session. So, please send it by email I am not going to answer that directly. Any other question? Please go ahead. What are the things I must consider when I am decomposing particular relation? The things you must consider. So, as we just discussed in great detail functional dependencies are the primary thing for decomposition. However, multivalued dependencies also do arise. So, occasionally you will need to use them also and we just went over the whole theory. Now, if you interpret your question as if there are alternatives for decomposition what do I do? How do I make a choice? So, we already saw that with BCNF we saw an example where there were two choices one of which was dependency preserving one was not. So, obviously you would choose the one which is dependency preserving. So, you may want to at any stage you may have multiple functional dependencies which show violation and if one of them ensures dependency preservation using one of them and the other one does not you would of course pick the one which preserves dependencies. So, that is one consideration. I cannot think of hand of others they probably are a few more. My question is that in normalization we have atomic values and in object oriented database we decompose the one relations and put some nested relations. The normalization is better and where we prefer the use of object oriented database over to you. Thank you that is a good question. The question was the whole normalization theory which we have been using is based on atomic values. On the other hand there exists object oriented and object relational systems which do not actually have atomic values because they allow sets and arrays and other such stuff. So, I guess there are two parts to the question. A is you may not have asked this explicitly, but implicitly is there a theory of normalization once you have non atomic domains and I believe this has been addressed, but you know we have not covered it here. It is a little complex, but if you are interested perhaps you will find some material, but none of the textbooks that I know actually cover it. So, can normalization theory be extended to deal with non atomic values? So, the other part of the question as I interpreted is if you do not have normalization theory then we have two alternatives stick to atomic values, use normalization theory or give up everything and use object oriented principles. So, how do you make this choice? So, now even if you use object relational object oriented systems where some of the values are non atomic, still most of the values are going to be atomic and you can certainly use functional dependency theory with those. Furthermore, it most of the cases which I have seen when people use a set valued thing in an object oriented system, it usually corresponds to a multi valued dependency in normalization theory. So, for example, if a person had a set of phone numbers instead of creating a new relation which maps id to phone number, they would create a multi valued attribute phone number, but this actually corresponds directly to a multi valued dependency from id to phone number. So, you can actually apply the theory pretty much unchanged using multi valued attributes. So, just because you are using an object oriented database doesn't mean you have to again start everything from scratch, you can actually use the theory we have covered so far and handle almost all of the issues without doing anything more specifically. There are a few more issues that is specialization generalization. Now, there if you recall when we created relations from specialization and generalization, we landed up in some situations where there was redundancy across tables. So, we had two tables each of which satisfies BCNF, but they repeat information. So, to remind you of that let's say there was a person table, then there was an employee and student table. Now, person had id name, employee had id name which is inherited plus salary, student had id name and credits. If I store the name redundantly in each of these tables, if somebody is both a student and an employee, the name gets repeated in the two tables. Does this mean BCNF is violated? If you look at individual tables, it's not violated, BCNF is not violated because id is a super key and there are no other functional dependencies on individual tables. So, if you start with this schema, it looks like there is no redundancy, but as I told you before just because BCNF is satisfied doesn't mean there is no redundancy. In this case, there would be redundancy and that's a trade off. So, we all had the alternative schema where I have a relation person id name, another relation employee id salary, a third relation student id credits. So, the employee and student relation don't store name and by not storing name they avoid redundancy. So, here is an example of redundancy which normalization theory would not have caught, but if you started with the ER model with specialization or you landed up at the same thing in an object oriented design with specialization, then you can catch the redundancy there and avoid it without having to deal with regular normalization theory. I hope that answered your question. Back to you for any follow up question. Thank you sir. So nice of you. Back to you. Okay, we have now half an hour behind. So, I think I will stop here and we will meet again at 4.15.