Now, here is another related notion, which is the closure of a set of functional dependencies. What do we mean by the closure of a set of functional dependencies? The basic idea is that if I say a set of dependencies holds on a relation, meaning it will always be satisfied, then we can infer that certain other dependencies will always be satisfied as well. Take the very simple case which I mentioned earlier, in abstract form. There I said instructor ID determines department, and department determines building. So, A determines B and B determines C. From this we can infer A determines C, because if any two tuples have the same instructor ID they must have the same department, and because a given department has a unique building, they must have the same building. So, logically we can infer that A also functionally determines C. That is an inference we just made: given these two dependencies, we can infer this one. Now, is this the only one in this case? Well, it is the only non-trivial one, but we can also include all the trivial ones. If the schema has just A, B, C, what are some of the trivial dependencies? AB determines B, AB determines A, AC determines C, AC determines A, and so on and so forth; ABC determines AB, ABC determines BC, and there are many, many of these trivial dependencies. In general, the set of all functional dependencies logically implied by F is called the closure of F. As I said, the closure includes all the trivial ones, because they are always implied even if we did not give any initial set of dependencies; the trivial ones are always inferable, and therefore they are logically implied by any set of dependencies. The closure also includes the non-trivial ones which we can actually infer from F. How do we compute it? We will come to that later, but conceptually, whatever we can infer logically is part of the closure; that is the definition of closure, and of course the closure is a superset of F.
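What it means for a dependency to "hold" on an instance can be checked mechanically: any two tuples that agree on the left-hand side must agree on the right-hand side. Here is a minimal sketch in Python; the relation and its sample data are illustrative, not taken from the slides.

```python
# A functional dependency alpha -> beta holds on a relation instance if any
# two tuples that agree on alpha also agree on beta. This checks it directly.

def fd_holds(rows, alpha, beta):
    """Return True if alpha -> beta is satisfied by this relation instance."""
    seen = {}  # projection on alpha -> projection on beta
    for row in rows:
        key = tuple(row[a] for a in alpha)
        val = tuple(row[b] for b in beta)
        if key in seen and seen[key] != val:
            return False  # two tuples agree on alpha but differ on beta
        seen[key] = val
    return True

# Illustrative instance: sample values, not from the lecture slides.
instructors = [
    {"ID": 1, "dept_name": "CS",      "building": "Taylor"},
    {"ID": 2, "dept_name": "CS",      "building": "Taylor"},
    {"ID": 3, "dept_name": "Physics", "building": "Watson"},
]

# ID -> dept_name and dept_name -> building both hold on this instance,
# and so does the inferred dependency ID -> building.
assert fd_holds(instructors, ["ID"], ["dept_name"])
assert fd_holds(instructors, ["dept_name"], ["building"])
assert fd_holds(instructors, ["ID"], ["building"])
```

Note that a check like this only tests one instance; saying a dependency holds on a schema is a statement about all legal instances.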
We are going to use the notation F superscript plus, F+, to denote the closure of the set of functional dependencies F; in other words, all the dependencies that can be inferred from F, which of course includes F itself. How we compute the closure we will come to later on, but the closure plays a very important role in defining the various normal forms, because some dependencies are obvious while others are not obvious yet can be inferred, and those must also be taken into account when we check whether something is in a suitable normal form. There were several questions about Boyce-Codd normal form, so here is the definition of Boyce-Codd normal form. The way we are going to do this is in two parts: first we present a series of definitions, functional dependencies, closure, Boyce-Codd normal form, third normal form and so forth, and then we go deeper into the process by which we do the inferencing and the process by which we generate BCNF and 3NF decompositions of a given set of relations. In the first part we cover the terminology and the basic idea. Coming to Boyce-Codd normal form, look at the kind of problem we saw earlier in inst_dept. What happened is that there was a functional dependency from department to building and budget, but at the same time department was not a super key of the relation; that is exactly why you were allowed to have two tuples with the same department name. They may differ in the other attributes, the instructor part, but they may have the same department name, and as a result the building and budget got duplicated. If we want to remove this redundancy, we put the following requirement on a relation, which avoids such redundancy. We say that a relation schema R is in BCNF with respect to a set F of functional dependencies.
The condition is stated over all functional dependencies in F plus, the closure. How do we compute it? We will hold off on that, but suppose we have computed F plus. For any functional dependency alpha determines beta in F plus, where alpha is a subset of R and beta is a subset of R, one of the following must hold. Why do we restrict alpha and beta to R? Because we may have taken an initial relation and decomposed it. There are many functional dependencies which cover attributes only from this relation, and there are other functional dependencies which include attributes not present in one of the broken-up relations. What we are saying here is that we consider only functional dependencies where both the left and the right side are part of R; these are the ones which hold on this particular relation schema R. For all such dependencies, either the dependency is trivial, or alpha is a super key for R. In our department example, department name determined building and also budget, so we had this functional dependency. By the way, here is another example of inference: we know department name determines building, and we know department name determines budget; we can automatically infer that department name determines the pair building, budget. Why? Because if you have two tuples with the same department name, they will have the same building and they will have the same budget, so the pair building, budget will also be the same. In fact, the other direction also holds: if you have the functional dependency department name determines building and budget, then the whole pair is uniquely determined, and if the pairs match, obviously building and budget individually also match. So, this is another kind of inference we can do.
Coming back: in this case we had this dependency, but department name is not a super key, because you can have two different instructors in the same department. Department name does not determine the ID of the instructor, so it is not a super key; the dependency violates the condition, since its left side is not a super key and at the same time the dependency is not trivial. Why do we want to exempt the trivial ones? Because if you take any schema, even one which is actually in good form, say one with instructor name and salary, we can say that name, salary determines name. That is trivial, and name, salary need not be a super key, but this really does not cause any problems precisely because it is trivial. So, we are going to ignore the trivial ones. Among the non-trivial ones, if you have a dependency where the left-hand side is not a super key, then there is redundancy in the relation, and we do not want that to happen. What we want is a schema where every relation satisfies this property; then there will not be redundancy. I hope this concept is clear; we are going to do a lot of examples with it. Now, how do you come up with a relation which is in BCNF? The basic approach is that we start with the universal relation, or some other set of relations which we got from the ER model, and check whether each is in BCNF. If it is not, we decompose it, and we decompose in a way that does not lose information. In particular, if we apply this test and BCNF is violated, then we can be sure there is a non-trivial functional dependency alpha determines beta which holds, where alpha is not a super key. So, what we do is decompose R into two relations: one is the relation alpha union beta, that is, it contains the attributes of alpha and beta; the other is the relation where we remove from R those attributes of beta which are not in alpha, because alpha and beta may share attributes, and that repetition does not make the dependency trivial.
So, essentially we remove the replicated attributes. To be precise, what we want is to keep the attributes of alpha in R, and in case some attributes of alpha also occur in beta, we retain them; what we remove is those attributes of beta which are not in alpha. So, if we decompose using this particular functional dependency, we keep all of alpha in R and remove only the attributes of beta which are not in alpha. R minus (beta minus alpha) means: remove all those attributes of beta which are not in alpha. All attributes of alpha remain in R, but the other attributes of beta go out. We could equally well have done this differently: we could have removed from beta, up front, all the attributes which also occur in alpha, and that would have been fine too. Anyway, this is the formal definition; it allows some repetition between alpha and beta and still works. These are the two relations we create. In our example, alpha is department name and beta is building, budget. So, from inst_dept we create a new relation with department name, building and budget; that is the first relation, alpha union beta. The second one keeps ID, name and salary: we remove building and budget but retain department name. So, this is what we get. That is the basic idea of how to decompose. The next question is, how do we know that this particular decomposition is lossless? We do not want a lossy decomposition, and I have some slides coming up which will show you that when we do such a decomposition, the decomposition is lossless.
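The decomposition step just described is a small set computation. Here is a sketch, using the lecture's instructor example; the attribute names follow the usual inst_dept schema.

```python
# One BCNF decomposition step: given a violating FD alpha -> beta on schema R,
# split R into (alpha union beta) and (R - (beta - alpha)).

def bcnf_split(R, alpha, beta):
    """Decompose R on the violating FD alpha -> beta."""
    r1 = alpha | beta        # all attributes of alpha and of beta
    r2 = R - (beta - alpha)  # drop only attributes of beta not in alpha
    return r1, r2

R = {"ID", "name", "salary", "dept_name", "building", "budget"}
alpha = {"dept_name"}
beta = {"building", "budget"}

r1, r2 = bcnf_split(R, alpha, beta)
# r1 is the department relation, r2 the instructor relation.
assert r1 == {"dept_name", "building", "budget"}
assert r2 == {"ID", "name", "salary", "dept_name"}
```

Because the subtraction is `beta - alpha`, any attribute alpha and beta share survives in both relations, exactly as the formal definition requires.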
In particular, the decomposition is lossless when the common attributes of the two relations form a super key for one of them. Here the common attribute is alpha; in our example, department name is alpha, and it is a super key for the first relation. Why? Because we started with the functional dependency alpha determines beta and created a relation containing exactly alpha and beta. Obviously alpha is a super key there: by the definition of super key it determines all the attributes, since alpha determines itself and everything in beta. That is basically how we generate a BCNF schema by decomposing. Now, you could say that BCNF is good enough, but theoreticians pointed out that there can be a problem with BCNF, and the idea is as follows. If we want the database to be consistent, we want to enforce all the constraints on it. In SQL you have primary key constraints and foreign key constraints, and those are enforced by the database; functional dependencies are a more general kind of constraint. The idea was that we want databases to enforce functional dependencies also: if we say that a constraint holds, the database should not allow an update that violates the functional dependency. It was supposed to be the database's job to enforce these. As a practical matter, databases today only enforce uniqueness constraints, that is, super key constraints; they do not enforce other functional dependency constraints, though you can work around that with a process we will see. But the point is that, assuming a database enforces functional dependencies, we want a schema where all functional dependencies can be checked efficiently. Now, given any lossless decomposition, you can always check a functional dependency by joining the relations back together and checking it on the result.
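The lossless-join condition just stated (the common attributes must functionally determine all of one of the two relations) can itself be tested mechanically, using the attribute-closure procedure that is developed later in this lecture. This is a sketch, with FDs represented as pairs of attribute sets; it is not the slides' own code.

```python
# Lossless-join test for a binary decomposition (r1, r2) of a schema:
# the decomposition is lossless if the closure of the common attributes
# contains all of r1 or all of r2 (i.e., they form a super key of one side).

def closure(attrs, fds):
    """Closure of an attribute set under a list of (alpha, beta) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for alpha, beta in fds:
            if alpha <= result and not beta <= result:
                result |= beta  # left side is determined, so add the right side
                changed = True
    return result

def lossless(r1, r2, fds):
    common = r1 & r2
    c = closure(common, fds)
    return r1 <= c or r2 <= c  # common attributes are a super key of r1 or r2

fds = [({"dept_name"}, {"building", "budget"})]
r1 = {"dept_name", "building", "budget"}
r2 = {"ID", "name", "salary", "dept_name"}
assert lossless(r1, r2, fds)  # dept_name determines all of r1
```

In the BCNF split the common attributes are exactly alpha, and alpha is a super key of the alpha-union-beta relation by construction, so this test always succeeds for that decomposition.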
However, what we would like is this: given a decomposition, we want to check functional dependencies on each individual relation in the decomposition, and ideally that is all we need to do. So, what are we saying here? If it is sufficient to test only the dependencies that apply to each individual relation in the decomposition, and by doing so we can ensure that all functional dependencies hold, then we say the decomposition is dependency preserving. We will see examples of this coming up; it may not be clear now, so hold on. The other result which came up is that there are certain relations for which there is no dependency-preserving decomposition that is also in BCNF, and as a result people came up with another normal form, called third normal form, which is slightly weaker. I am going to cover all this business of what dependency preservation is and so on more formally later on; the goal at this point is to motivate why people came up with yet another normal form, and this normal form is defined as follows. A relation schema R is in third normal form if, for all functional dependencies alpha determines beta in F plus, that is, the closure of F, at least one of the following holds: either the dependency is trivial, or alpha is a super key. These two conditions are exactly BCNF, but 3NF adds one more: the first two may fail as long as the last one is satisfied, and that one is quite unintuitive. It says that each attribute A in beta minus alpha, that is, each attribute of beta which is not in alpha, must be contained in a candidate key for R. This is kind of non-intuitive; do not worry about how people came up with it, but we will see that it is a useful relaxation. I should also note that if beta minus alpha has multiple attributes, each of them can be in a different candidate key; that does not matter, but each must be in some candidate key. If any of them is not part of any candidate key, then this condition is violated.
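A direct reading of this 3NF test can be sketched as follows. The sketch assumes the closure F+ and the full list of candidate keys are already available (computing both is covered later in the lecture); note that feeding it only F rather than all of F+ can miss violations, so this is illustrative rather than a complete checker.

```python
# 3NF test over a set of FDs (ideally F+): each FD alpha -> beta must be
# trivial, or alpha must be a super key, or every attribute of beta - alpha
# must lie in some candidate key. Candidate keys are taken as given.

def is_superkey(alpha, candidate_keys):
    # Given the complete list of candidate keys, alpha is a super key
    # exactly when it contains some candidate key.
    return any(k <= alpha for k in candidate_keys)

def satisfies_3nf(f_plus, candidate_keys):
    prime = set().union(*candidate_keys)  # attributes in some candidate key
    for alpha, beta in f_plus:
        if beta <= alpha:                          # trivial
            continue
        if is_superkey(alpha, candidate_keys):     # BCNF condition
            continue
        if (beta - alpha) <= prime:                # 3NF's extra option
            continue
        return False
    return True

# Lecture example: dept_name -> building, budget; the combined schema's
# candidate key is {ID}. dept_name is not a super key and building/budget
# are not prime attributes, so the schema violates 3NF (and hence BCNF).
f = [({"dept_name"}, {"building", "budget"})]
assert not satisfies_3nf(f, [{"ID"}])
```

The same loop with the third `continue` removed is exactly the BCNF test, which makes the "3NF weakens BCNF by one option" relationship concrete.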
So, what 3NF does is weaken BCNF by allowing a third option. It turns out that the original way in which 3NF was described was a little different. There was a notion of 2NF, second normal form, which somebody mentioned; 2NF was defined using something called partial key dependencies, and 3NF was defined using what are called transitive dependencies. It was shown later that that definition is equivalent to this one. We use this one because it is cleaner and its connection to BCNF is much clearer; with the earlier definition of 3NF, the relationship between BCNF and 3NF was rather confusing and not obvious. So, we use this particular variant. If you use a different textbook you might find a definition using the notion of transitive dependencies, and that was actually the original one: when Codd defined 3NF he used that other definition. Later, other researchers figured out that this definition is equivalent and that the notion of transitive dependency is not needed. So, that is what 3NF is, and basically the idea is that the third condition is a minimal relaxation of BCNF to ensure dependency preservation. If you did not understand the details, hold off; I am going to cover all of this in more detail. I want to give the big picture first and then go into the details. So, ultimately, what are the goals of normalization? If we have a relation schema R with a set of functional dependencies, we want to decide whether it is in good form, and if it is not, we want to decompose it into a set of schemas each of which is in good form. By the way, the slide says "scheme": earlier on, many people used the word scheme to mean schema, but that usage has been abandoned. In earlier editions of our book we had used the word scheme; later we switched to schema uniformly, but this slide somehow did not get updated. So, scheme here should read schema; I will fix it in later versions and upload a corrected version.
So, each relation schema should be in good form, the decomposition must be a lossless join (we do not want any lossy decompositions), and preferably the decomposition should be dependency preserving. We are going to see all of this in more detail. In order to do this, we first have to spend some time on functional dependency theory, which lets us infer dependencies from other dependencies, and then we will see algorithms for decomposition. We already said what the closure is; to remind you, the set of all functional dependencies logically implied by F is the closure of F. Now, how do we compute the closure? There are several ways of doing it. One way is to use logic directly and come up with proofs showing that if these dependencies hold, then these other ones must hold. That is rather tedious, and it is always good to have a clean procedure for making inferences. This set of three inference rules is called Armstrong's axioms, and they are as follows. Given a set of functional dependencies F, we repeatedly apply these three rules to infer new dependencies. The first rule is actually a trivial rule; it is called reflexivity, and its goal is to create all the trivial dependencies. The rule says: if beta is a subset of alpha, then alpha determines beta. Using this we can generate all the trivial dependencies, and we can do so up front, even ignoring the given set of dependencies F. The other two are augmentation and transitivity. Augmentation says that if alpha determines beta, then gamma alpha determines gamma beta; what we are doing is adding the same attributes to both sides in order to infer a new dependency. The last one, transitivity, is what we have already seen: if alpha determines beta and beta determines gamma, then alpha determines gamma.
It is, I think, fairly clear that these rules are sound. Reflexivity obviously generates only trivial dependencies. Augmentation is again fairly easy to see: given any relation state, if two tuples have the same value for gamma alpha, then obviously they have the same value for alpha, which means, because of the dependency alpha determines beta, they must have the same value of beta. The two tuples already have the same value of gamma, and now they also have the same value of beta; so whenever two tuples have the same gamma alpha, they have the same gamma beta. What I have done is verbally prove to you that the augmentation rule is correct: if the given dependency holds, the inferred one will also hold, so the rule is sound. Similarly, for transitivity I can argue that if two tuples have the same value of alpha, they must have the same value of beta; and having the same value of beta, they must have the same value of gamma. Thus, given that the two dependencies are known to hold, we can easily show that the tuples must also have the same value of gamma, so transitivity is sound. Those are the easy parts, soundness. The really hard part is completeness. How do I know that just these three simple rules are enough to infer all possible things which we can infer from F? This is not at all obvious, and I am not even going to try to prove it here; it is very non-trivial to show that these rules are complete. But somebody has done that work for us, so we do not have to worry about it. The bottom line is that since the rules have been shown to be complete, we can just apply them repeatedly and generate everything we can infer, and that is F plus. That is the good news, and it is not too hard to program these rules. So, now let us try to apply this to a particular example.
Let us take the schema R(A, B, C, G, H, I). As I said, these letters could stand for real-world attributes, but we are treating them abstractly, and we are given this set of functional dependencies: A determines B, A determines C, CG determines H (if two tuples agree on C and G, they must agree on H), similarly CG determines I, and B determines H. What all can you infer from this? There are many, many things one can infer, including a huge number of trivial dependencies; if we start listing all of those you will get really bored, so we are not going to do that. Instead, we pick a few members of F plus which we can infer from these dependencies using Armstrong's axioms. The first one we show is A determines H. It is not in the given list, but we can derive it using transitivity alone: A determines B, B determines H, therefore A determines H. Here is another one: AG determines I. How do we derive it? Is transitivity enough? No, by itself it is not; there is no way to get AG determines I using only transitivity, but augmentation comes to the rescue. Given A determines C, we can augment both sides with G; our goal was to get CG on the right-hand side, and we do get that: by augmenting we obtain AG determines CG. Now we also have CG determines I, so by applying transitivity to AG determines CG and CG determines I, we get AG determines I, which is the one we wanted. That is the process; there are many, many more dependencies we can infer using Armstrong's axioms. Now, here is a small quiz question; this is a good point for you to check whether you have understood the material. This particular dependency is actually a trivial one, because the right-hand side is a subset of the left-hand side, so you can infer it using reflexivity; if you go back one slide, reflexivity is the rule which lets you infer all the trivial dependencies.
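The two derivations above (A determines H, then AG determines I) can be replayed mechanically. In this sketch an FD is a pair of frozensets, and `augment` and `transit` implement the augmentation and transitivity rules exactly as stated on the slide; the representation itself is just one convenient choice.

```python
# Armstrong's augmentation and transitivity rules over FDs represented as
# (frozenset, frozenset) pairs: (alpha, beta) stands for alpha -> beta.

def augment(fd, gamma):
    alpha, beta = fd
    return (alpha | gamma, beta | gamma)  # alpha -> beta gives gamma alpha -> gamma beta

def transit(fd1, fd2):
    alpha, beta = fd1
    beta2, gamma = fd2
    assert beta == beta2                  # rule applies only when the middles match
    return (alpha, gamma)

A, B, C, G, H, I = (frozenset(x) for x in "ABCGHI")

# A -> H by transitivity from A -> B and B -> H.
step0 = transit((A, B), (B, H))
assert step0 == (A, H)

# AG -> I: augment A -> C with G, then apply transitivity with CG -> I.
step1 = augment((A, C), G)        # gives AG -> CG
assert step1 == (A | G, C | G)
step2 = transit(step1, (C | G, I))  # AG -> CG and CG -> I give AG -> I
assert step2 == (A | G, I)
```

Using frozensets makes "alpha alpha is the same as alpha" automatic, which matters in the union-rule derivation later.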
So, that is the right answer, C, and 106 responses have come in. Many people have said it can be inferred using augmentation; in fact, that is the majority. Now, certainly if you started with something else you might end up with this via augmentation: if you started with A determines the empty set and then augmented both sides with B, you would get AB determines B. So, in some sense D is not a totally wrong answer, but the best answer here is reflexivity, because you can infer the dependency directly, without any intermediate steps. C is the best answer, though I would say D is also not unreasonable, and many people chose one of C or D. We have some additional rules. The first one is the union rule: if alpha determines beta holds and alpha determines gamma holds, then alpha determines beta gamma holds. What are we doing? If the left-hand sides are the same, we can take the union of the right-hand sides. This is actually very easy to show logically, but it can also be derived using Armstrong's axioms; if it could not be, we would have a problem, since the axioms would not be complete, but it has been shown that Armstrong's axioms are complete. So, the question is how we can show the union rule using Armstrong's axioms; before I do that, let me finish up the other rules. Decomposition says that if alpha determines beta gamma, then alpha determines beta and alpha determines gamma separately. In fact, I argued the union and decomposition rules logically for a particular schema: I said department name determines budget and department name determines building, so we can also infer department name determines budget, building; that is the union rule, and similarly for decomposition. It is very easy to see that they hold logically; what would be nice is to show that they can be inferred from Armstrong's axioms.
And finally, the last rule is: if alpha determines beta holds, and gamma beta determines delta holds, then alpha gamma determines delta holds. This is called pseudotransitivity, and you can see that it is a combination of transitivity with augmentation: if you augment alpha determines beta with gamma, you get alpha gamma determines beta gamma, and then from the augmented dependency together with gamma beta determines delta, transitivity gives alpha gamma determines delta. So, that one is very easy to infer. What about union? How can we derive it using our rules? What we are given is alpha determines beta and alpha determines gamma, and we want to use Armstrong's axioms, that is reflexivity, augmentation and transitivity, to infer alpha determines beta gamma. What steps can we take? We want beta gamma on the right-hand side, so can we augment something and get it? Let us try: if we augment alpha determines gamma with beta on both sides, we get alpha beta determines beta gamma. What have we done? By putting the same thing, beta, on both sides, we have brought the right-hand side to beta gamma, which is what we want in alpha determines beta gamma; the left-hand side automatically becomes alpha beta. That is augmentation applied to alpha determines gamma. Now, what we actually want on the left is alpha alone, so take the other given dependency, alpha determines beta, and apply augmentation with alpha on both sides. The left-hand side becomes alpha alpha, giving alpha alpha determines alpha beta; but alpha alpha is really the same as alpha, because these are sets, and writing an element twice does not make it a multiset.
So, now we have these two things: alpha determines alpha beta, and alpha beta determines beta gamma. Applying transitivity to these two, we get alpha determines beta gamma. What we have shown, using Armstrong's axioms, is the union rule. Now it is a good time for this quiz: given this particular schema with four attributes and the two functional dependencies A determines B and B determines C, which of the following is a candidate key for the schema? What is a candidate key? First of all, it must be a super key, so you have to check that it functionally determines everything; second, it should be minimal, that is, no proper subset of it is a super key. So, which of these is a candidate key? First, is A a candidate key? What attributes does A determine? A determines B, and we also know B determines C, so we can infer that A determines A, B and C. But does A determine D? No: there is no dependency at all here which has D on the right-hand side. So, the first thing to realize is that anything which does not include D cannot be a candidate key. That rules out AC: if you look at it, AC can only determine A, B and C, nothing more; it cannot determine D. Certainly AD could be a candidate key; let us check. A determines B, B determines C, and D determines itself, so we can easily infer that AD determines A, B, C, D; it is definitely a super key. Is it minimal? If you drop D, we just saw that A by itself is not enough; if you drop A, D by itself determines nothing other than D. So AD is in fact minimal, and it is a candidate key. What about ABD? There can be more than one candidate key, but in this case it is clear that if you drop B it is still a super key, and therefore ABD is not a candidate key; it is a super key, but not a candidate key. So, the right answer is 3.
Coming back to our slides, there is a concept here which we used implicitly in the last quiz. Given A, how did I check whether it is a super key? I said A determines B, B determines C, and therefore A determines A, B and C. This idea can be generalized into what is called the closure of an attribute set. Earlier we took a set of functional dependencies and computed its closure; this is different. Here we take the closure of a set of attributes, and this closure tells us which attributes are functionally determined by the given set of attributes. This algorithm would actually have been very useful for the previous quiz question, but I presented things out of order; that is perhaps a little unfair to those of you who are new, but I wanted the intuition to come out before going into the algorithm. The algorithm is very simple. We want to find alpha plus, the set of all attributes functionally determined by alpha under a given set of functional dependencies F; alpha plus denotes the closure of alpha under F. F should always be specified, because the closure under a different set of dependencies may well give a different result. Given F, we compute the closure as follows: initially set the result to alpha, and then apply the following loop repeatedly, as long as it changes the result. In each iteration, we look at every dependency beta determines gamma in F, and if beta is a subset of result, we add gamma to the result. This test is applied repeatedly. Why? Because initially a particular beta may not be a subset of result, but after we have added some other attributes, beta may become a subset of result, and at that point we can add gamma. What exactly is this doing? Initially we have alpha; now, if we have a dependency alpha determines beta, the value of beta is uniquely determined by alpha.
So, we can add beta to the set. Now, whenever there is a dependency beta determines gamma where beta is a subset of result, we can infer that gamma must be uniquely determined by alpha: we already know that alpha uniquely determines beta, since beta is a subset of result, and therefore alpha must uniquely determine gamma, so gamma must be in alpha plus. It is a very, very simple algorithm, and we can carry it out mentally very fast by scanning through the set of dependencies: we keep looking at the dependencies, and as soon as a left-hand side becomes a subset of result, we add the right-hand side; we stop when we cannot add anything more to the result. Let us take an example, with the same set of dependencies as before. What is AG plus? We start with AG, so the result is AG. Now we have A determines B and A determines C, so we add B and C to the result, giving ABCG (I am keeping the result sorted for convenience). In the next round, CG determines H can fire, and CG determines I also becomes applicable, because CG is part of the result after the first iteration. So, H and I get added; on the slide we have actually shown it as two iterations, first H gets added and then I. The result is ABCGHI. Can we add anything more? We have ABCGHI, and there are no more attributes, so obviously we must stop here. It is clear that AG plus includes all the attributes, so AG is a super key. This is how we check whether something is a super key: apply the attribute closure, and if every attribute is determined, it is a super key. The next question is whether it is a candidate key. We have already determined that AG is a super key; now we have to check subsets of AG. The trick is that we do not need to check every subset, only those obtained by dropping an attribute; here it hardly matters, since the only two proper subsets of AG to consider are A and G.
So, is A a super key? Does A determine all of R? We simply compute A plus. A plus has B, and it has C; can we make any further progress? A plus is A, B, C, and B gives H, so we have A, B, C, H; but G is never going to come in, because G does not appear on the right-hand side of any dependency. So we are not able to include G in A plus, and A is not a super key. How about G? Is it a super key? If you compute G plus, nothing changes; nothing can be inferred from G alone, so G is not a super key. In other words, neither of the proper subsets of AG is a super key, and therefore AG is a candidate key. So, that gives a very quick check for whether something is a candidate key. Let us stop here.
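The whole worked example can be put together in a few lines: the attribute-closure loop from the slides, plus the super-key and candidate-key checks built on top of it. A sketch in Python, using the schema R(A, B, C, G, H, I) and the five dependencies above.

```python
# Attribute closure under a set of FDs, and the super-key / candidate-key
# tests built on it, applied to the lecture's example.

def closure(attrs, fds):
    """alpha+ : all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for alpha, beta in fds:
            if alpha <= result and not beta <= result:
                result |= beta  # left side is determined, so add the right side
                changed = True
    return result

def is_superkey(attrs, R, fds):
    return closure(attrs, fds) == set(R)

def is_candidate_key(attrs, R, fds):
    if not is_superkey(attrs, R, fds):
        return False
    # Minimality: dropping any single attribute must break the super-key
    # property (super keys are monotone, so one-attribute drops suffice).
    return all(not is_superkey(set(attrs) - {a}, R, fds) for a in attrs)

R = set("ABCGHI")
fds = [({"A"}, {"B"}), ({"A"}, {"C"}),
       ({"C", "G"}, {"H"}), ({"C", "G"}, {"I"}),
       ({"B"}, {"H"})]

assert closure({"A", "G"}, fds) == R                # AG+ = ABCGHI: super key
assert closure({"A"}, fds) == {"A", "B", "C", "H"}  # A+ never reaches G
assert closure({"G"}, fds) == {"G"}                 # G+ is just G
assert is_candidate_key({"A", "G"}, R, fds)
```

The two intermediate assertions mirror the argument in the lecture: A plus is A, B, C, H and G plus is just G, so neither proper subset of AG is a super key, and AG is a candidate key.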