 to get back to normalization. We were looking at functional dependencies, and the key point I noted is that a functional dependency is not a property of the current data in the database; it is a property that you expect to hold always. Now, what kinds of things can we do with functional dependencies? Well, when you use primary key constraints, those are a special case of functional dependencies. Again, I think most people are aware of this. If a particular set of attributes functionally determines all the attributes of a schema, then that set is a super key, and a candidate key is a minimal one. So what that says is: if that key determines all attributes in the schema, but no proper subset of it determines the whole schema, then it is a candidate key. And finally, you choose one of the candidate keys as the primary key. But of course, there are things which functional dependencies can express which super keys, primary keys and so forth cannot express. For example, in this bad schema which we had, ID does determine name, salary and department name, and in fact it also indirectly determines building and budget. But department name only determines building and budget; it is not a super key, yet there is a functional dependency from department name to building. So now, how do we use functional dependencies? There is some notation here. We will say that a particular relation instance is legal if it satisfies the functional dependencies. And we will use the terminology that a functional dependency, or a set of functional dependencies F, holds on R if all legal relations on that schema satisfy those functional dependencies. So name functionally determining ID may happen to hold on one instance, but you cannot guarantee it; you may have two people with the same name. So you cannot say it holds in general. Then there is the notion of trivial functional dependencies, which you probably already know about. A trivial functional dependency is one which holds on all possible relation instances. 
In particular, a functional dependency is trivial if it is of the form alpha determines beta where beta is a subset of alpha. Obviously, when two tuples agree on alpha, and beta is a subset of alpha, then they have to agree on beta also. So this holds on any possible instance; it is trivial. There are of course many such trivial dependencies. How many are there? If you have n attributes, you can take any subset for the left-hand side and any subset of that for the right-hand side, and that is a trivial functional dependency. So there are a lot of trivial functional dependencies. One of the problems when you try to do practical stuff with functional dependencies is that if you try to do it exhaustively, there are way too many to enumerate; so we want to restrict our algorithms to a smaller set of dependencies. There is another notion, of closure. Given a set of functional dependencies, we can infer more dependencies. As an example, if we know that a functionally determines b and b functionally determines c, then we can infer that a functionally determines c. Why is this true? If two tuples have the same a value, the first dependency means they have the same value for b. Now, if they have the same value for b, the next dependency means they have the same value for c. Therefore, whenever two tuples agree on a, they also agree on c; a determines c must hold. This is a dependency which we have inferred logically. The set of all functional dependencies logically implied by a given set of functional dependencies F is called the closure of F. There are many things in there; it is a very large set. In general, you do not want to compute the whole set, but conceptually it is important to understand what this set is: it is everything which is implied by F. And you can even say that every trivial dependency is in there, because it is implied by any set of functional dependencies. 
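The inference above can be sanity-checked on a concrete instance. Here is a minimal sketch, not from the lecture, of a checker that tests whether a relation instance satisfies a functional dependency; the relation, attribute names, and values are made up for illustration.

```python
def satisfies(rel, alpha, beta):
    """True if no two tuples of rel agree on alpha but disagree on beta."""
    seen = {}
    for t in rel:
        key = tuple(t[a] for a in alpha)
        val = tuple(t[b] for b in beta)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

# a -> b and b -> c hold on this instance, so the inferred a -> c must too.
r = [{"a": 1, "b": 10, "c": 100},
     {"a": 2, "b": 10, "c": 100},
     {"a": 3, "b": 30, "c": 300}]
print(satisfies(r, ["a"], ["b"]))  # True
print(satisfies(r, ["b"], ["c"]))  # True
print(satisfies(r, ["a"], ["c"]))  # True
```

Of course, one instance proves nothing about the schema; the point of the transitivity argument is that a determines c holds on every instance on which the first two hold.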
So the closure is very large, but we will use this notion of a closure in several contexts. We will denote by F plus the closure of F. It is of course a superset of F; everything in F has to be in the closure. So now let us see how we use this. Again, you are no doubt familiar with this, so this is just to refresh your memory. What is Boyce–Codd normal form? It says something about relations. A relation is in Boyce–Codd normal form with respect to a particular set F of functional dependencies; somebody has determined that these are the dependencies that should hold, and now you check if the relation is in BCNF. The check is over all functional dependencies in F plus, because the definition is with respect to this very large set; there are many times when we can get away with not actually enumerating the whole set, but the definition is with respect to F plus. For every dependency alpha determines beta in F plus, where both alpha and beta of course have to be subsets of R, one of two conditions should hold: either alpha determines beta is trivial, that is, beta is a subset of alpha, or alpha is a super key, meaning it determines everything else. So what does this rule out? It rules out the possibility of having a functional dependency whose left-hand side is not a super key of the whole relation. This is exactly the kind of thing we saw when we combined instructor with department. What happened there? Department name functionally determines building and budget, but department name is not a super key for the instructor-department relation; it of course does not determine the instructor. So this dependency is a witness to show that that particular relation does not satisfy Boyce–Codd normal form. And it is obvious that when you have such a dependency, you have a possibility of redundancy. Why? 
Because the left-hand side here, department name, can appear multiple times in the relation. If it is a super key it can only occur once, but if it is not a super key it can occur multiple times, and each time it occurs, the building and budget have to be the same. Therefore, there is redundancy. Boyce–Codd normal form rules out this kind of redundancy. There are other kinds of redundancy, with multi-valued dependencies, which it does not rule out; it is not the ultimate in any sense. But if a particular schema satisfies BCNF, this kind of redundancy is avoided. There are also other kinds of redundancy, between relations and so forth, which BCNF does not target. So now, how do you come up with a schema which is in BCNF? The standard way is to start with whatever schema you are given and check if it is in BCNF. If not, you will find some dependency which shows that it is not in BCNF, and then you decompose it. And the important thing is that the decomposition should be a lossless join; with a lossy join you are losing information. So what we want is a series of steps, each of which guarantees a lossless join decomposition, such that at the end the final set of relations satisfies BCNF. The basic decomposition is actually very straightforward. Given a particular dependency alpha determines beta which violates BCNF, meaning alpha is not a super key, we decompose R into two relations: one is alpha union beta, the other is R minus (beta minus alpha). What are we doing? We want to retain all the attributes of alpha in the second relation, but the attributes which are only in beta can be removed. So what have we achieved? In the previous case, department name determining building and budget gets pulled out into one relation (department name, building, budget), and the other part is R minus (beta minus alpha); in this case beta and alpha are disjoint. 
So beta minus alpha is just beta, which is building and budget; that gets removed, and what is left includes department name, so we have ID, name, salary and department name. What is common between these two relations? In general, alpha is common between them, and alpha is a super key of the first relation. What we can say is that after you decompose, and remove duplicate tuples of course, a particular tuple of the second relation will match only one tuple of the first relation. In general, a lossy decomposition gives rise to extra tuples on joining back; in this case the functional dependency ensures that each tuple matches only one tuple, so you cannot get any extra tuples. That is the intuition. In general, when you have a decomposition where the common attributes form a super key of one of the two relations, the decomposition is lossless; there are some slides coming up on this. So this particular decomposition done in this manner will be lossless, and the basic idea is that we keep doing this until the final relations no longer violate BCNF, and then you are done. Now, sometimes you want to worry not only about redundancy: you might want to ensure, efficiently, that on the relations you get after decomposition you can still check the functional dependencies. Why is that? A functional dependency is a constraint on the schema. If you have a decomposition which you got somehow, and now you add a tuple to it, how do you know that does not cause a violation of some functional dependency? What do I mean by a violation of a functional dependency? I have decomposed; if I join the pieces back I should get the relation I started with, and there were some functional dependencies on that relation. 
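The single decomposition step just described can be sketched in a few lines. This is an illustrative sketch, with attribute names following the lecture's instructor-department example; it performs only the schema split for one violating dependency, not the full BCNF algorithm.

```python
def decompose(R, alpha, beta):
    """One BCNF decomposition step for a violating dependency alpha -> beta."""
    r1 = alpha | beta            # alpha union beta
    r2 = R - (beta - alpha)      # keep alpha, drop the attributes only in beta
    return r1, r2

R = {"ID", "name", "salary", "dept_name", "building", "budget"}
# dept_name -> {building, budget} violates BCNF: dept_name is not a super key.
r1, r2 = decompose(R, {"dept_name"}, {"building", "budget"})
print(sorted(r1))  # ['budget', 'building', 'dept_name']
print(sorted(r2))  # ['ID', 'dept_name', 'name', 'salary']
```

Note that the common attributes of the two results are exactly alpha, which is a super key of r1; that is what makes the step lossless.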
I should not be allowing a new tuple, a state of the decomposed schema, where if I join back, the result violates a functional dependency; that would mean I have allowed inconsistent data to come into the database. So how do I check this? One way is: every time I do an update, I do the join again and check if the update causes a violation of a functional dependency in the joined result. As you can imagine, this is very expensive; doing a join on every update is very, very expensive. What is easier is to check this on individual relations in the decomposed schema; it would be good if I did not have to do a join in order to check a functional dependency. Is this possible? First of all, after doing the decomposition, I can infer a number of functional dependencies on the individual relations. Each of these can be checked efficiently because it is local to that relation. The question is: if I check these, can I show that all the original dependencies I started with will also hold? If a particular decomposition is such that the local checks guarantee that the original set of dependencies will hold, then I can check functional dependencies efficiently. Such a decomposition is dependency preserving. So this is what it says: if it is sufficient to test only those dependencies on each individual relation in order to ensure that all the functional dependencies hold, then the decomposition is dependency preserving. We will see a more formal specification, but this is the intuition. And with BCNF, it turns out it is not always possible to get a dependency-preserving BCNF decomposition. So in certain situations, people are willing to allow redundancy in order to allow efficient checking of functional dependencies. It is a tradeoff, not ideal, but you may want the ability to make that tradeoff. 
In either case you lose something: in one, you have to store redundant data and make sure it is kept up to date, but you can efficiently check for violations of functional dependencies; in the other, there is no redundancy, but you may allow erroneous data to come into the database. That is the tradeoff. Third normal form is this alternative. Again, you are probably familiar with it, but for the formal specification of third normal form, depending on the textbook you use, there are two different but equivalent ways of specifying it. This is the definition we use. We say that a schema is in third normal form if for all alpha determines beta in F plus, at least one of the following holds: alpha determines beta is trivial, or alpha is a super key — so far this is the same as BCNF. 3NF has one extra relaxation, which says each attribute A in beta minus alpha is contained in a candidate key for R. Each attribute can be in a different candidate key, but each and every attribute, taken one at a time, must be contained in some candidate key. This sounds very weird — why do we care about candidate keys here? It is not intuitive, but there is some history to it. It is a kind of minimal relaxation which still allows you to ensure a dependency-preserving decomposition. The goal was dependency preservation with some minimal redundancy; this is the minimal kind of redundancy which you want to allow to ensure dependency preservation. That is roughly the intuition, although it is not clear how this works at this point. Any questions? So, the goal is to take a given relation, decide if it is in a good normal form, and if not, decompose it. There is a whole theory of functional dependencies; the first notion is the closure, which I already told you about — it is all the things which are implied. Now, how do we compute the closure? How do we find out all that is implied? 
There are several ways of doing it. One way is to use what are called Armstrong's axioms; those are inference rules, and we keep applying them until we cannot infer anything new. There are three basic inference rules. The first one says: if beta is a subset of alpha, then alpha determines beta. This rule generates all the trivial dependencies; it is called reflexivity. The second one says: if alpha determines beta, then gamma alpha determines gamma beta; this is the augmentation rule. And the last one is transitivity, which we saw earlier: if alpha determines beta and beta determines gamma, then alpha determines gamma. How do we know only these three rules are enough — do we need some more inference rules? The claim is that these three are enough: if anything can at all be logically derived, you can derive it using exactly these three rules. The proof of this is not trivial; we will not even try to do it in a standard course — in fact, I have read it and I have forgotten it. It is a non-trivial proof to show that these are sufficient. You can generate a lot more inference rules from these, which may be useful for humans, but all of them can be derived using just these three; they are minimal. They are sound, meaning that anything derived is correct — that is the easy part, it is pretty obvious. The other part is that they are complete, which says that they generate all functional dependencies that hold, and showing completeness is the hard part. So here is an example: we want to compute dependencies which follow from this given set. I will give you a few minutes to work out a few things which one can infer from this set — bring out a piece of paper and write them out. First of all, there are a lot of trivial dependencies you can generate; you probably do not want to generate all of them, it will take forever. So I will focus on just the non-trivial ones. OK, I think we can stop now and go over the solutions. 
So go ahead and suggest some of the answers, starting with the simplest ones which can be directly derived. First, A determines H. Why? Because A determines B and B determines H; so transitively, A determines H. Next, AG determines I. Why is that? Well, not directly by one rule — you have several things here. You have CG determines I and you have A determines C. How do you derive AG determines I from these two? Can you directly apply a rule? No, you have to first do augmentation. From A determines C, what can you get by augmentation? AG determines CG. So we can give names to these: if you call the given dependencies 1 through 5, and A determines H is 6, then this step is 7, and 7 is got by augmentation on 2. Now, from 7 and CG determines I, you get AG determines I by transitivity on 4 and 7. That is one derivation you can use. What else can you derive? A determines BC. How do you derive that? Now, this is a new rule, which you may know as the union rule. Can you derive the union rule from Armstrong's axioms? You can — how? From A you can derive A determines AB: this is augmentation on 1, augmenting both sides with A. And you have A determines C, so using augmentation with B on that, you get AB determines BC: augmentation on 2. Now, combining these two, A determines BC follows by transitivity. Give these numbers: A determines AB is 8, AB determines BC is 9, and A determines BC is 10, by transitivity on 8 and 9. If you are a human doing this, you will say: well, it is obvious — you can union the right-hand sides of two dependencies whose left-hand sides are the same. And this two-step process generalizes: whenever the left-hand sides are the same, you can combine the right-hand sides. 
So that can be added as a union rule. It is not fundamental, in the sense that it can be derived from the other axioms, but you can add it to make derivations easier. CG determines HI — that is again using the union rule, which we just proved, on CG determines H and CG determines I. Anything else, in addition to the trivial ones? Anyone come up with anything else? AG determines HI, for the same reason: we derived AG determines I, similarly AG determines H, and then union them. Let us not try to go over every possible thing; hopefully all of you got at least a few of these. Quiz: given the above functional dependencies, can AB determines B be inferred? You have a few options: cannot be inferred, or can be inferred using transitivity, reflexivity, or augmentation. Augmentation works: take A determines B, augment both sides with B, and you get AB determines B. Or you can start from the trivial side: using reflexivity I can also get AB determines B, since B is a subset of AB. You have both options. So, the additional rules: the union rule we just saw; then there is a decomposition rule — if alpha determines beta gamma, then alpha determines beta and alpha determines gamma; and pseudo-transitivity is a small variant — if alpha determines beta and gamma beta determines delta, then alpha gamma determines delta. We in fact used that pattern: we added something to the left-hand side and got something just like this. So, now coming to quiz question 2: given this schema R(a, b, c, d) with the two dependencies a determines b and b determines c, which of the following is a candidate key for R: a, ad, or ac? Is a a candidate key? How do you find out? You can apply Armstrong's axioms, but there is actually a simpler algorithm: compute the closure of a given set of attributes. There is the closure of a set of functional dependencies, but the other notion — given a set of attributes, what all does it determine — can be computed much more efficiently. 
So in this case, how do you find out what all a determines? Given a, a determines b implies b is also determined, so I add b to the set, getting ab. Now, since we have b determines c and b is part of the set ab, I get abc. Can I add anything more to that? No — in particular, d is not a member of that set. Therefore a does not determine the whole schema; it is not a key at all. Similarly for ac: its closure gives you abc again, so it is not a super key either. How about ad? Its closure gives you a, b, c, and d is already there, so all the attributes are in there; it is a super key. abd is also a super key, but it is not minimal, since its subset ad is already a super key. So the candidate key here is ad. That brings us to the closure of an attribute set: given a set of attributes alpha, the closure of alpha under F, denoted alpha plus, is the set of all attributes functionally determined by alpha. The algorithm which I just outlined is very simple: start with result equal to alpha, and now for every beta determines gamma in F, if at this point beta is a subset of result, then union gamma into result; repeat this over the dependencies until nothing more can be added. Each check is very cheap, and the whole thing runs very fast. In fact, we can use attribute closure even to compute the functional dependency closure. Can you give a simple algorithm, one which is very easy to program? Suppose I have already programmed this as a function which, given a set of attributes, returns all the attributes that are functionally determined by it. 
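The attribute-closure algorithm just outlined can be sketched directly in Python. One detail worth making explicit: the "if beta is a subset of result, add gamma" step has to be repeated until a fixpoint, since an early dependency may become applicable only after a later one fires. Dependencies are modelled here as (lhs, rhs) pairs of sets; the example is the quiz schema from the lecture.

```python
def attribute_closure(alpha, fds):
    """Set of all attributes functionally determined by alpha under fds."""
    result = set(alpha)
    changed = True
    while changed:               # repeat until no rule adds anything new
        changed = False
        for beta, gamma in fds:
            if beta <= result and not gamma <= result:
                result |= gamma
                changed = True
    return result

# Quiz example: R(a, b, c, d) with a -> b and b -> c.
fds = [({"a"}, {"b"}), ({"b"}, {"c"})]
print(sorted(attribute_closure({"a"}, fds)))       # ['a', 'b', 'c']  -- no d, not a super key
print(sorted(attribute_closure({"a", "d"}, fds)))  # ['a', 'b', 'c', 'd'] -- ad is a super key
```

Since ad is a super key and neither a nor d alone determines all of R, ad is the candidate key, matching the quiz answer.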
Now, can I compute F plus using this? I have given you a function: given a set of attributes alpha, it returns the set of all attributes functionally determined by alpha. From this, can I compute F plus with a simple, easy-to-code algorithm? Not using Armstrong's axioms directly — those are a lot more complex to code. You can do it that way, but there is a much easier approach in this case: expensive in terms of runtime, but easy to code. I hope you understood the question: I have given you a subroutine which implements this algorithm — given as input a set of attributes, its output is the set of all attributes functionally determined by that set. Now I want to write a main program which uses this to generate all of F plus. [Student:] Sir, we will start from a functional dependency, compute the closure of it, then check the set of attributes which is a determinant for the others, and derive from those attributes. Not quite — it is actually a fairly simple brute-force algorithm. We have a set of attributes; consider every subset of it, and on each subset call this function. Of course, there are a lot of subsets; like I said, the runtime is high. But call this function on every possible subset of attributes. What is it going to return? Given a subset, it gives you another set of attributes which are functionally determined by it. From this I output a number of dependencies whose left-hand side is the subset I gave as input. Say I consider a particular subset S1, and the output is S1 plus, the set of all attributes determined by S1. I can now output S1 determines S2 for every S2 that is a subset of S1 plus. 
There are a lot of these, and you do not want to generate this huge number wherever possible, but if you need to do it, this is a simple way, easy to code: for every S1 subset of R, output S1 determines S2 for every S2 subset of S1 plus. That is it — just a couple of lines of code, plus a call to that subroutine to determine S1 plus. We have some more slides on closures of attribute sets. We have the same set of dependencies as before, and we compute AG plus. The result is AG initially. Now, we have A goes to B, so B is added; also A goes to C, so we can add that, and we get ABCG. Now we have CG goes to H, so we get ABCGH, and from CG goes to I we also get I, giving ABCGHI. That is the final result. Now, is AG a candidate key? How do you check that? First of all, is it a super key? We can answer that easily: the result was ABCGHI, which is all of R, so it is a super key. Is it a candidate key? Take out one attribute at a time and check each of the resulting sets. If you pull out A, what is left is G; does G plus contain R? No. How about A plus — does it contain R? If you compute A plus you get A, B, C, H, and that is it; you do not get G or I. So neither A plus nor G plus contains all the attributes; neither is a super key, and therefore AG is a candidate key. Attribute closure is actually very easy to compute even by hand, and it is a quick way to check if something is a candidate key — this is an efficient test for candidate keys. So attribute closure can be used for many things. The first was checking for a super key; we saw that. Then, if you are given a particular functional dependency and you want to check if it is in F plus, how can you do it? 
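The candidate-key test just described — super key, and dropping any single attribute loses that property — is easy to code on top of attribute closure. This is a sketch using the lecture's example R(A, B, C, G, H, I) with F = {A→B, A→C, CG→H, CG→I, B→H}; the closure helper is the simple fixpoint loop from before, repeated here so the snippet stands alone.

```python
def closure(alpha, fds):
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_candidate_key(attrs, R, fds):
    if not closure(attrs, fds) >= R:                    # must be a super key
        return False
    return all(not closure(attrs - {a}, fds) >= R       # and minimal
               for a in attrs)

R = set("ABCGHI")
fds = [({"A"}, {"B"}), ({"A"}, {"C"}), ({"C", "G"}, {"H"}),
       ({"C", "G"}, {"I"}), ({"B"}, {"H"})]
print(is_candidate_key({"A", "G"}, R, fds))  # True: AG+ = R, but A+ and G+ are not
```

Dropping A leaves G with G+ = {G}, and dropping G leaves A with A+ = {A, B, C, H}; neither covers R, so AG is minimal.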
I have given you F; I give you one more functional dependency and ask: is this functional dependency in F plus? That is actually easy. Given a new functional dependency alpha goes to beta, just compute alpha plus using the original set of dependencies F — not using the new one. Now, if beta is contained in alpha plus, we know that the original set of dependencies already ensures that whenever two tuples are equal on alpha they are equal on beta also. So alpha goes to beta is a member of F plus. This is important; we will be using it later in the 3NF synthesis algorithm and so forth. So we take a dependency and check if it holds by computing alpha plus and seeing if beta is contained in it — it is a cheap test. And computing the closure of F we just saw: for each gamma subset of R, compute gamma plus; for each S subset of gamma plus, output gamma determines S. Now, the sufficient condition for lossless join — here we are kind of rewinding; we saw this before. Do you remember the definition of lossless join? A sufficient condition is that the common attributes form a super key of one of the two decomposed tables. In other words, either R1 intersection R2 determines R1, or R1 intersection R2 determines R2. At least one of these must be true; both may be true, and that is fine. Now, how do we know this gives a lossless join? We discussed that before: take a particular tuple in one relation; it can match only one tuple in the other. It is a sufficient condition — is it a necessary condition? It turns out that if you restrict attention to functional dependencies, it is necessary; but functional dependencies are not the only kind of constraints we have. There is something called a multivalued dependency, which we will come to and which you probably know about; we will generalize this condition in the context of multivalued dependencies. 
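The membership test just described — alpha → beta is in F plus exactly when beta is contained in alpha plus computed under F — is a two-liner on top of attribute closure. A sketch, with the closure helper repeated so it is self-contained; the example dependencies are the quiz ones:

```python
def closure(alpha, fds):
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, alpha, beta):
    """True iff alpha -> beta is in F+ (i.e. beta is a subset of alpha+)."""
    return beta <= closure(alpha, fds)

fds = [({"a"}, {"b"}), ({"b"}, {"c"})]
print(implies(fds, {"a"}, {"c"}))   # True: a -> c is in F+ (by transitivity)
print(implies(fds, {"b"}, {"a"}))   # False: b does not determine a
```

This is exactly the test used later in the 3NF synthesis algorithm, and it avoids enumerating F plus entirely.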
So there may be a situation where a multivalued dependency holds but not a functional dependency, and the decomposition is still lossless; if you consider only functional dependencies, the condition is necessary. Now let us take another useful example: R(a, b, c) with a determines b and b determines c. You can decompose this in two different ways, both lossless, of course. What are the two ways? One of them is (a, b) and (b, c); the other is (a, b) and (a, c). How did we get the first one? We took a functional dependency. The way to prove it is lossless: the common attribute is b, and because we have b determines c, b is a super key of (b, c), so this is a lossless join. Why did we choose to decompose this way? If you remember the BCNF decomposition algorithm, we picked one of the dependencies — in this case b determines c — and the relations we got were (b, c) and abc minus (c minus b), which is (a, b). That is what we got; it is clearly a lossless join, as we saw. Is it dependency preserving? Yes. In fact, of the original set {a determines b, b determines c}, the first can be checked on relation (a, b) and the second on (b, c). However, there is the other decomposition, where you start by decomposing on a determines b. Now (a, b) is one relation, and the other is abc minus b, which is (a, c). Is it a lossless join? Yes: the common part is a, and a determines b, so a is a super key of (a, b). But it is not dependency preserving. Why? There is a functional dependency b determines c. Can you check it on the individual relations? What dependencies hold on the individual ones? On the first, a determines b holds. On the second, does any functional dependency hold? a determines c holds — it is not in the given set {a determines b, b determines c}, but we can infer it, and both sides of a determines c are contained in the second relation, (a, c). 
Therefore, a determines c does hold on it, and you can check it: make sure that no two tuples agree on a but not on c. However, we can create a state of this decomposed schema where b determines c is violated. How do we do that? We add tuples. We have r1(a, b) and r2(a, c), and we want to see that the join can violate b determines c. So maybe in r1 we add (1, 2) and (2, 2). What is the dependency here? a determines b — does it hold? Sure it does. In r2 we add (1, 3) and (2, 4). What is the dependency here? a determines c — does it hold? Yes. Now, if we join, what tuples do we get? We get (1, 2, 3) and (2, 2, 4). Look at b and c: these two tuples have the same value for b but different values for c. So the join violates the dependency b determines c. We could check a determines c here and a determines b there, but if we only check these two, we may allow an update or insert to happen which results in a violation of b determines c. This could never happen if we had chosen the other decomposition. So that other, dependency-preserving decomposition is clearly better; you would not choose this one if you could choose that. Formally, a decomposition is dependency preserving if the following holds. Let F_i in general be the set of dependencies in F plus that include only attributes of R_i; those are the ones which can be checked efficiently, locally on each R_i. I have taken R and broken it up into R_1, R_2, up to R_n, and now I am seeing what all can be checked locally on each of those. 
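The counterexample just constructed can be replayed in code. This sketch builds the two decomposed relations with exactly the tuples from the lecture, joins them with a naive natural join on a, and confirms that both local dependencies hold while the joined result violates b → c.

```python
def satisfies(rel, alpha, beta):
    """True if no two tuples of rel agree on alpha but disagree on beta."""
    seen = {}
    for t in rel:
        k = tuple(t[x] for x in alpha)
        v = tuple(t[x] for x in beta)
        if seen.setdefault(k, v) != v:
            return False
    return True

r1 = [{"a": 1, "b": 2}, {"a": 2, "b": 2}]   # a -> b holds locally
r2 = [{"a": 1, "c": 3}, {"a": 2, "c": 4}]   # a -> c holds locally

# Naive natural join of r1 and r2 on the common attribute a.
joined = [{**t1, **t2} for t1 in r1 for t2 in r2 if t1["a"] == t2["a"]]

print(satisfies(r1, ["a"], ["b"]))       # True
print(satisfies(r2, ["a"], ["c"]))       # True
print(satisfies(joined, ["b"], ["c"]))   # False: (2,2,4) and (1,2,3) clash on b
```

Both local checks pass, yet the reconstructed relation is inconsistent; that is precisely what a non-dependency-preserving decomposition allows.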
Now, if I take the union of these local sets F_i and take its closure, and it is equal to F plus, then I know that checking these locally is enough: the other dependencies will definitely hold. If this is true, then the decomposition is dependency preserving. As we saw, the decomposition into (a, b) and (a, c) is not dependency preserving, while the decomposition into (a, b) and (b, c) is in BCNF, is a lossless join, and is dependency preserving.
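Comparing the closure of the union of the F_i with F plus is conceptually simple but expensive, since it enumerates F plus. A standard cheaper test (not stated in the lecture, but equivalent) checks, for each alpha → beta in F, whether beta can be reached by repeatedly projecting the growing attribute set onto each R_i and taking its closure there. A sketch, assuming the fixpoint closure helper from earlier:

```python
def closure(alpha, fds):
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fds, decomposition):
    """Standard dependency-preservation test, avoiding enumeration of F+."""
    for alpha, beta in fds:
        result = set(alpha)
        changed = True
        while changed:
            old = set(result)
            for Ri in decomposition:
                # What the local relation Ri lets us conclude about result:
                result |= closure(result & Ri, fds) & Ri
            changed = result != old
        if not beta <= result:      # beta not derivable via local checks
            return False
    return True

fds = [({"a"}, {"b"}), ({"b"}, {"c"})]
print(preserves(fds, [{"a", "b"}, {"b", "c"}]))  # True: the good decomposition
print(preserves(fds, [{"a", "b"}, {"a", "c"}]))  # False: b -> c is lost
```

On the lecture's example this confirms the conclusion: (a, b), (b, c) preserves both dependencies, while (a, b), (a, c) does not.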