 Now, we come back to how do you test for BCNF, so given a relation how do you test for BCNF? So, we will take each dependency if I have a particular dependency alpha goes to beta it is very easy to check if it violates BCNF compute alpha plus and if it contains all attributes it is fine if it does not contain anything other than itself it is trivial that is fine, but if it contains any attribute other than itself, but is not a everything then you know it is a violation. So, now, if I am given a particular relation R schema R it is not decomposed at this point just I am given something and I am given a set of functional dependencies on that one schema. Now, if I want to check if this satisfies BCNF or not I can actually just do the following I take each dependency in F apply the previous test that is I take each dependency say alpha goes to beta compute alpha plus and that will tell me using the previous test whether alpha goes to beta shows that this violates BCNF. So, if I just do this for all the given dependency I can check if R violates BCNF or not that seems like a very easy very cheap test for BCNF looks very enticing you know it is correct actually as far as this goes the problem which happens is what if you decompose R then how do you check for violation of BCNF for the decomposed relations. So, if I decompose R then I look at a particular relation in the decomposition supposing I try to apply this test I will take the given set of dependencies F and on that particular relation R in the decomposition I will apply this test alpha plus and see if it is a super key or not unfortunately it this test does not work. So, let us take this example R is A, B, C, D and the dependency set is A goes to B, B, C goes to D simple pair and we can decompose this into A, B and A, C, D by using the first dependency here now if you look at this thing here the first one the only dependency is A goes to B this is in fact in BCNF in fact any binary relation will be in BCNF there is a dependency at all one determines the other it is going to be a super key trivially. So, any binary relation is in BCNF we do not have to bother testing it the other one A, C, D how do we know whether it is in BCNF or not if we use this given set of dependencies A goes to B, B is you know if I compute the closure here what do I get if I if I just take closure of A does it contain anything other than A as B, but B is not in this. So, it is irrelevant I will say closure of A is only A. So, it is we can ignore that how about closure of BC B is not even here. So, I cannot do anything with BC. So, that is useless for this one. So, if I just use these two dependencies and try to do the closure and test it I might think that A, C, D is in BCNF, but in fact it is not if you think about it if you compute F plus you will infer A, C goes to D that is easy to check A, B goes to C as we saw before from that we can infer A, C goes to D and A, C goes to D is actually a subset of A, C, D, E and now if I compute the closure of A, C it only contains D not E. So, it is not trivial it is not a super key clearly A, C goes A, C, D, E is not in BCNF. So, I cannot just use the given set of dependencies F I need to do something more when I after decompose before decomposition I have given a set on that relation I just use it as after decomposing I cannot apply that earlier test one way is to compute F plus and then find the restriction of F plus 2 R 2 and then use that doing all that is manually is a bit of a pain. So, the other option is to look at every subset here and see if it is a functional dependency which holds and but it is not a it is not trivial and it is not a super key. So, what we do here that is shown in this test. So, this is the first thing actually test R I with respect to the restriction of F to R I that is all F D is an F plus that contain only attributes from R I. So, first I compute F plus then see which of those contain attributes only in R I that is a lot of work to compute F plus. The cheaper way is we do the following if you have got an R I, but every set of attributes alpha subset of R I compute alpha plus using what using the original set of dependencies F I do not have to change that I will stick with F, but using F I compute alpha plus that is easy. Now if alpha check that alpha plus either includes no attributes of R I minus alpha alpha plus may contain other attributes, but as long as it does not contain any other attributes of R I other than alpha itself there is no problem or it includes all attributes of R I then it is a super key of R I. So, it is either trivial with respect to R I or it is a super key of R I. So, if this holds then R is R I is in BCNF if it is does not hold if it is violated by some alpha goes to beta then consider this thing alpha determines alpha plus minus alpha intersection R I. So, intersection is on this side this part the right hand side is alpha plus minus alpha intersection R I. This is a functional dependency which can be shown to hold and we just said that alpha plus does not include all of R I. So, it is not a super key it is not trivial it is not a super key and this is a witness to show that R I violates BCNF and we can use this to decompose R I that is the idea. So, in this case the same thing as before what is A plus? A plus as AB and AC plus includes B say BC, but not B. So, it violates BCNF. So, that is a witness AC determines B is the witness. So, alpha plus minus alpha. So, alpha AC plus is ABC if you remove AC from that you are left with B. So, AC determines B is a witness we can use it to decompose. So, finally, this leads us to the BCNF decomposition algorithm which we can implement. Any questions at this point? You can read this algorithm. So, this algorithm says compute F plus, but we can do it more efficiently without computing F plus by the variant which we just showed. So, this particular one does not guarantee dependency preservation. So, this is a simple example we saw already. Now, take this quiz question Q 4. So, please compute the BCNF decomposition. So, how do we do the decomposition? We first create ACD and what is left is ABCD minus CD which is AB. So, that is the answer AB and ACD. There are more complex examples here which try to be more realistic. So, we have a class, course ID, title, department name, credits. So, what is this actually? You see it is a join of course with section basically with the classroom. So, it is a join of three different relations all the attributes from that join. And this join result is not in BCNF because we have this following function depends. Course ID determines title, department name, credits. So, if I take course ID plus is it going to determine anything else? We have the original dependencies. The course ID is not going to determine section ID for example. There may be many different sections for a course. So, we use this and decompose it. And then if you take the next one building room number. Does building room number determine section ID in semester? Of course not. So, again we can use that to decompose and get building room number capacity as one of the relations. And finally what you will be left with is the section relation. So, if you started with this and use these functional dependencies you will get back the relations that we had in our schema. Now again the last point here is that it is not always possible to get a dependency preserving BCNF decomposition. And this is an example it is a fake example, but the book has another more realistic example. So, if I have Jkl as a relation with two dependencies Jk goes to L and L goes to K. Then it is not in BCNF why? L goes to K is L is not a super key. If you decompose it what will happen? You will get L goes to K and Lj, Lk and Lj that is the only decomposition possible here. Jk goes to L does not violate BCNF. So, that is not useful for decomposing this is the only option you have. But now that the only possible decomposition and does it preserve Jk goes to L. So, what have we got? We have if I decompose using L goes to K I will get Lk and Jl. Now what are the dependencies that hold on this? L goes to K is the only one which holds on this on this nothing holds. If you do the closure of this it is fairly easy to see that J by itself does not determine well J does not determine L that is only meaningful thing L determines J, J determines those are only things which could hold neither of them holds in this case. Therefore, this is the only one I have and this does not imply Jk goes to L. So, this is an example to show that this does not have any meaningful any dependency preserving BCNF decomposition. So, if you were worried about this particular functional dependency you want to make sure it holds one option is to not decompose it let there be redundancy what is the redundancy there may be two tuples with the same value of L and then they will end up having the same value of K and if an update happens you have to make sure that both of them have the same value of K otherwise it becomes inconsistent. But if you want to ensure Jk determines L then you should not decompose it. So, that is the motivation for third normal form that you want to ensure dependency preservation with minimal violation of redundancy. We saw this definition earlier as a reminder it is a modification of the BCNF definition by adding this one extra thing each attribute A in beta minus alpha is contained in a candidate key. Now, can we apply the BCNF algorithm and modify it slightly to get BCNF turns out it is not so simple what is possible however it is a completely different algorithm which is actually synthesizes it is not it does not decompose it synthesizes a set of relations which will be in BCNF and that is a standard algorithm for generating a 3NF decomposition. So, here is another semi realistic example department advisor which says that student ID instructor ID department name that is the relation the functional dependencies here is that a student may have a multiple major student may be in computer science and electronics. So, let us say this is possible. So, if the student is in CS there should be an instructor in CS for that student. If the student is also in electronics there should be an instructor advisor in electronics for that student. So, that is what this models. So, given a student and a department there should be a unique advisor there cannot be two advisors. On the other hand the advisor is in a department. So, instructor ID determines department a person in CS cannot be a advisor for a student for the department they can only be advisors for their department. So, those are the two dependencies. So, now this is identical to the previous example abstract example we saw. So, if you want BCNF you will violate dependency you cannot ensure that a student has if you cannot efficiently check that a student has one advisor per department. However, this thing is actually in 3NF. Why is it in 3NF? We will take the two dependencies. SID department name determines IID it is trivially a super key obviously. Another one ID goes instructor ID goes to department name it violates BCNF for sure, but does it violate 3NF? Turns out that the right hand side department name is contained in a candidate. So, the SID department name is a super key you can also verify that it is not just a super key it is a candidate key. If I remove department name it is not a super key if I remove SID it is not a super key it is a candidate. So, this check shows that keeping the relation as is does not violate 3NF and this relation is fairly intuitive. If I want to track the advisor for a student in a department I am very likely to keep a table like this and BCNF is violated, but I still want to keep it. So, 3NF allows it. So, I may not insist on BCNF I will say 3NF is good enough and the redundancy in 3NF is fairly clear here. Now, there are two kinds of redundancies one is if you take this is a abstract example L determines K there are two tuples with the same L value the K value has to be the same that is one kind of redundancy. The other kind of problem is supposing I keep J K L and I do not have a separate relation which links instructors to departments. Then I will have to create a tuple with a null value for J J is the in the equivalent example that was the student ID. So, what is this is showing is that I will have to use null values to represent some information which is a ugly. In fact, if you think of it take that same example the student instructor and so on. If the only way to link an instructor to a department is through the advisor relation that is very odd you would not like that kind of a schema that means you have to allow redundancy and null values both. So, what you would end up doing probably is having a instructor department mapping in a separate table instructor table and an advisor table redundantly you would not store null values here. So, the same information can be looked up from either table you can find instructor department if the instructor is in this table also you can look it up from that table. Now, this actually shows another weird form of redundancy it is entirely possible for this table to consistently show that Suharshan is in CS department and that other table instructor table to claim Suharshan is in EE department. If you look at each table separately it satisfies the functional dependency that instructor determines department. In this table the only instances of Suharshan might be with the EE department student in the EE department is I am advising a student in the EE department. In the other table there is only one table for Suharshan and it says CS department. Look at each table it is consistent it satisfies BCNF it satisfies 3NF whatever satisfy everything. But there is inconsistency across tables this is an aspect which these things do not capture BCNF 3NF do not capture this redundancy across tables. Now, if you started with a universal relation and broke it up this will not happen there is only one relation there was no redundancy across tables only one table then you broke it down. But if you started with redundancy they may not detected redundancy across tables may never get detected. So, there are limitations to this framework in the context of if you start from ER you had redundancy in ER you are going to have redundancy in the tables. How can you have redundancy in ER? So, maybe I should answer this thing in the context of aggregates this is slight detour but I think it is worth mentioning because somebody was asking when you have done ER modeling why should you even check anything afterwards. So, let us go back to this here is a situation which models aggregation. So, we had this instructor student project ternary relationship. So, what is this relationship say that this instructor is a guide for this student on this project and you can have more than one instructor being a guide for that student on that project. Now, each instructor may give an evaluation of the student and you want to record that. So, now how do you record the evaluation? One way is to evaluation let us say is a they write a letter or give a grade or whatever. But let us say the evaluation is a document they have said that something about this person's performance. So, let us say that evaluation is an entity itself. So, one way to model that is as a quaternary relation between instructor project student evaluation. So, we have two relations one is instructor project student which records which instructor is guiding with student on which project and when they submit an evaluation we have one more relation which has a relationship correspondingly a relation which links instructor project student and the actual evaluation. Now, the information in the second table is going to replicate all the information in the first table it is going to replicate which student is on which project with which instructor and so forth. So, the same information may get replicated in two tables in one it may be a subset of it in the other, but the point I am making is there may be redundancy across tables even after doing ER modeling. Even if you do aggregation this slide was an aggregation. You could instead of a quaternary relation you could have a turn that turn a relation into a entity by aggregation and then related to evaluation this is a cleaner way of doing it. But even if you do this you may still have redundancy across relations. And the whole theory which we are looking at does not capture this ok. So, now what I am going to look at is the how to generate a 3 NF decomposition of a relation using the synthesis algorithm. So, this part has two steps. The first step is to define a notion of a canonical cover which is a minimal set. So, this is the opposite of F plus what is F plus it is everything which you can derive canonical cover is the other way can we strip down what we are given. So, we have a minimal set of dependencies which is equivalent to the given set of dependencies this is the first step. And now from this minimal set of dependencies we generate a set of relations. Now if you did the same generation of relation in the given set of dependencies there may be redundancy across relations. But if you minimize it you are removing that redundancy. So, 3 NF synthesis first minimizes and then generates a set of relations. So, what do we mean by minimizing? We want a set of dependencies which is equivalent in the sense that their F plus is the same, but it is minimal. What do we mean by minimal? It could be minimal because we throw away some dependencies it is not needed at all. But minimality may not only be in this sense it may be even in the sense that we can strip out some attributes from a dependency either the left hand side or right hand side. So, it becomes smaller. So, we can minimize by throwing out dependencies or we can take something and shrink it by removing attributes. What is important is when we do any of these remove attributes from anywhere to make it smaller the result should be equivalent. So, we can only throw out or strip out attributes if the result is equivalent. And we keep doing it as long as the result is equivalent. So, how do you determine if it is ok to strip an attribute while keeping equivalence that is the idea. So, a canonical cover is this minimal thing. So, to show the examples of redundancy A, B goes to A goes to B, B goes to C, A goes to C. We know A goes to C is redundant. If we throw it out we can derive it back. But you may also have other kinds of redundancy take this one. A, B, C and A, C, D. Can you throw out A, C, D? You cannot. So, then you cannot derive A, D. Can you throw out A, B? No. B, C? No. However, we can throw out C from this one. Throw out C from the right hand side. We are weakening it. Earlier we said A determines C, D. The decomposition rule actually says that A determines C and A determines D. We can replace it and then throw out A determines C because it is derived. So, you can think of it that way or you can think of it as just removing C from this side. It is always fine to remove it. It is not incorrect, but you might lose information. So, in this case we do not lose information because we can derive it back from the remaining one. So, that is redundant attribute on the right hand side. Now, take this other example A, B, B, C. A, C goes to D. In this case we can delete C from the left hand side here. From A, C we can delete C. Now, in general you cannot do this. If you remove C from A, C you are getting a stronger functional dependency. A, C goes to D does not mean A goes to D. So, we cannot just remove arbitrarily from the left hand side. However, if the remaining functional dependencies along with this implies that A goes to D then we can remove it. That is the idea. So, in this case A goes to B, B goes to C. Therefore, A goes to C. Using A goes to C and A, C goes to D we can derive A goes to D. So, now we can add that to the set temporarily. A, B to A to B, B to C, A to D and A to A, C to D. Now, between A goes to D and A, C goes to D. You can easily derive A, C goes to D from A goes to D. How do you do that? You are just adding an attribute to the left hand side. It is implied. So, you can throw that other one away. It is no longer needed because this stronger one implies that we can throw that original one away. This is the idea. So, before stripping from the right hand side we will make sure that the weekend form still implies all the original ones. Before throwing away from the left hand side, we will make sure that the original set implies the strengthened form. Those are the two steps. So, that is the basic idea of extraneous attributes. So, given a set F of functional dependencies and a particular functional dependency, alpha goes to beta in F. We will say that A is extraneous in F if, first of all, A is in alpha. It is on the left hand side. And F logically implies F minus this one, union the strengthened one. So, we are taking out the weakened one and adding the strengthened one, alpha minus A goes to beta. If this is implied by the original set of dependencies F, then we can remove that attribute from alpha and keep the resultant functional dependency in the set. That is the idea. Is this clear from the previous example? And similarly extraneous on the other side, the test goes the other way. Instead of seeing if F implies this, we are going to see if the weakened functional dependency. What is the weakened set? From F we remove alpha goes to beta and add back the weakened one, alpha goes to beta minus A. If the weakened set implies F, then they are equivalent. So, then A is extraneous in beta. So, those are the two things which you have to check. So, we can actually just, well, the implication in the other direction is trivial. I said equivalence. So, weakened form is obviously implied by the stronger form. So, we do not have to explicitly check for that. Here is an example once again. A goes to C, AB goes to C. Now, is B extraneous here? Because that original set, if you drop B, what do we get? A goes to C. But if you take the original set, it already has A goes to C. So, B is extraneous. So, what do you get after that? You delete B, you get AC and AC. Two copies are the same thing. You just keep one copy. A goes to C. And in hindsight, that is obvious. This is a weaker form of this. But what we did is we just applied the algorithm blindly and we got to the same result. Another example, A goes to C, AB goes to CD. Now, we cannot drop it like that. However, C is extraneous on the right-hand side because it can be derived from A goes to C. So, we drop it. We will get AB goes to C and A goes to C. AB goes to D rather. AB goes to D and A goes to C. And what we do eventually is apply this fairly straightforward algorithm. Check if each attribute is extraneous by using attribute closure. Strip it down, get a minimal set. It is very efficient. It is not hard to do it by hand actually. And we can do all this and come up with a minimal set of dependencies which is called the canonical cover. So, F logically implies everything in FC and FC logically implies everything in F that is their equivalent. And nothing in FC has an extraneous attribute. So, we keep deleting extraneous attributes till nothing more can be deleted. And furthermore, each left side is unique. In other words, what we do is if you have two things with the same left side, we will union them. That is the one extra step. So, that is the canonical cover and come back to the 3NF algorithm. So, the goal was minimize. Now, the 3NF algorithm does one more step. It says take the canonical cover FC and do the following. Step through each dependency alpha goes to beta in the canonical cover. If none of the existing schemas contains alpha goes to beta, then create a new relation with alpha beta. Now, this step, you know, we take that case where we had instructor department, student instructor department. Student instructor department would be created maybe. It is minimal. If we also have instructor department, then it is contained in the other one. So, you may have a canonical cover where one of the relations is contained in the other. So, you may skip that and that is what this step does. And finally, if none of the schemas. So, you just create essentially you create one relation containing all the attributes of one functional dependency and do this for all the functional dependencies. You may remove some redundant relations and one last step. It is possible after all this that there is no candidate key contained in any of the relations. Why? There may be some attributes which were not determined at all by anything. So, it is possible that none of the schemas contains a candidate key for R in which case pick any candidate key and make that a relation by itself. So, these are the set of relations that you create. Like you said optionally remove redundant relations. So, you get a set of relations one per functional dependency and maybe one more for the candidate key. Here it is called a decomposition algorithm, but it is also called a 3NF synthesis algorithm because it is not doing stepwise decomposition. It is taking a relation and giving you a decomposition straight by working the other way. And in fact, this algorithm not only ensures in 3NF, it is dependency preserving and lossless join. Dependency preserving should be obvious, right? Why? Yeah, every functional dependency has a corresponding relation unless it is contained in another relation when in which case it can be checked there. So, it is very easy to see its dependency preserving. Is it lossless join? It is you can prove it essentially. There is a proof in the, it is either in the book or in the solutions. There may be an exercise. I think it is in the book itself. So, you can go read it up. The question always comes up. What should you choose? Should you choose BCNF? Should you choose 3NF? Gate question which has been asked many times is assumes that the ideal form is 3NF. Is it ideal? I do not know. I think that is a design decision. Is dependency preservation very important? Maybe it is or maybe it is not. And redundancy is more important. So, I think that is a decision which there are no single solutions. I cannot say do this or do that. That is up to the designer and we should leave that at that. We cannot say one is the ideal form. You can, you can choose whichever depending on your needs. So, let me take a quick check. How many of you cover multi-value dependencies in your course? Up to fifth normal form, all the normal forms. So, many do not. So, what I would say is multi-value dependencies are something which we should cover because it is a very natural thing. Do you go beyond multi-value dependencies? That is up to you. The higher normal forms are hard to deal with. But multi-value dependencies are practical and you should cover them. So, what is a multi-value dependency? It is basically something which says. So, here is a situation where we have an instructor child which maps the instructor to the child and instructor phone which maps the instructor to phone number. Now, of course, an instructor may have more than one child assuming it is not China which enforces the one child policy. Even there they have two relations and in India of course, there are many children unfortunately. And then instructor phone and certainly everybody today has more than one phone. There is a home phone, there is an office phone, there is a mobile phone, there is a mobile phone with dual SIM, there are two mobile phones. So, a lot of phones. So, you can represent it using these two relations. No problem. Have multiple tuples in there. What if I joined these two relations to get ID, child name, phone number. What functional dependencies hold on this? Does any functional dependency hold on this? Does ID determine child name? No. ID determine phone? No. Child determine phone? Of course not. Phone determines ID? Not necessarily because it would be a shared phone. Phone determines child of course not. So, there is no functional dependency on this schema. Therefore, is it in BCNF? Yes. There is no functional dependencies trivially in BCNF. Is it in 3NF? Sure, it is in 3NF. Does it make sense to store this relation like this? No. We got this by doing a join of those two. You can see there is a lot of redundancy here. The point to note here is there is no relationship between the child name and the phone number. Now, there may be a situation where the instructor reserves this phone number to be used by this child and that other phone number should only be used by that other child. This is a weird example, but supposing you wanted to record this kind of information, then maybe it is useful to keep this table. But the point here is that that is meaningless. And in terms of multi-valued dependencies, what we are going to say is that the relationship between the instructor and the child name is independent of everything else in this table. If you take the join table, if you started with that, the relationship of ID to child name is independent of anything else in the table. And in fact, the relationship of ID to phone number is also independent of everything else in that table. It is not functionally determined, but it is an independence. That is the basic idea. It is independent. So, we are going to write a multi-valued dependency by a double arrow like this. And we will say that ID multi-valued determines phone number. It does not functionally determine, but it multi-valued determines. And a multi-valued dependency, in fact, a functional dependency implies a multi-valued dependency, but not the other way. We will see that. We may not have time for it, but that is a basic property. So, now we are going to have a set of functional dependencies and a set of multi-valued dependencies. And we have to normalize taking all of these into account. And there is, there are extensions of the earlier things for multi-valued dependencies. In particular, there is something called fourth normal form, which is stronger than BCNF in the sense that BCNF only considers functional dependencies. 4NF considers functional dependencies and multi-valued dependencies and removes redundancy. So, if you apply BCNF, this particular schema, you can show, if you apply 4NF rather. If you apply BCNF, it is in BCNF. If you apply fourth normal form, you can show that this schema is not in fourth normal form. Therefore, it is a bad design. And this schema, on the other hand, is in fourth normal form. Moreover, what is interesting is that if you assert that child name is independent of ID and phone number is, sorry, is independent of phone number, then if you have this situation, David and William are the two children and these two phone numbers, 1, 2, 3, 4, 4, 3, 2, 1 are the two phone numbers. If you had David 1, 2, 3, 4 and William 4, 3, 2, 1, you must also have David 4, 3, 2, 1 and William 1, 2, 3, 4. If you only had the first and the fourth tuple, you are implying that that child is somehow connected with this phone number, which is not true. We just stated that the child is determined independent of the phone number. Therefore, you cannot have just those two tuples. You must have the other two tuples also. So, multi-value dependency says that if you have these two tuples, you must also have those other two tuples. Otherwise, there would be a erroneous implication that this child is linked to this phone number, that is the intuition. And the point is that now given that these two tuples must exist, the decomposition is lossless time because if you decompose like this and then join back, we will get all four tuples. In the decomposition, there is no connection of child to phone. So, when you join back, we will get all four tuples back. So, if you started with only two tuples and got four, there is a problem. The multi-value dependency says you cannot have only these two, you must have the other two also. That is the idea. So, I do not have more slides on multi-value dependency given the lack of time. So, we will stop here.