 Okay, people welcome back. So I would like to start this session with some interaction. How to check whether a decomposition is lossy or lossless joint decomposition. Okay, so how to check if a decomposition is lossy or lossless. So I have stuff on that coming up or unless somehow I lost the slide. So the basic idea is that before you decompose you must make sure that certain functional dependencies hold or if we move to multi-valued dependencies you could have alternatively a multi-valued dependency or in fact there are some further generalizations called joint dependencies. So the idea is that you take the original relation and check if certain properties hold if so the decomposition is lossless. So what are these properties? So let me use the white board. The question was how do you know that a decomposition is lossless joint? So the condition is if I have R being decomposed into R1, R2 what we want is the joint of natural joint of R1 and R2 ought to be equal to R. This is what we want but how do we know this? Is there a sufficient condition which will ensure this? So natural joint obviously depends on what attributes are in common. So let's take the intersection. So these are actually I am using R1, R2 as schema and relation kind of loosely. So if I want to be more specific I have to take actual relations on this but let's not worry too much about this. So what I want is that the common attributes what are the common attributes? R1 intersection R2 those are the common attributes. So there is a sufficient condition on the common attributes which ensures that the decomposition is lossless and that sufficient condition is that the common attributes either determine R1 or they functionally determine R2. One of these two should be satisfied. If it is then the decomposition of R into R1 and R2 will be lossless that is when we do this joint it will give us back the original relation for any relation state which satisfies the function. All of this is based on a set of functional dependencies. So if R satisfies the given functional dependencies based on which we checked that the common attributes functionally determine either all of R1 or all of R2 the joint will be lossless. Sorry the decomposition will be lossless. Any follow up question? During normalization we have to use the dynes. If we are dealing with small database there will not be any performance issues. Now my question is will there be any performance issues if we make use of huge database? So question is if you normalize and break things up then to get required information you have to join things back to get it data. So is there a performance implication especially on large data? So there are two answers to this. First of all most applications that we built today are not actually stressing databases. So few extra IOs required to fetch the data from another relation is usually not a huge problem for most in most situations. However there can be certain situations like the quills module yesterday misbehaving. I am not sure what the performance issue was. I need to I need time to dig deeper which I do not have during this workshop. Maybe I can somebody can take a look at it. But the point to note here is that the cost of doing the join is usually not atrocious. Yes there is a price but it is usually not a killer. So in the first cut you would probably not worry about performance you normalize in order to avoid redundancy. After all this supposing you find that there is a performance penalty somewhere and your system really needs to improve performance what you do. And a partial solution is that you create denormalized relations. I will talk about it at the end of today's session. I have a couple of slides on that. So do not worry about the performance penalty at this point. Last question from National College. Go ahead. Sir is it possible for a trigger to update tables on remote servers? The question is can a trigger update tables on remote servers? This is a very database specific question. So some databases do not even allow updates on remote tables. But if it did can a trigger do that? It would usually be a bad idea because triggers are executed from some other update and they are often executed as part of that update. So here is an update which you thought was very cheap but because somebody defined a trigger on it it turned into a remote update and a remote update is expensive because you have to communicate with a remote database. You have to ensure atomicity which means you have to use distributed commit protocol. So all that has big penalty. So it is usually a bad idea to put something very expensive within a trigger because some unfortunate soul gets totally chewed up because you added a trigger. So my guess is most databases would not support it. They would not allow something which can cause havoc but I do not know maybe some databases do support it. But if your goal is to do something like this there is an alternative mechanism and this idea is that you use what are called persistent messages. What does this mean? The trigger will add a row in some table which records what has to be done at the remote site. Then you have a separate program which is running maybe continuously maybe periodically every 5 minutes whatever. It checks this other table to see what all messages have been inserted and it will take that message and perform an update on the remote center and once the update is performed it will delete the message from this relation. So this is called persistent messaging. Let me use the white board. So this is unrelated to normalization but since you ask the question. So the basic idea is for any remote action you simply log in a local relation you will add a row. It is called a message which records what has to be done and there is a separate process which is continuously reading this relation taking the things there and executing them at remote site. And there are tools which support this out of the box. You do not have to develop this on your own. So there are many things. There is MQ from IBM. There is an open source. There is a Java version of this. I forget what. So there are tools for this also. Anyway let us get back to normalization now. So let us take a question from some other center. MGM college Nandith please go ahead. Hello. MGM college Nandith. Sir can we have a relation between 2 week entities and do we need to have a discriminator for that relation? So yesterday's discussion we had if you recall we said that in the modal thing course is a strong entity. Course is a strong entity. Then we said assignment is a weak entity set which is dependent on course. And then we said that a submission in turn is a weak entity set which is dependent on assignment. In fact we even had a version of it where submission was dependent on the user also. The user who submitted it. So we have actually 2 different cases both which were shown in yesterday's example. The first was that you can have a weak entity set which is identified by another weak entity assignment. So the primary key for assignment would be its discriminator along with the primary key of course. Now the primary key of submission forget the user. Let us say it was not identified by user it is only identified by assignment. Then the primary key of assignment of submission would be its discriminator plus the primary key of assignment which itself considers the primary key of course discriminator of assignment discriminator of submission. So all these keys are combined to get its primary key. And we had a variant where submission is also dependent on user. So it is identifying relationship from submission to assignment and submission to user. So it is a double level thing and then you it is in this particular case it did not even need a discriminant. So the primary key of assignment plus the primary key of user would become the primary key of submission. So yes all these things are possible. They are not discussed in detail in the book but it can happen. Let us move back to normalization. Please no more questions other than normalization because I want to finish this topic today. One of the questions is is there any tool to find functional dependencies? That is an interesting question. Functional dependencies are supposed to be determined by the application that you are modeling. It is not supposed to be determined from the data. You should study your application and if the application domain says that department has only one building that is a functional dependency. You cannot just look at data and determine it. However that said there are many people who look at large volumes of data and want to look for approximate functional dependencies. Things which more or less hold in real data. If you want to say that you know most people who satisfy these conditions also all have this common thing. So there has been work on finding such approximate functional dependencies on real data. It is actually related to data mining but that is unrelated to the database normalization process. Another question is actually many people have been asking related questions. One of the questions is on what basis do we say for a particular application that x normal form is enough? Which normal form industry supports? And many other people have asked similar questions. That can be answered as follows. First of all first normal form is pretty much standard although databases today including PostgreSQL or Oracle and so on allow certain kinds of structured attributes. Oracle supports arrays and sets which is not even first normal form. Forget all the other normal forms. So the databases allow you to do whatever you want. The question is what do people use in practice? So first normal form is generally a minimum which people do use. Beyond the second normal form is society is not really relevant. It is too weak. So what people would use would be one of 3NF or BCNF and then if you also include multi-valued dependencies instead of BCNF 4NF is the extension of BCNF to multi-valued attributes. So what people do in practice is BCNF or 3NF is ensured and then multi-valued dependencies are used for some further decomposition. So 4NF is a target although the inference rules for multi-valued dependency are a little more complex. So I do not know if people use it in practice but at least informally those ideas are used. Formally probably 3NF or BCNF is what is done formally. Now amongst BCNF and 3NF which do you choose? I will come back to this after it is there some slides down and the answer is given what SQL databases support. BCNF is probably good enough however the 3NF decomposition algorithm does tend to give you good schemas which preserve dependencies. So maybe it is a good idea to use that algorithm and then in certain cases we may decompose further to bring it into BCNF even if it does not satisfy 3NF. If there is something which does not satisfy BCNF even though it satisfies 3NF we may decompose it further to bring it to BCNF. Let us see what are the questions. Is it possible that a relation has anomalies and it is still not possible to apply any normal form? We are going to see that it depends on what normal form you choose but if you just consider functional dependencies you can have a relation we will have an example coming up later which satisfies all the BCNF 3NF trivially but has redundancy and that is where 4NF comes in. Now people then said hey 4NF based on multivariate dependency is great but here are some more complex situation where there is redundancy and that led to the notion of what is called joint dependency then people said hey joint dependency is still not enough there is some more examples of redundancy and that led to something called domain key normal form and so forth. But practically speaking this became too obstruous for people to handle so in industry people as far as I know stop with multivariate dependency. Let us get back to the slides here this is where we were just before the break. So here is a quiz for attribute closure this is actually a very simple quiz so let me start it now just go over the solution here attribute closure started with A plus starts with A because B is there we get AB and because now B is part of AB we can also add CD but now can we add anything more the only other dependency is B goes to F E is not in the closure so we cannot use that one so the attribute closure is ABCD. Okay so now what do we do with attribute closure there are many uses for attribute closure first of all we already saw how to use it to test for super key to check if alpha is super key compute alpha plus check if it contains all the attributes we can also use it to test functional dependency supposing we want to check if a functional dependency alpha determines beta holds it is very simple just compute alpha plus which we saw how to compute it is actually very very cheap to compute alpha it is a very efficient algorithm F plus can be very large but and take a long time to compute but alpha plus can be computed extremely fast so all we do is compute alpha plus and check if beta is a subset of or equal to alpha plus if it is then we know this dependency holds very simple now we can also use attribute closure to compute F plus how do we do this well basically we take every subset of R and for every such subset we compute gamma plus the closure of that the attribute closure and then for every S subset of or equal to gamma plus we have output of functional dependency gamma goes to S that is the closure of this because if you compute all of F plus there is a huge amount of redundancy so we really do not want any algorithms in practice which computes all of F plus very expensive lot of unnecessary stuff there which will not be of any use at all but if you wish to compute F plus this is how you could do it now that said many of our algorithm say that check something in F plus so it is actually seems like you have to compute F plus before you can check if something is in BCNF or 3NF and so forth as it turns out you do not really have to do that there are some shortcuts complexity wise the shortcuts in some cases are much cheaper in some cases the worst case complexity is high but in practice they are much faster so we will see this and they are all based on attribute closure but before we do that this slide probably should have come a little bit earlier lossless joint decomposition I already explained this using the whiteboard and this was this thing which we said that a decomposition of R into R 1 and R 2 is lossless joint if at least one of these two holds the common attributes determine R 1 or they determine R 2 and this notation is a more formal thing from what I said I simply said R 1 capital R 1 join R 2 but actually it should have been project on R 1 of R join project on R 2 of this is the correct way of viewing it if you decomposing the schema then you decompose the relation to get pi R 1 of R and pi R 2 of R and the join of that should give us back R then it is lossless now note that this is a sufficient condition but it is not a necessary condition because there are situations where something called multivalued dependencies hold no functional dependencies hold but still the decomposition is lossless joint we have another example here a BC is the schema two functional A join determines B B determines C there are two possible decompositions of this which are lossless joint the first is to take AB BC what is the common attribute here B and if you take the schema BC because of this function B determine C we can infer that B determines BC in other words the common attributes is a super key for one of these two scheme R 2 in this case and if you do the other way we decompose into AB and AC that is the other alternative this is also lossless because in this case the common attribute is a and because we have the functional dependency A determines B it is clear that a determines AB in other words is a super key of R 1 in fact you can also infer that a determines C so A also happens to be a super key of R 2 so this one is certainly lossless now here is something interesting so let's look at the last bullet in this slide it says that the second decomposition is not dependency preserving so what does that mean so there are two dependencies A determines B B determines C if you take the first decomposition to AB and BC it is lossless join and furthermore the dependency the functional dependency A goes to B can be checked on R 1 and the functional dependency B goes to C can be checked on R 2 but if you take the other decomposition you can check A determines B on R 1 on R 2 you can check A determines C but it is possible to create a state of the database with R 1 and R 2 where B determines C is violated how can you achieve this well basically you can let me use the white board to show this you have a decomposition into AB and AC and the functional dependencies are A determines B B determines C and the other inferred one is A determines these are the non-trivial ones now with this particular decomposition we can check A determines B we can check A determines C but what about B determines supposing I have values here A 1 B 1 and then I have another thing with A 2 B 1 does this satisfy A determines B yes because the A values are different similarly here I will have A 1 C 1 does this satisfy the functional well this is one tuple here let me add a second tuple A 2 C 2 does this satisfy the functional dependency A determines C yes there is no two tuples the same A value but if I join these back it is lossless join okay so whatever I started with I will get back so what is the joint result A 1 B 1 C 1 A 2 B 1 C 2 this is the joint result this is where I started with I mean you can easily check that this is where I would have started to get this thing and when I join it I will get it back because the common attribute A function determines is a super key for the first relation but now the functional dependency B determines C has been violated I cannot check it on this I cannot check it on the second one either I need to join these two to get this and then I can see that are the functional dependencies violated so the bottom line here is that if the decomposition not dependency preserving whatever I can check locally is not enough to ensure at least one of the functional dependencies here B determines C so coming back it is obvious that we should choose the first one because this one is dependency preserving so given a choice between this decomposition this one we would choose this one now the next question is this in BCNF and it is easy to see that this initial schema has a functional dependency B determines C where B is not a super key this dependency is not trivial and B is not a super key A is a super key but B is not therefore it is not in BCNF and we have to decompose it both of these decompositions are in actually in BCNF but this first one is also dependency preserving the second one is not dependency preserving so what we would like is to get a BCNF decomposition which is dependency preserving later we are going to see this is not always possible so couple of slides down so formally how do we know if something is a particular decomposition dependency preserving I kind of argued and showed you an example to show that it is not decompose dependency preserving but how can you show that it is dependency preserving the answer is as follows at least theoretically it is an expensive test but formally let us say that we compute F plus that is the set of all functional dependencies implied by whatever is given initially F is given we compute F plus now we are decomposing that given schema into R 1 R 2 and so on so let the let F I be the set of dependencies and F plus that include only attributes of R I these are the ones which can be checked efficiently on R I by just looking at R I I can see if this functional dependency is satisfied or not so what we can do now is each of the F I can be checked locally now we take the union of the F I's and then we compute the closure of that what all is implied by that if that closure is equal to F plus then we can be sure that the decomposition is dependency preserving if it is not then something is some dependency cannot be checked locally and we have to do a join in order to check the dependency so how do you check if a particular decomposition is dependency preserving for lack of time I would not get into it you can obviously compute F plus but you would die generating all of F plus there is a cleaner method using attribute closure which is there in the book not there in this set of slide these are abbreviated but if you look at the books textbook website dbbook.com those slides are more complete and this is discussed up for this one is the same example we saw just a little bit back this particular decomposition is BCNF last let us join dependency preserving this is a good one become into ABB now let us come back to how to check for BCNF now this is something which is a little tricky initially if you are given a relation and a set of one relation and a set of functional dependencies on that relation there is actually a simplified test for BCNF so what is the basic test for BCNF we should check every functional dependency in F plus but F plus is huge I do not want to compute F plus so the first question is if you are given an initial schema are and an any a set of functional dependencies F on that schema then it suffices to check only the relation the functional dependencies in F for violation of BCNF what I mean by check these for violation for each such functional dependency we are going to check if it is either trivial or it is a the left hand side is a super key we are only going to apply this test for things in F we are not going to do it for any other things in F plus only in F and it turns out for the initial schema this suffices if none of the dependencies in F causes a violation then none of the dependencies in F plus will cause a violation either so it is enough to check this but then one might think that okay now let us decompose the relation and again apply the same check on the decomposed relations and this example shows you that that does not actually work let us say F is A goes to B BC goes to D that is the dependencies are given R is this one A B C D E now it is clear that the very first function A determines B shows that this violates BCNF why it is not trivial is A a super key no what is A plus it is just A B it is not a super key therefore we must decompose and we decompose into A B and A C D E okay I hope you understood this I will repeat this the basic idea of getting something into BCNF is to find a functional dependency that shows violation of BCNF and then decompose using that functional dependency now decomposing using a functional dependency means what if you decompose with A B A A determines B one of the relations contains all the attributes of the functional dependency the other relation contains the left hand side and all the remaining attributes of the original relation so in this case A and then the remaining attributes which is C D E those are the two things now the next step is supposing we used the same F and checked R1 and R2 for R1 there is something interesting here the interesting point to note is that any binary relation is automatically in BCNF okay let me put this on the whiteboard to retrace this point why because if it is a binary relation only two attributes let us say A and B the only possible non-trivial dependencies on this are A functionary determines B or B functionary determines A these are the only two possible things now if you take this dependency A plus is clearly A B which means A is a super key if you had this one then similarly B would be a super key therefore either of these dependencies would not violate BCNF therefore any binary relation at all is automatically in BCNF we do not have to bother checking it further it is guaranteed okay so coming back A B is already in BCNF we would not bother how about A C D E is it in BCNF if you check the original set of functional dependencies A goes to B B is not even here so we cannot it cannot violate anything how about BC goes to D B is not even an attribute here so BC goes to D cannot show violation of BCNF here so you might think that R2 is in BCNF but it is actually wrong it is not and the reason is from these two A B goes to D and sorry A goes to B and BC goes to D from these two we can infer one more functional dependency which is AC determines D okay if you think about it from A B A determines B I can infer AC goes to BC and using AC goes to AC goes to BC BC goes to D I can infer AC goes to D very easy or alternatively if I start with R2 I take A plus using this dip actually let me come back to that method later I'll have it in a few slides so the bottom line is AC goes to D can be inferred and if you take AC goes to D and check this one is AC a super key no AC does not function determine E therefore AC is not a super key at the same time it's not trivial therefore R2 is not in BCNF we have to decompose it further what will we decompose R2 into AC D and AC E so let me just do that on the white board AC DE was the schema and the functional dependency we had inferred was AC functionally determines D now if we use this to decompose we get AC D as one of the relations and the other relation has AC and the remaining attributes other than D which is in this case E so this is the decomposition now is this in BCNF it's not binary so there may be a violation how do we check that we have to check if there is any dependency in F plus which shows this violation now again that is difficult to compute F plus so there is a nice alternative which is coming up in a slide or two let's go back to the slides so here is the simplified test so what we can do is one of two alternatives the formal alternative is you test RI each RI with respect to the restriction of F to RI that is Fi you know we saw that in the previous first compute F plus then take the subset which only contains attributes from RI what is RI by the way we have R we have decomposed into R1 R2 R3 up to Rn so RI is one of the decomposed relations so either we can do this it's too expensive or this is a in terms of you know algorithmic complexity this is the same but in terms of human difficulty it's much simpler what we do is we don't compute F plus but instead take every set of attributes alpha subset of R and check that alpha plus how do we compute closure of alpha plus with respect to the original functional dependency we are not computing F plus we are given F we compute closure of alpha under those things and we will make sure that alpha plus either includes no attributes of RI minus alpha that is it only includes attributes of alpha and maybe some other attributes not even in RI those don't matter but if it includes any attribute of RI other than alpha it includes all attributes of now why is this useful if this condition is violated by some alpha goes to beta in F plus it should be this shouldn't be F it should be if it's violated by something in F plus so then we can show that alpha determines this thing that that is what what do we have here it has beta alpha determines beta alpha is not a super key so it since is not a super key alpha plus minus alpha will have some attribute intersection RI okay this is this is the functional dependency which we can infer okay so what we have done is we have taken subset alpha we have verified that alpha plus is not RI therefore you know it's not a super key now what we can say is take this thing and it's a dependency which we can infer what is the dependency alpha plus minus alpha intersection RI this set of attributes you know we can show that alpha function determines this set and this one will show violation of BCNF because this will be non-trivial and alpha is not a super key and we can use this thing to decompose RI so that's a basic intuition so it will be a little more clear if we put it in practice given the same example before A determines B BC determines B and we have broken it so far into R1 and R2 on R2 we have to check various attribute pluses ideal we should check what all do we check every subset what are the subset AC DE AC CD DE EA AC DE CD and so forth we have to check all these combinations actually it turns out we don't have to check combinations of three attributes because any functional dependency involving them will become a super key so that doesn't have to be checked so only have to check AC DE individually and pairs of these but as it turns out the very first one we try A plus what is A plus it includes B and this A goes to the BC goes to the what we want so it's not ABC it's just AB C is not even in here or we should take AC plus but in this case just do A plus that's good enough so this is a mistake in this slide so if you have AC A plus is AB and sorry but that by itself is what does it show a B is not even here that by itself is no use so what we should have used is AC plus that is the one which we should have used AC plus is the one which actually shows violation so I said we have to try all combinations AC plus closure what does it give us AC plus will add B and because of BC it will add B AC plus is ABCD now since AC plus is ABCD in this case B is not involved here so if you take the intersection with RI that is ACDE what we get is AC determines AC goes to B is the one which we had determined so given the original dependencies A goes to B BC goes to B if I am not mistaken okay so those are the two dependencies so applying AC A plus is no use it doesn't show anything C plus is no use there is nothing in there in this case A plus is just AB which is no use for this one similarly D plus is no use E plus is no use so all the single attribute closure don't show anything now if you take AC plus what do we get as I said we get ABCD and from it if you intersect with RI RI is this one okay R2 is this one so if you intersect with that what we get is AC goes to D we intersect and we also remove the attributes in the left hand side AC so what is left is AC goes to D is the functional dependency we infer but bottom line we don't actually have to infer that initially what we do is we compute AC plus and it includes ABCD which has an attribute other than AC I mean B can be eliminated because it's not in RI forget about B among the attributes in RI it has an attribute D which is not in AC and at the same time there is an attribute E which is missing so AC plus is not a super AC is not a super key for R2 and we infer that BCN is violated and to know which functional dependency violates it the slide showed how to infer this in our example we can infer AC goes to D is the functional dependency that violates BCN coming back here assume that this slide is corrected and this is how we this particular line here is how we inferred AC goes to D is the one which is violating BCN this computation here okay so that now brings us to the overall BCN of decomposition algorithm we have shown how to check for violations now what we do is start with the initial schema being just containing only one relation R which is all the we just given one relation the initial result contains that schema the decomposition algorithm as shown here computes F plus but as I said we don't have to do that there are shortcuts now what we say is if there is a schema RI that is not in BCN initially for R we can just check the functional dependency in F if one of them shows violation of BCNF we decompose if none of them shows violation of decomposition of BCNF we are done if one of them shows violation we break it into we take result remove the set RI result is a set of schema we remove RI from that set of schema and we add back two different schemas one of the schemas is alpha beta that is this dependency the other schema is RI minus beta okay there is an extra condition alpha intersection beta is empty set that you know removes the need in one of the earlier slides we had said alpha not this one somewhere earlier we had worried about things which are common on both sides earlier decomposition thing worried about attributes which are on both sides of functional dependency here we are simplifying our life by ignoring functional dependency which have an attribute on both sides we can simplify those by removing that attribute and only consider these dependencies so then we add alpha beta and RI minus beta okay so this decomposition is lossless it's very easy to show the common attribute is alpha and it is a super key of this relation so it's easy to see it's lossless and if at the end of this loop nothing changes it's very easy to show that this thing is in BCN there are no more functional dependencies that show violation and we are done so of course how the question is instead of computing F plus and then looking for alpha goes to beta and F plus we take a schema RI and do attribute closure on each subset of its attribute and check this that's a easy way of doing it so here is the example the same example as before and here we see that B determined C shows that it's not in BCN because B is not a super key so we decompose into BC and the remaining is AB so this particular decomposition is now in BCN everything is binary and we are done there is another alternative we could have even well in this case we could have decomposed it on A goes to B and turned it into AB AC that is not required from the viewpoint of BCNF decomposition because A is a super key so this is not violating BCNF but if we chose to decompose in that way we would get a different decomposition which is in BCNF in the end okay so here is a quiz question given this relation and the functional dependency A goes to CD the BCNF decomposition is which of these it's a simple question so this is actually a very easy question so if you take A goes to CD obviously one of the relations has to be ACD the other relation basically removes the attribute CD from the original relation which gives us AB so the answer is AB and ACD okay people have been asking for examples of BCNF we did a toy example people ask for a live example I think this is what they meant by a live example using you know things which are identifiable meaningful relations so here is a relation class which is kind of artificially created by joining together many of our tables but if we started with a universal relation maybe we would land up with this situation at some point what all is in there course ID title department name credits these are the ones from course and then section ID semester year building room number all of these come from section okay so section has course ID section ID semester year building and room number and time slot ID and in addition building and room number have a capacity which comes from the classroom relation so this is really a join of course section and classroom three relations which have been joined together and we want to see if this is in BCNF and if not do the decomposition so what are the functional dependencies which we have here in our model course ID functionally determines title department name and credits this is fairly reasonable ignoring temporal aspects because across several years the same course ID may be reused but at a point in time course ID does determine the title of the course department and credits now room numbers are unique only within a building so building comma room number identifies a room uniquely which means the capacity room cannot have two capacities so building comma room number determines capacity the last one says a particular section must meet in only one room at one time slot so the section is identified by course ID section ID semester year it must meet in only one classroom therefore building room number are determined and moreover it must meet only in one time slot so time slot ID is determined so these are the functional dependencies we have so if we had not started from the decomposed function we started from this and our understanding of the domain we would have come up with these functional dependencies now we see if these functional dependencies are satisfied sorry cost violation of BCNF and then decompose so this it turns out does have a candidate key which is course ID section ID semester year because section uniquely identifies course and also uniquely identifies classroom so the key of section is actually a super key of this joint relation but there are other functional dependencies which violate BCNF in particular once we have course ID we know that all the details of the course are functionally determined so we have this one the first functional dependency where course ID is not a super key so if there are two sections of the same course all of these are going to get repeated that is the intuition therefore we decompose using this functional dependency to get what course with course ID title department name credits that is our usual course relation and the remaining ones after deleting title department name credits if you remove these three what is left course ID section ID and all of these that is let us call it class 1 is this in BCNF we will have to check let us look here building comma room number determines capacity now that is functional dependency which holds here but is building room number a super key of this no it is not therefore we need to decompose so using this we replace class 1 by classroom which is building room number capacity this is our original classroom relation and what is left if you remove a capacity from here remove capacity from here what is left is the four things which are super key of section along with building room number and time slot ID that is exactly this that is the section relation so what we have got now is a decomposed relation which is in BCNF now how are you sure well we do actually have to apply the test further here on this relation is there something which violates it as it turns out you know you can verify that there is nothing more and we are done so that is a good example of BCNF decomposition on a realistic example so now let us see what so BCNF is very good there is one minor drawback though which is that sometimes you cannot get a dependency preserving BCNF decomposition so here is an example there is three attributes JKL I will give you a more realistic meaning for this but let us keep it simple here to understand what is going on let us say there are two functional dependencies JK determines L L determines K that is what we are given is R in BCNF it is not if you see L determines K L is not a super key this is not in BCNF so if we decomposes what do we get LK is one relation LJ is the other now that is the only decomposition possible here so it is clear that any decomposition of R will not preserve JK determines L you can easily check that the decomposed relations are binary and the dependency is there will not let us infer JK determines L so this will not be preserved by the decomposition so testing it will require a join in other words so what this example shows is there are relations which do not have any dependency preserving decomposition into BCNF so what do we do we can say that dependency preservation is not so important forget about it let us stick with BCNF that is a perfectly valid thing and we are most welcome to do it but some people might say hey dependency preservation is important because if we do not if we cannot check it efficiently we might land up with an inconsistent database so then what do we do so the answer is if dependency preservation is more important than the potential for duplication then keep this schema unchanged JK L as is now you can check this dependency there is a violation of BCNF but on this relation perhaps you can have some other way of ensuring the functional dependency L goes to K so you can check you can make sure that there is no duplicate so if L occurs twice it will be the same K value assuming you have a mechanism in the database of ensuring functional dependencies keeping it as JK L and enforcing both these functional dependencies will keep it consistent it is redundant but it will be consistent so that was the intuition and going from this example more generally the question was can you have some beaconing of BCNF which allows dependency preservation and this led to another weaker normal form called the third normal form and what is third normal form do it allows some redundancy but it ensures that we can always get a dependency preserving decomposition into third normal form now there is some interesting history to third normal form third normal form was defined by chord and somebody else in a somewhat different way the connection to BCNF was not that clear although BCNF itself was also by chord boys chord normal form all of this was done by chord but later people said hey maybe you should interpret 3NF slightly differently define it slightly differently which is a little more clear and that's the version that we are using so our version of the 3NF definition takes the BCNF definition and adds one more condition there saying if both of these are violated as long as at least the third one is condition is satisfied it's okay and what is the third condition that for this functional dependency okay let me start from the beginning the schemas in third NF if for all alpha determines beta NF plus at least one of these holes either it's trivial or it's a super key so far it's BCNF or every each attribute in beta minus alpha so is contained in a candidate key for R and again I will stress that each one may be in a different candidate they don't all have to be in the same candidate so it's clear that if a relation is BC is in BCNF it is automatically in 3NF because one of these two conditions is satisfied we don't even have to test the third condition however something may be in third normal form but not in BCNF because it may fail these two but satisfy the third one so the third condition is in some sense a minimal relaxation of BCNF which ensures that we can get a dependency preserving decomposition so let's take example which is a little more realistic actually kind of similar to the earlier example but if we have just given names we have department advisor now not just advisor a department advisor so for each student a student may be associated with two departments why maybe there's a double major that's common in the US in IIT Bombay we have a minor there is no advisor for a minor but that might actually be a good idea some day we may introduce that so a student in computer science who minors in let's say mathematics could have one advisor in the CS department and one advisor in the match department what we want is to ensure that student cannot have two advisors in the department so what we will say is that student ID and department name function determines the instructor ID so a student cannot have two advisors in the same department but at the same time we know we you know we know that the instructor must be in a department instructor cannot be in two departments so we also have this functional dependency IID goes to instructor ID goes to department name now if you see this one SID IID department name and go back to the previous slide not this one one more back we said Jkl Jkl determines L, L determines K if you see it's actually almost isomorphic to this Jkl determines L and L determines K that's the same structure it's identical to the other one and if you see here R is not in BCNF we already know that this dependency IID determines department name shows that it violates BCNF why because IID is not a super key there may be many tuples in here with the same instructor it's not a super key but now the claim is that R is in 3NF how do we check this we are going to check the functional dependencies so here what do we have we have SID department name this one SID department name determines IID give us some slight mix here this line break has come at the wrong place what should have happened is SID here should have been on the next line it should have been SID comma department name is a super key so please note this correction I'll update it in the main slide SID comma department name is a super key therefore it satisfies the second condition of BCNF so this key is okay this functional dependency is okay how about this functional dependency IID goes to department name it certainly violates BCNF the question is does it satisfy the third condition of 3NF what are the third condition each attribute a in beta minus alpha that is each attribute on the right hand side must be in a candidate key so going back here what is the right hand side for this one department name now is department name in a candidate key yes SID comma department is a candidate key why you know first it's a super key right it's clear that SID comma department name is a super key trivial is it a candidate supposing we drop department is SID super key it's not because for different departments the student may have different advisers so there may be two rows department by itself is also obviously not a super key so SID comma department name is a candidate key it is minimal and the right hand side of this dependency the only attribute here is present in the candidate key therefore this last condition of 3NF is satisfied so the argument is that this guy is fine it's in 3NF we are done as we saw there is redundancy things can get replicated in fact there is another problem which is a little more complex supposing we wanted to we use this schema and we also want to track which department each instructor is in what are the alternatives one alternative is to use this schema itself for instructor and if a instructor does not have an advisor we put null for the student ID that is particular instructor is not an advisor there is no student that is advising so we are in order to store that information in this table we will have to have a null value for student ID we can have the instructor and the department here but null for student ID and if the same instructor is advisor for many different students that instructor will appear many times and every time it has to appear with the same department that is redundancy so clearly there are two problems there is redundancy and then some information cannot be represented without null value it is also a little weird to not store instructor department links separately but to merge it into instructor student ID there is actually a third alternative which causes another set of problems let me use the whiteboard so what did we have we had jkl or in our terminology student ID iid instructor ID and department and we said we will not decompose this this is in 3 and n so now we are unable to record which department and instructor is in without putting it in this table it is unnatural so it is a okay let this table be there let us also have an instructor ID comma department name as a separate table so for every instructor we will store the department name in this table and for every student department pair we will store the instructor ID in this table by the way this is eminently possible from an ER perspective so for every instructor there is a department name it is there in our schema now we may also have student advisor department combination so earlier advisor was a relationship between student and instructor now if we did the ER diagram let us say student instructor department and advisor would be a ternary relationship between so this one is called department advisor I will call it da short for department advisor is a ternary relationship between a student instructor and department so we have two separate things instructor and department there is also another relationship between these two okay so now what is the schema which we land up with if we turn it to relations we have this one corresponding to the instructor relationship department sorry instructor department relationship I am flipping it and this one which is the department advisor between student instructor and department is this one so let us call this instructor department ID is this one and department advisor is that one so we have two tables so we did a perfectly nice ER design but here we are in a situation where it is not in BCNF okay so some people have been asking me if you do something with your modeling is it guaranteed to be in a good normal form and we can do nothing further but here we did go through all the steps but we still have something which is not really in even in BCNF so if you decompose it what you do if you want to decompose this you will land up with a relation SID IID so these two relations are actually enough from the viewpoint of representing which all people are advisors of the student and the department can be inferred from the other chain so instead of associating with the department here you can infer it through this indirect link so yeah so the bottom line here is we did a perfectly reasonable ER design and landed up with something which violated BCNF and we could decompose it and store these two relations here which are instructor department student instructor now on with this schema it's very difficult to enforce that a student has at most one instructor per department in order to do that we would have to do a join of these two and then check the functional dependency so one option is to avoid redundancy and do this the other option is to keep these two relations the first two relations here we run a different risk here we run the risk that the instructor in the first case in this relation an instructor may be listed as being in computer science but in this other relation the same instructor may be listed as EE there is no check across relations within each relation you can ensure the functional dependency but there is no way to check this across relations so this also is a problem instantly this kind of decomposition you know may it may appear you won't land up with using BCNF decomposition but if you did 3NF synthesis algorithm it's actually possible to land up with even with BCNF I think there are some cases where you can land up with replication across relations the same attribute is there in more than one relation and there is a problem because of that so you have redundancy across relations so what is the model of this story it's all very very confusing should you do 3NF should you do BCNF it's confusing so I would say you know don't lose sleep over it in particular one of the reasons not to lose sleep is thanks to the SQL standard we can supposedly take this schema and verify L goes to K on this efficiently so the assumption is that it's efficient because there is no join we can easily check that L determines K on this schema in fact SQL offers no such support SQL does not have any notion of functional dependency as an integrity constraint the only things it supports are primary key and unique which are both super key constraints so SQL offers no way of checking that L functionally determines K so even if you did 3NF and kept this within this relation itself there is no way to enforce the functional dependency L determines K so after all that effort you know you go to SQL and SQL says I won't enforce it for you so what are the point of dependency preservation when you can't even enforce it and if you kept this other design where we had separate relation with instructor ID department name then there is repetition across relations which also can lead to problems so bottom line maybe BCNF is good enough given that SQL anyway doesn't support dependence functional dependency checking properly other than super key