Welcome back everyone. I hope all of you have had enough time to work out the normalization tutorial and that you did not have any problems with it. But just in case you did, I am going to work out some of the problems here; the solutions were hidden until now, but I have made them visible for those of you who wish to read them. What I am going to do is go over the solutions. First I will take a quick look at the problem and then I will switch over to the solution. The first question is to find candidate keys and attribute closures, and this I think is relatively easy. Most of you actually got it right in the morning, but just for completeness let us do the very first one. AB determines C, D determines E, B determines E, and we want to find the attribute closure of AB and the candidate keys for the schema, where the schema is A, B, C, D, E. So the solution is shown here. First of all, the closure of AB is pretty easy. We have these functional dependencies and we want to find the closure of AB. The algorithm is extremely straightforward. We start with the result as A, B. Given the functional dependency AB determines C we add C to the set, and then, since B is part of the set and the functional dependency B determines E is present, we add E to the set. Now D is not determined by anything else, so at this point we cannot add any more and we stop. A, B, C, E is the closure. Now what about the candidate key? We can start with AB, and its closure is not the whole relation. So the question is what got left out: the attributes are A, B, C, D, E and D got left out. Anything which is left out should be considered for adding to form a candidate key. In this particular case D is not on the right side of any functional dependency, so you have to make D a part of any candidate key. This is a shortcut which you can use.
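(As a concrete aside: the attribute-closure procedure just described can be sketched in a few lines of Python. This is an illustrative sketch, not from the lecture itself, and the function and variable names are my own.)

```python
# Attribute closure: repeatedly apply any FD whose left-hand side is
# already contained in the result, until nothing more can be added.

def attribute_closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# The lecture's example: AB -> C, D -> E, B -> E over R(A, B, C, D, E)
fds = [({'A', 'B'}, {'C'}), ({'D'}, {'E'}), ({'B'}, {'E'})]
print(sorted(attribute_closure({'A', 'B'}, fds)))  # ['A', 'B', 'C', 'E']
```

As in the lecture, AB plus comes out as A, B, C, E, so D is left out.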
If it is not on the right side of any functional dependency, it had better be part of any candidate key. So D has to be added, and we get A, B, D. Now if you take the closure of A, B, D it is clearly A, B, C, D, E, which is all five attributes, and therefore A, B, D is a super key. But before we can say it is a candidate key, we have to verify that it is minimal. We have already seen that if we drop D it is no longer a super key. What if we drop B? In that case the closure of A, D is just A, D, E, nothing more. If we drop A, the closure of B, D will be just B, D, E; A and C will not appear. So it is easy to see that it is minimal. Essentially, we drop one attribute at a time and see if the result is still a super key; none of the sets obtained after dropping an attribute is. So it is a candidate key. We can do the same thing for the other cases. Now let us move to normalization to BCNF. The question is, for each of the above sets of functional dependencies, to decompose the schema into BCNF. Again, I am just going to do it for one of these. We will take the very first one: AB determines C, D determines E and B determines E, the same one. We want to decompose it into BCNF, and the schema is A, B, C, D, E. So what are the steps we take? First of all, we have to check which of these functional dependencies is a witness to a violation of BCNF. I think it is easy to see that AB goes to C violates BCNF because AB is not a super key. D goes to E violates BCNF for the same reason, and B goes to E also violates BCNF. So all three dependencies cause violations of BCNF, and we can decompose R starting with any one of them. In fact, the decomposition to BCNF is not unique; there are many possible decompositions. We already saw a case where there were two decompositions to BCNF, and here there are in fact even more. So we are going to take each starting point and then see where that leads us. For the first decomposition, let us start with AB determines C.
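(Going back to the candidate-key procedure for a moment: grow a super key, then try dropping one attribute at a time. A possible sketch of this, again with names of my own and not from the lecture, is the following.)

```python
# Minimize a super key into a candidate key by dropping one attribute
# at a time, keeping the drop only if the remainder is still a super key.

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_key_from(superkey, schema, fds):
    key = set(superkey)
    for a in sorted(superkey):              # try dropping each attribute
        if closure(key - {a}, fds) >= schema:
            key -= {a}                      # still a super key without it
    return key

schema = {'A', 'B', 'C', 'D', 'E'}
fds = [({'A', 'B'}, {'C'}), ({'D'}, {'E'}), ({'B'}, {'E'})]
print(sorted(candidate_key_from({'A', 'B', 'D'}, schema, fds)))  # ['A', 'B', 'D']
```

Here A, B, D survives every drop attempt, confirming it is minimal and hence a candidate key.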
If we decompose using AB determines C, the first step is to create a relation A, B, C; that is added there. The remaining relation consists of A, B and the other attributes other than C, which is A, B, D, E. So we get what is shown here: A, B, C and A, B, D, E. Now is A, B, D, E in BCNF? We have the following functional dependencies: B goes to E, D goes to E, AB goes to C. Now AB determines C is not relevant for this one, but B goes to E and D goes to E are both relevant here, and both are direct evidence that this is not in BCNF. So we pick one of them. Let us say we pick B determines E and decompose. We add B, E to the result, and what is left is B and the remaining attributes after dropping E, which is A, B, D. So, in case you missed the last part of what I said: we first decomposed using AB determines C and got these two relations A, B, C and A, B, D, E, and the two functional dependencies B determines E and D determines E both show that A, B, D, E is not in BCNF. We can choose either one to decompose on first. If we decompose on B determines E, what do we get? We get B, E as one of the relations, and A, B, D, E minus the right hand side, which is E, giving A, B, D. So A, B, D, E has been broken into B, E and A, B, D. The set we have got so far is A, B, C; B, E; and A, B, D. Now, is A, B, C in BCNF? We have to check subsets. It turns out in this case that for any relation with three attributes, if you take a subset of size 2 and it is involved in a non-trivial functional dependency at all, the right hand side will be the third attribute, so it will automatically be a super key. So there is no point even checking any subset of size n minus 1 if there are n attributes. That leaves us to check individually A, B and C. So we take A, B and C individually on the first relation A, B, C and compute their attribute closures using all of these dependencies.
If you take the attribute closure of B it includes E, but E is not in this schema; the attribute closure of A includes only A, and the attribute closure of C is only C. So the attribute closure of each subset of A, B, C gives no evidence of a violation of BCNF, and we conclude that A, B, C is in BCNF. Similarly, we have to apply the same test to A, B, D. Now what do we get? For A, B, D, if you take all subsets of size 1, the same thing happens again: B goes to E, and E is not in this schema. The closure of B is just B, E, nothing more. The closure of D is just D, E, and again E is not in this schema, and the closure of A is A itself, so that one is trivial. So the net conclusion is that A, B, D does not violate BCNF, and we are done at this point. So the decomposition we get is this one. But like I said, the exact set of relations we get depends on the order in which we do these things. So now suppose we chose differently in the last step. Say the first step is the same, we decompose on AB determines C, but in the last step we use D determines E instead of B determines E. Then what happens? If at this step you decompose on D determines E instead of B determines E, you will get D, E, and on the other side, what do you get? We took A, B, D, E and decomposed it on D determines E, so we have to remove E in this case also, which as it turns out gives the same thing as in the previous step. So the only difference is we have got A, B, D again, but instead of B, E we got D, E. That is the only difference, and we have already verified that A, B, D is in BCNF. So that is it, we can stop at this point; A, B, C is also verified to be in BCNF. Now there are further possibilities. In the first step, instead of AB determines C, we could have chosen either B determines E or D determines E. So suppose in the very first step we chose D determines E. What do we get? We get D, E and the remainder of the attributes except for E, that is A, B, C, D. Now A, B, C, D itself can be decomposed again, using what?
We can decompose it using AB determines C, because AB is not a key; we have seen that the closure of AB includes E but not D. So we can verify that for the relation A, B, C, D, AB determines C is non-trivial and AB is not a super key. So we can decompose; in fact, we have to decompose. What happens when we decompose this using AB determines C? The first relation is A, B, C. The second relation is the same thing minus C, which is A, B, D. So what did we get at the end of this? We got D, E; A, B, C; and A, B, D. This is the same as the one we got in the previous step. Lastly, we had the third dependency, B determines E, to decompose on. If we decompose using that first, we get B, E and A, B, C, D, and then if we decompose again we get B, E; A, B, C; and A, B, D, which is the same as the first one. So these are the only two decompositions that we have got using this sequence of steps. However, that does not actually mean these are the only possibilities. We could have inferred some other dependencies and then used those to do the decomposition. In this particular case there is not much else to infer which would have helped us decompose, but in general there are many more dependencies which one can infer, and we can start the decomposition using any of them. So there are many, many possibilities for the final result. Note that after we decomposed, we always did the attribute closure on all subsets, which ensures that we would not miss anything. If we had used just the given set of dependencies, without doing this attribute closure on every subset, we might miss some functional dependency which shows a violation. Similarly, we have solutions for the other two sets of functional dependencies; in both cases there are two alternative decompositions. Lastly, we have the canonical cover question.
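(Before the canonical cover, here is a rough sketch of the BCNF decomposition loop just traced. This is my own illustrative code, not the lecture's; as recommended above, it looks for violations by taking the attribute closure of every proper subset of the relation rather than only the listed FDs, and it decomposes on the whole of what the violating left side determines within the relation.)

```python
# BCNF decomposition by repeated splitting on a violating dependency.

from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(rel, fds):
    """Return (lhs, extra) witnessing a BCNF violation in rel, if any."""
    for k in range(1, len(rel)):
        for lhs in map(set, combinations(sorted(rel), k)):
            determined = closure(lhs, fds) & rel
            extra = determined - lhs
            if extra and not determined >= rel:   # non-trivial, not a super key
                return lhs, extra
    return None

def bcnf_decompose(rel, fds):
    todo, done = [set(rel)], []
    while todo:
        r = todo.pop()
        v = bcnf_violation(r, fds)
        if v is None:
            done.append(r)
        else:
            lhs, extra = v
            todo.append(lhs | extra)   # R1 = lhs together with what it determines
            todo.append(r - extra)     # R2 = R minus the extra attributes
    return done

fds = [({'A', 'B'}, {'C'}), ({'D'}, {'E'}), ({'B'}, {'E'})]
for r in bcnf_decompose({'A', 'B', 'C', 'D', 'E'}, fds):
    print(sorted(r))
```

On the lecture's example this produces one of the decompositions derived above, A, B, C; B, E; and A, B, D (the exact one depends on which violation is found first).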
The goal is to find the canonical cover by eliminating extraneous attributes and then decompose based on it; note that here the set of dependencies is different. The dependencies we have seen so far actually do not have any extraneous attributes, I think, so we chose two other sets of dependencies to look for extraneous attributes. The first one is A determines CD, B determines DE and C determines D, and then there is one more, similar but a little bigger. What we need to do is find the canonical cover and then decompose into 3NF based on the canonical cover. So let us go down and see the solution. First of all, to find the canonical cover we have to check each of these attributes to see if it is extraneous. How do we do that? Here are the dependencies. First let us check if C is extraneous in A determines CD. Suppose we dropped it; what is left is A determines D and then the remaining ones. From that, can we infer that A determines C? We cannot: if you do a closure on what remains, the A closure is just A, D, and C does not appear. On the other hand, suppose we dropped D; we check if it is extraneous in the same way. We drop D and use the dependencies which are left behind, which is A determines C and then these two. Remember, this procedure is only for the right hand side. For dropping something on the right hand side, the procedure is: first drop the attribute, take the resultant set of functional dependencies, apply attribute closure on the left hand side, in this case A, and see if the closure includes the dropped attribute. So we have dropped D; what is left is A determines C and then these two. What is the A closure here? A plus will include C right away because of A determines C, and in the next round, because C determines D, it will be A, C, D. So we have derived A, C, D, which includes the attribute we just dropped, namely D, and therefore D is extraneous.
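(The right-hand-side extraneous-attribute test just applied could be sketched as follows. Again an illustrative sketch under my own naming, assuming the FDs are given as a list of left/right attribute-set pairs.)

```python
# Right-side extraneous attribute test: drop the attribute from the FD,
# then check whether the reduced set of FDs still derives it.

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def rhs_attr_extraneous(i, a, fds):
    """Is attribute a extraneous on the right-hand side of fds[i]?"""
    lhs, rhs = fds[i]
    reduced = fds[:i] + [(lhs, rhs - {a})] + fds[i + 1:]
    return a in closure(lhs, reduced)

# A -> CD, B -> DE, C -> D (the first canonical cover example)
fds = [({'A'}, {'C', 'D'}), ({'B'}, {'D', 'E'}), ({'C'}, {'D'})]
print(rhs_attr_extraneous(0, 'D', fds))  # True: A -> C and C -> D give D back
print(rhs_attr_extraneous(0, 'C', fds))  # False: A closure is just {A, D}
```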
Now what about the other attributes? Take B goes to DE: if you drop D from here, we are not going to derive B determines D back; there is nothing else involving B. Similarly, dropping E, we cannot derive B goes to E again, so even that is not extraneous. We also have to check the last one, C goes to D: if we drop D, can we derive it back? No, we cannot; there is no other dependency involving C. Similarly, we have to check the left hand sides. When we check the left hand side it is a little bit different: the test has to be whether, from the given set of dependencies, using attribute closure, we can get the stronger dependency. This particular example is not good for illustrating that; I will use the next set of dependencies to illustrate that point. So for this one, there is nothing extraneous on the left hand sides, and the canonical cover is A determines C, B determines DE and C determines D. What do we do next? For the canonical cover we also need one more step: if there are two functional dependencies with the same left hand side, we union them; here there is nothing to union. So now, how do we generate the 3NF schema from this? Well, we create a relation for each of these: A, C; B, D, E; and C, D. We also check if any of these is redundant because it is contained in one of the others; in this case nothing is contained, so we keep all three relations. But there is a little bit more to do. We need a candidate key: does any of these schemas include a candidate key? Does the first one? Let us do the A, C closure. How do we check that? We just take the relation and do the closure on that set of attributes. What does A, C plus give us? It gives us A, C, D and that is it, nothing more, so A, C is not a super key. We just check for a super key, and then some part of it will be a candidate key. How about B, D, E plus? B, D, E plus gives only B, D, E; it is not a super key. Finally, if you do C, D, we do not get anything more.
C, D plus: we can use C determines D, but D is already there, so C, D plus is just C, D, nothing more. So none of the relations includes a super key, and therefore none includes a candidate key. Now we have to choose something as a candidate key. Which one shall we choose? It should be minimal and it should have everything in its closure. A convenient one is A, B: because A determines C, C determines D and B determines DE, A, B plus will be A, B, C, D and then E, which is the entire set of attributes. So A, B is a candidate key; it is minimal because A plus and B plus are not super keys, and we are done. We get this schema. Before I do the next example, I would like to take questions and see if people have understood things so far, and then I will do the last example of extraneous attributes. Bharti Vidya Peet, if you have a question please go ahead. Sir, I have one query: can you explain how to find the primary key, candidate key and super key using functional dependencies? Yeah, can you repeat it once? There are many possible candidate keys and super keys in an example, so how do we find out which candidate key we have to take? Right. So that is what the first example did. Let me go over it once again to clarify; let us go back to the screen share here. The first part was to do attribute closure and find candidate keys. As part of finding candidate keys, we found super keys. Now how do we do that? There are many possible super keys and candidate keys, and we do not want to waste our time enumerating every single one of them. So there are some tricks which we use, but the basic algorithm is to use attribute closure on a set of attributes and see if it includes all the attributes of the schema. If it does, it is a super key. That is the very first step. Now, there are many possible super keys; which one do we choose? A good way is to start from the left hand sides of some of the functional dependencies here.
So we do AB plus; that includes C, so there is no point adding C to the left hand side, and AB plus also includes E, so there is no point adding E either. What is left is D. If there were many more attributes left, we could pick attributes which are on the left hand sides of some functional dependencies. So we choose ABD, and now it is easy to see that ABD plus includes all the attributes. So attribute closure is the basic step in finding super keys. Once we have found a super key, we want to make sure it is minimal, and that, as I told you, consists of dropping one of the attributes at a time and again doing attribute closure. If after dropping an attribute the closure still includes all the attributes of the schema, well, then we have got something smaller, and we try again on this smaller subset, again trying to drop things. If at some stage, whichever attribute we drop, the result is no longer a super key, then we know the super key is minimal, and therefore we have determined a candidate key. Like I said, there are many possible super keys and many possible candidate keys. Enumerating all of them exhaustively is kind of tiresome; I have not tried to do it. If you wish to do it, yes, you could automate it with a program, but it is a little cumbersome to try all possibilities. What would be any research area in this normalization? So the question is: is there any research area in normalization? Let me make a small detour into research in general. Normalization was a fantastic research area in the 1970s. Probably many of the people here were just about born, and many were not even born back then. So it was a hot area once upon a time, but for many, many years there has been essentially no new research in this area, because all the obvious things have been done.
There may be some nooks and corners here and there, and indeed a few papers are published once in a while, but it is not an easy area to do any new work in. So if you want to do research, my advice is: do not go to areas which were worked over many, many years ago; it is going to be very difficult to do anything new. What you want to do is pick an area which is relatively new and then see what can be done there. One way to find out which areas are new is to read conference and journal proceedings, at least the good ones. So there are ways to find out what areas are currently of interest. That does not mean those are the only areas worth doing research on. So you need one of two things. Either find an area where people are clearly actively working, so there are probably more problems in there to do something new on, or find a real problem which people are not looking at and see if you can solve it. Many times some solution might be relatively easy, but if you look deeper you will find that the solution has drawbacks, and you say, but I want this other thing which the solution does not give, and then you see if you can get a better solution. Take this issue of normalization, for example. Everything here was about whether a single schema is in normal form. This area was very hot in the 70s, and in the 80s it started cooling off, but still, way back in 1981, somebody said: it is not just one relation you should think about; you should think about the whole set of relations in your schema and see if there is any redundancy when you take the whole set of relations as one unit. That led to a very interesting paper which talked about this issue. But again, all these sorts of questions have been asked. So most of the time, if you come up with a question like this and do not know the answer, it is good to search the literature. Now how do you do a literature search? It is actually very easy these days.
Once upon a time you had to go to a library and read through a lot of papers. But today you can just go to Google Scholar; it is a wonderful resource. There are also a few others, such as Microsoft Academic, CiteSeer and so forth, but Google Scholar does a pretty good job. So go to scholar.google.com, search with some keywords, and you will find papers on that topic. This particular issue I knew was an issue, but I was not aware of what work had been done on it, on sets of relations. So just yesterday I did a search and I found some old papers, 1980s papers, on this topic. So it is easy to find what research is happening. But if you think normalization is a research area, forget it; go to some other area. There are lots and lots of questions which have come in on chat, and in fact there are some good ones. One of the questions is very basic; it says: differentiate binary relationships from n-ary relationships. Binary simply means it relates two things; n-ary means it relates three or more. The next question is a good one: in the university database, why do we not have an ID for department in the department table? The answer is actually very simple. We probably should have put in a department ID or department code, but we decided to keep it simple and use the name itself as the ID. We did not want too many attributes in the relations; whenever we showed examples, the tables became very wide, so we wanted to keep as few attributes as possible. Just as many examples use name as an ID, we decided to use the department name as an ID. In fact it is not unreasonable for departments, because it is essentially never going to happen that two departments have the same name. So it is not a big deal. Can we apply normalization and functional dependencies on a view? That is an interesting question, and the answer is no; that makes no sense. A view is often got by joining data. It is not stored as such. So you absolutely do not want to bother about these things on a view.
Unless, that is, the goal is to have a check on the underlying data. For example, suppose we have a functional dependency and the database system supports a primary key check on a view. We can create a view, which is a join or projection of other relations as appropriate, to get a relation containing exactly the attributes in a particular functional dependency, and we declare the left hand side as a primary key for the view. Then, if there is an update which violates the primary key constraint on the view, the database could detect it. Like I said, I do not know which databases, if any, support constraints on views. Theoretically it is possible and interesting, but practically I do not know if it is supported. But we do not want normalization here; the sole goal is to check integrity constraints by imposing primary key dependencies on views. We are not going to normalize a view. A view is not actually stored as such; it is perhaps a join. It can be redundant; we do not care, because that is not how it is stored. Another question says: normalize some relation where there are two IDs. So we have a roll number and an aadhar number. Now both of these are perfectly valid as something which uniquely determines the rest, so they are both super keys, and they are both candidate keys, roll number and aadhar number. So what do we need to do to normalize? We do not need to do anything. We only need to decompose if there is some functional dependency which shows a violation of BCNF. In this case, if you have an extra attribute which is functionally determined by the first one, it does not really matter; I mean, they determine each other, so either of them would be a super key. Apply the BCNF check where you have, let us say, roll number, aadhar number and name, three attributes; that was the example which was given. Roll number determines aadhar number and name; aadhar number determines roll number and name.
So if you see, the only non-trivial dependencies here are from roll number and from aadhar number, and both of those are super keys. So BCNF is not violated and we do not actually have to decompose it. Another question was about full dependencies, partial dependencies and so on. We have been using a particular terminology for functional dependencies to define the normal forms, but other textbooks have used other formulations. So for example, suppose AB is a candidate key, and then I have another attribute C, and a dependency like B determines C. What we have here is that C is dependent on a sub-part of a candidate key, namely B. In 2NF terminology this is called a partial dependency, and 2NF says that there should not be any non-trivial partial dependencies. So if you had something like this, where even though AB is the key you have B determines C, then you want to decompose this. And what would you decompose it into? You decompose it into AB and BC. So that is a partial dependency. Now 3NF and BCNF: 3NF is often defined in terms of what is called a transitive dependency. A transitive dependency in a schema is something of the form A determines B and in turn B determines C. So what we have with a transitive dependency is that maybe A is a super key but B is not. But if you see, these two, partial and transitive dependencies, are both dealt with in the same way in our definition of BCNF. Essentially we say that if you have any dependency, the left hand side had better be a super key, or the dependency should be trivial. If it is a partial dependency, the left hand side is part of a candidate key but not the whole candidate key; therefore it is not a super key, so this would also violate BCNF using our definition. Similarly, if you have a transitive dependency like this, B determines C would also violate BCNF. So in our definition we did not have to differentiate between these two cases; we merged them into one.
Some of the earlier literature in this area first worried about partial dependencies and defined 2NF; then it worried about transitive dependencies and defined 3NF. But other researchers said: why bother with all these stages? 2NF by itself is totally useless; 3NF is useful, so we will go directly to that. And by the way, for 3NF there is no need to distinguish between partial and transitive dependencies. All that we care about is a dependency whose left side is not a super key, and that is true in both these cases. That is exactly what is used in the definitions of 3NF and BCNF: non-trivial dependencies whose left hand side is not a super key, which covers both cases. So you can find other textbooks which define 3NF in terms of transitive dependencies, but it is equivalent. Now if you want to explore this in more detail, there is material available on the website of our book. It is not in the textbook itself, but there is a question in the exercises. In our book we have exercises in two parts: practice exercises and then the main exercises. The practice exercises are actually solved; we have put up the solutions on the web. There is a question which explores the connections between these other definitions which other textbooks use and the version that we use. So in chapter 8 the practice exercises explore this, and the solutions are on the book website, which is dbbook.com. If you go to this website, go to solutions to exercises, chapter 8, you will find the solution; the question of course you will find in the book itself. So do look it up and it will tell you in detail how these two are connected. Let us take some live questions now. We have Sree Buddha college Kerala, please go ahead. We have a question from today's chapter 8, on the conversion to BCNF. The question was AB determines C, D determines E, B determines E.
And what we actually got is four tables, which are: first ABC, then BE, then DE and ABD. And the answer you give is? So it is a good question, but you will have to repeat it so that I can write it down on the screen, and then you can tell me whether what I have written down is correct. So please start again; I will write down the things which you say. So the question was on which set of dependencies? The one on? The second one was A goes to CD, B goes to DE. Is that the one? It is AB determines C, B determines E and D determines E. And the solution for my question is? Okay. You got two solutions, which contain three relations each. Yes. ABC, BE and ABD. Correct. Also ABC, DE and ABD. Yes. Those were the two solutions we gave. Correct. Did you find one more? So do tell me, what was your question? Okay. So in each of these decompositions we are missing one functional dependency. If we are choosing the first option, we are not able to express the dependency D determines E. Also, if we are choosing the second option, we are not able to express B determines E. How is this dependency managed in the solution? Okay. So let me just show you the two things here. What we have are two possible solutions: ABC, BE, ABD and ABC, DE, ABD. I hope you can see both of these on your screen. And the question, as I understand it, is that the functional dependency D goes to E is not actually verified by the first schema. It is not preserved, so we do not have dependency preservation. I think that is your question, is that correct? So if you check the dependencies on the individual relations here, those together are not going to ensure that D determines E. Is that your question? Can you verify that is what you asked? Yeah, sure. Thank you. So this is an issue of dependency preservation. Now let us come back here and check if dependencies were preserved. From these, are we able to verify D determines E?
So let us go to the dependencies here. In this case, the only functional dependency involving D is D determines E. If you take the attribute closure here, D goes to E, but E is not part of ABD, and that is the only place where D appears; nothing else is there. So the answer is: yes, it is not dependency preserving. That particular functional dependency cannot be verified from this decomposition unless you perform a join. So we are losing that. Correspondingly, does the other one preserve B goes to E? What do we have on B? We have only B determines E, and if you do attribute closure over there, you are not going to get anything. So neither of these preserves all the dependencies; each of them is losing one of the dependencies. Like I said, you do not always have a dependency preserving decomposition to BCNF. If you wanted to decompose to 3NF, on the other hand, I think it is easy to verify that none of the attributes here is extraneous. So the 3NF decomposition you get includes all these three; each of these would be a relation in the 3NF decomposition. Furthermore, none of these contains a candidate key, so we would add one more thing. The candidate key here could be ABD. So we would have four relations: ABC, BE, DE and ABD. In fact, if you see here, we have ABC, BE, ABD; the 3NF decomposition algorithm will simply add DE as one more relation there. And now there is something interesting, something which I would like you to note; it is important. When we applied the BCNF decomposition algorithm, the final result was in BCNF but it was not dependency preserving. On the other hand, using the 3NF decomposition algorithm, we got another decomposition which in this case also happened to be in BCNF. It is in 3NF, but in this particular case all the relations which come out of it are also in BCNF. So the 3NF algorithm can very well be used as a first step in getting a dependency preserving decomposition to BCNF.
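(The 3NF synthesis steps used here, one relation per canonical-cover dependency, drop relations contained in others, and add a candidate key if no relation already contains one, could be sketched as follows. This is my own illustrative code; the candidate key is passed in, computed as in the earlier examples.)

```python
# 3NF synthesis from a canonical cover, as described above.

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def synthesize_3nf(schema, canonical_cover, candidate_key):
    # one relation per FD in the canonical cover
    rels = [lhs | rhs for lhs, rhs in canonical_cover]
    # drop any relation contained in another
    rels = [r for i, r in enumerate(rels)
            if not any(i != j and r <= s for j, s in enumerate(rels))]
    # add a candidate key if no relation already contains one
    if not any(closure(r, canonical_cover) >= schema for r in rels):
        rels.append(set(candidate_key))
    return rels

schema = {'A', 'B', 'C', 'D', 'E'}
cover = [({'A', 'B'}, {'C'}), ({'B'}, {'E'}), ({'D'}, {'E'})]
for r in synthesize_3nf(schema, cover, {'A', 'B', 'D'}):
    print(sorted(r))
```

On this example the sketch produces exactly the four relations discussed: ABC, BE, DE and ABD.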
So the first step is to apply that algorithm. The second step is to check the results to see if any of them violates BCNF. In this particular case we have already verified that none of the results violates BCNF, so we can just let it be. So what we have got, using the 3NF algorithm, is a dependency preserving BCNF decomposition. We can have the best of both worlds. That is one possible approach: apply the 3NF algorithm and then decompose further if required; if not required, leave it alone. Does that answer your question? Now the other part is, of course, why do we have this redundant schema DE? That is another matter: the decomposition is dependency preserving, but there is something a little odd going on here. We got BCNF decompositions which had one less relation. How is that possible? The only way it is possible is if that relation itself is redundant, meaning all the information contained in it is present in the other relations, so through a join we can get it back. We did not have DE, but if you did a join of those three relations, we are actually going to get back the same things; whatever tuples we had for D and E, we will get back. Now, that decomposition is not dependency preserving, so you might have a state where, after doing the join, you find that the functional dependency D determines E is violated, but the data in there is preserved; we are going to get it back. We are not losing any data even though we have one relation less. We are now with Nevar University Rajasthan. Nevar, do you have a question? Sir, I have a question about normalization on unstructured data. How do we apply the normalization process on unstructured data? It actually makes no sense to apply normalization on unstructured data. Unstructured means you have no control over what is there; you do not know what the attributes are. It is like text. How do you normalize text? Is there an algorithm to do it? No, there is no way. It is basically out of your control.
So there can be redundant information in unstructured data, but the basic fact that there is no structure means you cannot do anything. You cannot decompose, because you do not know the structure. It is only when you have structure that you can apply any of these algorithms and do anything useful. If you have semi-structured data like XML and so forth, then yes, it may be possible to apply normalization to some of these things. I am not familiar with related work in this area, so maybe there has been work, or maybe I was wrong about normalization being dead; normalization of XML schemas, for all I know, might be worth looking at. I have not looked at this area, I should say. But yes, if you are looking at semi-structured data where there is nesting and so on, is there an equivalent concept to normalization? The answer might be yes. I do not know if that answered your question. We are back with Jha College, Hyderabad. If you have a question, please go ahead. Sir, you have elaborated on the axiom rules. Based on the axiom rules, can we derive the primary key, candidate key and super key? You mean Armstrong's axioms? Yeah. So, the techniques which I showed you use attribute closure to check if something is a super key and so on. You could instead use Armstrong's axioms, infer a whole bunch of dependencies, and check whether something is a super key or not. But it tends to be haphazard. You do not know which axiom to apply next; there are so many possibilities. What would you do? Maybe you miss something, an opportunity you could have used. So it is absolutely doable; I mean, by definition it is doable. Armstrong's axioms have been shown to be complete, so any functional dependency which you can derive at all, you can derive by a series of applications of Armstrong's axioms. The only question is: is it convenient for a human to do so? And the answer is no. Is it convenient for a computer to do so?
Yes, you can code an algorithm to apply Armstrong's axioms; it is not all that hard. So if you are doing it using software, that is a different story. But what we have done in the book is mostly stuff which is easy for humans to do. As it turns out, what is easy for humans is also easy to program. So yes, you could very well do all of this with Armstrong's axioms, but you are better off following the attribute closure approach which we have given. One last question and then I am going to go back to the last, or last but one, part of today's assignment. One last question from Bhanipur, West Bengal. Sir, I have a question on normalization. After having a good normalization, do I still need to index the table? And will the index even affect the performance in some way after having a good normalization? Normalization has very little to do with indexing. Indexing is a function of what queries you ask on the database. So if you have a lot of queries which are based on, let us say, the name of a person, then you want an index on name. If you have queries which are based on part of the name of a person, then you want a different kind of index which can index individual pieces of a name. If you have queries on the section relation which are based on course ID, then you want an index on course ID, even though course ID by itself is only a part of the primary key and not the whole of it. On the other hand, if most of the queries on section are based on year and semester first, then you want an index on year and semester. So we will come back to indexing later, but it is not related to normalization; it is related to what queries are asked. Now, when you normalize, of course, you have broken up data, and a related issue is that when you start with a relation schema and break it up, I did not tell you what constraints to put on the tables that you create. That is something I missed.
So first of all, what are the primary key constraints on the tables that we create through normalization? The answer is: any time you create a table from a functional dependency, you can immediately add a primary key declaration on the left hand side of the functional dependency. Remember, in the decomposition algorithm we took a dependency alpha determines beta and created a relation on alpha beta; you will recall that. Now alpha is a key for this relation, so we can declare that immediately. But there is actually one more thing we can do. Whenever we do this, we can add a foreign key constraint from the other relation; remember, alpha is present on both sides. So we can ensure that the alpha values present in the other relation must be present here. If they are not present, when you join it back you will actually lose data. So let me just write it on the whiteboard to reemphasize this, because it is something I missed. Suppose I have a relation R on A, B, C, D, E and I decompose it using some functional dependency, say AB determines C. I create two relations: ABC, and the other one ABDE. Now, on the first relation, AB will be a primary key, because we got it from this functional dependency: AB determines the remaining attribute here, so it is going to be a super key. And since in SQL the primary key is the only such declaration available, we will declare it as a primary key. Now, in the other relation, these AB values should really appear in the first one; if they do not, we are going to have a lossy decomposition. So we can also add a foreign key here: AB references the first relation. Let us call the first one R1 and the other R2; then the foreign key declaration, AB referencing R1, can be added on the R2 side.
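In SQL terms, the declarations for this decomposition would look roughly as follows. This is a minimal sketch using Python's sqlite3, with the whiteboard names r1 and r2; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Decomposition from the whiteboard: R(A,B,C,D,E) split on AB -> C
# into R1(A,B,C) and R2(A,B,D,E).
con = sqlite3.connect(':memory:')
con.execute('PRAGMA foreign_keys = ON')   # SQLite enforces FKs only when asked
con.execute('CREATE TABLE r1 (a INT, b INT, c INT, PRIMARY KEY (a, b))')
con.execute('''CREATE TABLE r2 (a INT, b INT, d INT, e INT,
               FOREIGN KEY (a, b) REFERENCES r1 (a, b))''')
con.execute('INSERT INTO r1 VALUES (1, 1, 10)')
con.execute('INSERT INTO r2 VALUES (1, 1, 20, 30)')      # ok: (1,1) exists in r1
rejected = False
try:
    con.execute('INSERT INTO r2 VALUES (9, 9, 20, 30)')  # (9,9) missing from r1
except sqlite3.IntegrityError:
    rejected = True          # the dangling AB value is refused
print(rejected)              # True
```

The foreign key stops R2 from holding AB values that are absent from R1, which is exactly the situation that would make the join lossy.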
So that is an important part of the decomposition process which we omitted. There is one more part which I did not talk much about, and that is that null values cause havoc if they are present. The whole decomposition theory quietly assumes that null values are not present. Suppose a null value were allowed in a table like this. First of all, what does it even mean to say that a functional dependency holds where a null value occurs? It is not clear. So if there are null values, you probably should not say a functional dependency holds; but if you did for some reason, and you went ahead and decomposed, then you have a problem. If you do a join here, the common attributes are A and B; but suppose you had a tuple in R where, say, B was null. In the decomposition, B will be null here and here. When you join, null is not equal to null, so that tuple will not get reconstructed in the join. In all our earlier decompositions we said a join was lossy because we get extra tuples; but with null values a join can also be lossy in the sense that you do not get back a tuple which you started with. So the whole theory quietly assumes that there are no null values anywhere: wherever we have functional dependencies, there are no null values involved in those attributes. That is the basic assumption. If it is violated, then do not use those functional dependencies; use only the ones where you know there is no null value, and use those for decomposition. I do not know if that answered your question. So these are the two points: one, the constraints on decomposed relations, and two, null values are not compatible with functional dependencies. So let me now go back to the tutorial questions. We had done the normalization to 3NF based on the canonical cover with the first one. Our goal with the next example is to show that an attribute on the left hand side can be extraneous, and how to check for it. So that is my goal here, and here we go.
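The null-value problem can be demonstrated directly: put a null into a common attribute, decompose, and join back. A small sketch with sqlite3 on toy data:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE r (a INT, b INT, c INT, d INT, e INT)')
con.execute('INSERT INTO r VALUES (1, NULL, 2, 3, 4)')        # B is null
# Decompose R(A,B,C,D,E) into R1(A,B,C) and R2(A,B,D,E)
con.execute('CREATE TABLE r1 AS SELECT DISTINCT a, b, c FROM r')
con.execute('CREATE TABLE r2 AS SELECT DISTINCT a, b, d, e FROM r')
rows = con.execute('SELECT * FROM r1 NATURAL JOIN r2').fetchall()
print(rows)   # []: null = null is not true, so the original tuple is lost
```

The join comes back empty even though both halves of the tuple are sitting right there: SQL's null comparison semantics make the decomposition lossy in this second sense.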
Let me highlight this. These are the functional dependencies which we are looking at, and we want to see which attributes are extraneous. Do we have anything extraneous on the left hand side? Let us just go in order and cover the whole thing. The first thing we consider is A determines C, and intuitively it is clear that you can derive this from A determines B and B determines C. But let us verify it using the closure. If we drop A determines C, we take the resultant set of functional dependencies and check that the closure of A still includes C. And that is very easy, because the closure of A will add B first and then add C, and we are done. So we can drop A determines C. Strictly speaking, we are dropping C from the right hand side; the resultant right hand side is empty, which is a trivial dependency, A determines nothing, so the whole dependency can be dropped. Now let us look at AD determines E. We want to check if A is extraneous here, and for this the check is slightly different. Compare AD determines E with what we get if we drop A: D determines E. D determines E is the stronger statement. When you drop something from the right hand side, you are weakening the dependency; when you weaken it, your goal is to check whether the original can be derived from the weakened set, and then they are equivalent. Our goal is always equivalence: if you take something and weaken it, you want to see whether the weakened version implies the original, and then you are okay. But here the goal is different. If you drop something from the left hand side, you are strengthening the dependency, and if you want to strengthen it, you have to show the stronger statement can be derived from the original set of functional dependencies. So in this case we want to check whether we can derive D determines E from the original set, from what is given here. So we apply the attribute closure on D, and here it is actually trivial: there is directly a functional dependency D determines E, so the attribute closure of D includes E, and therefore A is extraneous here.
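Both extraneous-attribute checks can be run mechanically. A sketch in Python, using the dependencies as I read them from the example (this specific FD set is my reconstruction); note how the left-side check uses the original set, while the right-side check uses the set with the dependency removed.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The dependencies as I read them from the example
fds = [('A', 'B'), ('A', 'C'), ('B', 'C'), ('AD', 'E'), ('B', 'E'), ('D', 'E')]

# Left side: is A extraneous in AD -> E?  Dropping A STRENGTHENS the fd,
# so check that the stronger D -> E follows from the ORIGINAL set.
print('E' in closure('D', fds))            # True: A is extraneous

# Right side: is C extraneous in A -> C?  Dropping the fd WEAKENS the set,
# so check that A -> C still follows from what REMAINS.
rest = [fd for fd in fds if fd != ('A', 'C')]
print('C' in closure('A', rest))           # True: A -> C can be dropped
```

The asymmetry in the two tests is exactly the point being made above: weakening checks against the remainder, strengthening checks against the original.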
We can drop it. If you want to do it in two steps: we first take the result of deleting A, which is D determines E, and check whether it can be derived. Indeed it can be derived, so we add it; and once we add it, AD determines E is weaker than it and can be dropped. So this is the set of dependencies we get: A determines B, B determines C, D determines E and B determines E. D determines E was already present, so when we deleted that dependency, it got merged. And now there are two dependencies with the same left hand side B, so we union those, and this is the canonical cover. If we want to use the synthesis algorithm, we create a relation from each of these. Is any of them redundant? Is AB contained in the others? No. Is DE contained in the others? No. We also have to check whether any of them contains a candidate key, and you can easily verify that none of them does. Why? If we look at this set of dependencies: AB plus does not include D; AC plus again does not include D; AE plus does not include D either. On the other hand, AD plus includes everything. You can easily see that AD plus has B, and then you can add C, and then you can add E, so A, B, C, D, E are all contained in AD plus. And we can verify that it is minimal, so it is a candidate key. So the final schema has these three relations plus AD, and this is the final result of the 3NF synthesis algorithm run on this particular set of dependencies. I hope that was clear. There is one more part to this, which takes a student relation and adds functional dependencies on it. I will leave that to you to read; I do not think it is particularly hard, it is fairly straightforward. So we can take the last few questions and then wrap up today; we are half an hour over. Yeah, D. Y. Patil, please go ahead. We have a question on testing BCNF relations.
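The candidate key argument for AD can be replayed with the closure function. A sketch using the canonical cover derived above (the cover is my reading of the example):

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Canonical cover from the example: A -> B, B -> CE, D -> E
cover = [('A', 'B'), ('B', 'CE'), ('D', 'E')]
R = set('ABCDE')
print(closure('AD', cover) == R)   # True: AD is a super key
print(closure('A', cover) == R)    # False: dropping D loses attributes
print(closure('D', cover) == R)    # False: dropping A loses attributes
# Neither proper subset works, so AD is minimal: a candidate key.
```

Dropping either attribute destroys the super key property, which is precisely the minimality check described in the lecture.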
Suppose, sir, we have three different functional dependencies. The first two functional dependencies violate the BCNF conditions, but the third functional dependency satisfies BCNF. So is that relation schema in BCNF or not? It is not in BCNF. For it to be in BCNF, every single functional dependency which is given, or which you can infer, must either be trivial or have a left hand side which is a super key. So if there is even one functional dependency which violates this condition, that is, its left hand side is not a super key and it is not trivial, then the relation is not in BCNF and we have to decompose it. It is not enough for one of the dependencies to be okay; all the dependencies have to be okay. Can you give some example of fifth normal form, sir? Fifth normal form is based on join dependencies. I do not have an example off the top of my head, but there is material, and it would take some time even to explain what a join dependency is. So I will give you some intuition without going into the details; let me just get the whiteboard. Suppose I have a relation r and I want to decompose it into two or more relations r1, r2 and so on. A join dependency says that, for a particular way of decomposing r into some set of two or more relations like this, the join is guaranteed to be equal to r for any legal relation. The point is, of course, that for an arbitrary relation, if you break it up like this, the join may not give you back r. It is also clear that if you have appropriate functional dependencies, they can be a sufficient condition for the join to equal r, but they are not a necessary condition. A join dependency is a more general kind of thing: it states directly that the decomposition of r into these attribute sets, which of course have overlapping attributes between them, joins back to r.
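The "one bad dependency is enough" rule can be coded directly. A sketch of my own; checking only the given dependencies (with attribute closure for the super key test) suffices when testing the original schema.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(R, fds):
    """Every fd must be trivial or have a super-key left side."""
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = closure(lhs, fds) >= set(R)
        if not (trivial or superkey):
            return False        # one violating dependency is enough to fail
    return True

fds = [('AB', 'C'), ('D', 'E'), ('B', 'E')]
print(is_bcnf('ABCDE', fds))          # False: AB -> C already violates
print(is_bcnf('BE', [('B', 'E')]))    # True: B is a key of BE
```

All dependencies must pass; the function returns False at the first violator, no matter how many others are fine.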
If we state that their join is equal to r, that is a join dependency. And a multivalued dependency, an MVD alpha multidetermines beta, can be written in terms of a join dependency: r is decomposed into alpha beta and r minus beta, assuming for simplicity that alpha intersection beta is the empty set. With this decomposition, into alpha beta and r minus beta, the multivalued dependency is equivalent to saying that the join of these two is equal to r. So a multivalued dependency is equivalent to a join dependency with two relations. But there are join dependencies which decompose r into three or more parts and are not equivalent to any multivalued dependency; a join dependency is more general than that. And from that you get fifth normal form and so forth. Practically, I do not know anybody who uses it; it is very, very rare, and I have not seen good examples of join dependencies which go beyond multivalued dependencies. Note that these are progressively more general, so the normal forms form a hierarchy. Anything which is in 5NF is automatically in 4NF, because MVDs are a special case of join dependencies. Similarly, anything in 4NF is automatically in BCNF, because functional dependencies are a very, very special case of multivalued dependencies: in a functional dependency the right hand side can take on only one value, while in a multivalued dependency it can take on many values. So I will leave it at that, and if you want to know more, go read it up. As I said, we have material online: if you go to the book's website there is an appendix on advanced normalization theory, and all these things are discussed in that appendix. At one point it was in the printed book, but we decided that this is not something which your average course should go into, so we took it out and put it free on the web. Does that answer your question?
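The "join of the projections equals r" condition can be tested on a concrete instance. A sketch of my own, on a toy course/teacher/book relation: when the MVD course multidetermines teacher holds, the two-part join dependency is satisfied; delete one tuple and it breaks.

```python
def project(rel, attrs):
    """Projection with duplicate elimination; tuples are dicts."""
    out = []
    for t in rel:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out

def natural_join(r1, r2):
    """Natural join on the common attributes."""
    out = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in set(t1) & set(t2)):
                merged = {**t1, **t2}
                if merged not in out:
                    out.append(merged)
    return out

def satisfies_jd(r, parts):
    """Does the join of the projections give back exactly r?"""
    joined = project(r, parts[0])
    for p in parts[1:]:
        joined = natural_join(joined, project(r, p))
    as_set = lambda rel: {frozenset(t.items()) for t in rel}
    return as_set(joined) == as_set(r)

# c ->> t (teachers and books vary independently per course)
r = [{'c': 'db', 't': 'ann', 'b': 'x'}, {'c': 'db', 't': 'ann', 'b': 'y'},
     {'c': 'db', 't': 'bob', 'b': 'x'}, {'c': 'db', 't': 'bob', 'b': 'y'}]
print(satisfies_jd(r, ['ct', 'cb']))        # True: the MVD holds here
print(satisfies_jd(r[:-1], ['ct', 'cb']))   # False: drop a tuple, it breaks
```

The same function works for join dependencies with three or more parts; only the `parts` list changes.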
This is regarding a spatial database, sir. Is it possible to do normalization on a spatial DBMS, because we have lines and polygons instead of plain attribute values? So is it really possible to do normalization there? The normalization theory does not really care about the type of the attribute. It could be ints, it could be strings, it could be lines, polygons, whatever; all it says is that if something is equal, something else must be equal. It does not care about the type. So even if you have a spatial database, you can certainly use normalization theory as is. But you can also look at the question another way: are there any generalizations of the whole theory of normalization if you take specific types into account? If you have a polygon or a line segment and so on, then instead of just equality there are other possibilities; maybe you can use containment. So the question is: we already had functional dependencies, multivalued dependencies, join dependencies; can we come up with new types of dependencies specific to spatial data? I do not know if this is of any use, but maybe we can have a containment dependency. We could say A contains-determines B whenever, for any two tuples (a1, b1) and (a2, b2), a1 being a subset of or equal to a2 implies that b1 is a subset of or equal to b2. This is just something I am coming up with on the fly; I have not thought about it, but it is an idea triggered by your question. So maybe we can create a new type of dependency which might actually make sense in the context of a spatial database, and then maybe come up with a whole normalization theory related to it.
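Just to pin down the off-the-cuff idea above, here is what checking such a hypothetical containment dependency could look like. Everything here is illustrative: the dependency itself is the lecturer's speculation, and the relation models regions as plain sets of points as a stand-in for polygons.

```python
def contains_dep(rel, A, B):
    """Hypothetical 'containment dependency' from the lecture:
    for every pair of tuples, t1[A] subset-of t2[A]
    must imply t1[B] subset-of t2[B]."""
    return all(t1[B] <= t2[B]
               for t1 in rel for t2 in rel
               if t1[A] <= t2[A])

# Regions as sets of points: a toy stand-in for polygons
rel = [{'region': {1, 2}, 'zone': {10}},
       {'region': {1, 2, 3}, 'zone': {10, 11}}]
print(contains_dep(rel, 'region', 'zone'))   # True on this instance
```

Whether such a dependency admits a useful decomposition theory is exactly the open question raised in the lecture.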
I do not know if this has any applicability or any use, but hey, maybe normalization theory is not fully dead yet; maybe there is a little bit of life somewhere, a little juice which you can still squeeze out of it. Maybe this does not lead anywhere, but it might. We will take the last question; this is from K. L. Kubli. Sir, how is BCNF different from third normal form? So, this was the definition of BCNF: we say a relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional dependencies in F plus of the form alpha determines beta, where both alpha and beta are subsets of R, at least one of the following holds: either the dependency is trivial, or alpha is a super key for R. If you skip ahead to 3NF, the first part of the definition is exactly the same; I have abbreviated it on this slide, but it is the same as in the BCNF definition. The only difference is that in BCNF we said at least one of the following holds: either it is trivial or the left hand side is a super key. In 3NF we add a third option: either it is trivial, or the left hand side is a super key, or each attribute A in the right hand side minus the left hand side, each attribute A in beta minus alpha, is contained in a candidate key for R. That is the only extra option we have added, so 3NF is a weakening of BCNF. If a schema satisfies BCNF, it trivially satisfies 3NF, because every functional dependency already satisfies one of the first two conditions. But if it satisfies 3NF, it may or may not satisfy BCNF, because maybe a particular dependency failed the first two conditions but satisfied the third one. I will just check if there is any other relevant chat question and then we will wrap up. Somebody asked: what is domain-key normal form? That is something which goes even beyond 5NF, and I do not want to get into it now; but as I said, the book's website has more information on all this advanced normalization theory.
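The difference between the two definitions, one extra escape hatch, can be made concrete in code. A sketch of my own, using a brute-force candidate key search (fine for small schemas), run on the classic example AB determines C, C determines B:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(R, fds):
    """All minimal attribute sets whose closure covers R (brute force)."""
    R = set(R)
    keys = []
    for n in range(1, len(R) + 1):
        for combo in combinations(sorted(R), n):
            if closure(combo, fds) >= R and \
               not any(k <= set(combo) for k in keys):
                keys.append(set(combo))
    return keys

def is_3nf(R, fds):
    """BCNF's two conditions plus the third option:
    every right-side attribute outside the left side is prime."""
    prime = set().union(*candidate_keys(R, fds))
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):                 # trivial
            continue
        if closure(lhs, fds) >= set(R):          # left side is a super key
            continue
        if set(rhs) - set(lhs) <= prime:         # the 3NF-only escape hatch
            continue
        return False
    return True

fds = [('AB', 'C'), ('C', 'B')]
print(is_3nf('ABC', fds))                # True: B is prime (part of key AC)
print(closure('C', fds) >= set('ABC'))   # False: C -> B still violates BCNF
```

Here C determines B fails both BCNF conditions, but B sits inside the candidate key AC, so the third option rescues 3NF: a schema in 3NF but not in BCNF.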
So DKNF, domain-key normal form, is defined there. Again, none of this is really used in practice, for multiple reasons. First of all, consider what happens once you go beyond 4NF. For 4NF there are axioms, just like Armstrong's axioms: axioms which let you infer all the implied multivalued dependencies. So if I give you multivalued dependencies, I can infer other multivalued dependencies using a set of axioms, and maybe there are algorithms similar to attribute closure which could be used for this; I am not sure, but there are axioms for sure. Now, if you go beyond 4NF, to join dependencies and so on, it has been shown that there is no such procedure: there is no set of rules which lets you infer them, and when there is no such set of rules it is very hard to reason about these things in any clean way. So both theoreticians and practitioners don't really use these things, but sometimes the intuition behind these normal forms shows up and makes sense. So even though in general it is hard to infer things here, in specific cases it is intuitively clear, and sometimes you can use it to aid in normalization; but I don't know if anybody uses it as a routine thing. Yeah, I think that's it on the chat. So I think we have taken up a good deal of your time, way beyond schedule, and I am very happy that so many of you stayed back and asked so many questions. Thank you.