Welcome back to the tutorial discussion session. I hope all of you have finished the normalization questions. The solutions to the tutorials have been put up on Moodle; let me bring them up on the video screen for discussion. So, here we have the schema A, B, C, D, E. All the questions use the same schema, but with different sets of functional dependencies. For the first set of functional dependencies, AB goes to C, D goes to E and B goes to E, there are two parts to the question: compute a candidate key, and find the closure of AB.

So, first of all, how do you find a candidate key? There are a couple of ways. One way is to start with the set of all attributes, which always forms a super key, and then drop one attribute at a time, checking whether it is redundant, that is, whether the remaining attributes still form a super key. When no more attributes can be dropped without the set ceasing to be a super key, you are done: you have found a candidate key. Now, how do you find whether a set of attributes is a super key? The usual way is to compute the attribute closure. That is one way, but here we have used a trick which speeds up the process in many cases: if an attribute does not appear on the right-hand side of any functional dependency, then there is no way the closure of the other attributes will ever include it. Therefore, any attribute which does not appear on the RHS of any functional dependency has to be part of every candidate key. In this case, notice that A is not on the right side of any dependency, nor is B, nor is D for that matter; only C and E are on the right-hand side of dependencies. Note that we have many rules which let us infer new dependencies, but with this set we can only infer a dependency with, say, A on the right-hand side if A is also on the left-hand side, and such dependencies are either trivial or at least the part containing A is, so they are not useful here.
If A is on the right-hand side, it also has to be on the left-hand side; there is no other way it can appear on the right. Therefore, A, B and D have to be in any candidate key, and if I take the closure of ABD in this case, it actually covers all the attributes. So, ABD is a super key and we are done. Now, to take the closure of AB, it is very simple. I use the functional dependency AB goes to C, so the result includes ABC, and after that we have B goes to E, which we can use to get ABCE, and that is it. Nothing more can be added, because the remaining functional dependency is D goes to E and D is not in the closure set, so we cannot do anything more. So, that was easy.

The next two are similar. For A goes to CD and B goes to DE, in the same way, A and B do not appear on any right-hand side, so they have to be in any candidate key. Again, in this case AB is a super key, and therefore it is also a candidate key; as expected, AB+ includes all the attributes. The third one is again very similar; there is not much difference, it is more a matter of getting practice. In this case A, B and E are not on any right-hand side and they form a super key, so we are done. The closure of AB here includes C, and once we add C, the dependency C goes to D becomes usable, so the closure of AB is A, B, C and D.

So, the first part of the question was relatively easy. The next one is BCNF decomposition. Here the solution is not necessarily unique; several of these examples illustrate that depending on which functional dependency we choose for decomposition, the final result looks quite different. In the first case, AB goes to C, D goes to E and B goes to E, we have to decompose as soon as we find something that violates BCNF. Is AB a super key? It is not. Neither is D, nor is B. So, each of the three dependencies actually shows a violation of BCNF.
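The closure computation just described can be sketched in a few lines of Python. The representation here (attributes as single letters, each dependency as a pair of frozensets) is my own choice for illustration, not anything prescribed by the tutorial:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is covered, pull in the RHS.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FD set 1 on R(A,B,C,D,E): AB -> C, D -> E, B -> E
fds = [(frozenset("AB"), frozenset("C")),
       (frozenset("D"), frozenset("E")),
       (frozenset("B"), frozenset("E"))]

# A, B, D never appear on a right-hand side, so they are in every key.
print(sorted(closure("ABD", fds)))   # ['A', 'B', 'C', 'D', 'E']: ABD is a key
print(sorted(closure("AB", fds)))    # ['A', 'B', 'C', 'E']: AB+ = ABCE
```

Running it reproduces both answers from the tutorial: ABD is a candidate key, and AB+ is ABCE.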
So, if we use the first one, AB goes to C, what do we get? One relation is ABC and the other is ABDE. Now, for ABC there is nothing more to decompose; nothing more can be inferred, and it is in BCNF, which can be verified although I have not shown it here. That leaves ABDE, and here there are two choices, D goes to E or B goes to E. If we decompose using B goes to E, what do we get? We get BE as one relation, and the other part is B along with the remaining attributes except E, which gives ABD. So, what we have are ABC, ABD and BE. Are these in BCNF? Again, we have to check each functional dependency that holds on them; in this case there are really no other dependencies, so we can check easily that they are in BCNF. Similarly, if we chose D goes to E to decompose, ABDE would turn into DE and ABD, and the other parts are the same. So, that is what we get if we start by decomposing using AB goes to C.

Now, if we started decomposing using D goes to E first instead, what do I get? I get DE, and the remaining part is ABCD. Now, ABCD can only be decomposed using AB goes to C, so I get ABC and ABD, and I end up with the second option here. Similarly, if I started decomposition with B goes to E, what is left is ABCD along with BE, and decomposing ABCD using AB goes to C gives ABC and ABD. It comes to exactly the same set as before. So, these are the only two solutions in this case.

Now let us look at the second set of functional dependencies. As I said, there is just one relation, ABCDE. If we take A goes to CD and B goes to DE, again we have two options for which one to decompose with first: A goes to CD or B goes to DE. By the way, coming back to the previous set, there the right-hand sides had a single attribute, whereas in this one the right-hand sides have two attributes.
So, in fact, we have more choices for decomposition. We could have broken A goes to CD into A goes to C and A goes to D, and then used A goes to C to decompose. Then I would get a different result from what I get here, but it turns out the result would not be drastically different: in the final schema, instead of getting ACD, I would get AC and AD separately, and we can actually merge them into ACD without any problem while retaining the BCNF property.

So, coming back here, let us say we decompose using A goes to CD. We get ACD, and A with the remaining attributes: C and D are gone, so B and E are left, giving ABE. Now, if you just look at the two given dependencies, there is no dependency that involves only A, B and E. But actually there is, because B goes to DE can easily be split into B goes to D and B goes to E. So now, B goes to E, which we just inferred, shows us that ABE is not in BCNF, because it is not trivial and B is not a super key of ABE either. Therefore, we decompose ABE into BE and AB: BE comes from this functional dependency, and the remaining attribute is A, so we get AB as the other relation. So, finally, we get ACD, BE and AB; in this order, that is the result.

If we decompose using the other one, B goes to DE, first, what do we get? We get BDE, and B with the remaining attributes, which are A and C; so BAC, or equivalently ABC. Now, from A goes to CD we can derive A goes to C. Is ABC in BCNF? Is A a super key of ABC? It is not, because B is not inferable from A; the super key here is actually AB. But we have A goes to C with A not a super key, so we have to decompose. What do we get? AC, and A with the remaining attribute, which in this case is B, so AB. So, we get BDE, AC and AB. Do we have to decompose BDE any further? Not really: there is no other dependency on it except B goes to DE, and B is a super key of BDE.
So, we do not have to do anything more. The other relations, AC and AB, are binary, and therefore they are in BCNF, so we can ignore those. So, those first two were straightforward.

The next one is just a little more complicated: AB goes to C and C goes to D. We have two options here, decompose using either of them. Let us say we decompose using AB goes to C. We get ABC as one relation, and the other one is AB with the remaining attributes, which are D and E, so ABDE. Now, if I look at ABDE and these two functional dependencies, it looks a bit like ABDE is in BCNF: for AB goes to C, C is not here, and for C goes to D, C is not here either. But is it really in BCNF? It is not. We discussed a similar example earlier, which showed that even if none of the original functional dependencies shows a violation of BCNF, we may be able to derive something else which does. In the previous part, the derivation was actually easy: we took A goes to CD and derived A goes to C from it by the decomposition rule. Here it is a little more work. Using transitivity on AB goes to C and C goes to D, we can infer AB goes to D, whose attributes are all contained in ABDE. And AB is not a super key of ABDE, because E is not in its closure. So, AB goes to D is a functional dependency which shows a violation of BCNF, and we use it to decompose: we get ABD and ABE.

By the way, here I used transitivity to infer the dependency. But when I described how to do decomposition, I also mentioned an alternative: take each subset of the relation's attributes and compute its closure. Suppose we did that; the relation here is ABDE. If I compute A+, I get nothing new using the original dependencies. The only applicable left-hand side here is AB. If I compute AB+, I get ABC, and then because C determines D, I get ABCD; that is the closure. Now, if I intersect the closure ABCD with the attributes of this relation, ABDE, what do I get? I get ABD.
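That closure-and-intersect step can be shown concretely. Reusing the same illustrative representation as before, we compute AB+ under the original dependencies and intersect it with the fragment's attributes:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FDs: AB -> C, C -> D
fds = [(frozenset("AB"), frozenset("C")),
       (frozenset("C"), frozenset("D"))]

sub = set("ABDE")                 # the fragment left after splitting off ABC
ab_plus = closure("AB", fds)      # ABCD, computed with the *original* dependencies
projected = ab_plus & sub         # ABD: so AB -> D holds on ABDE
print(sorted(projected))          # ['A', 'B', 'D']
# E is not in AB+, so AB is not a super key of ABDE: AB -> D violates BCNF,
# and we split ABDE into ABD and ABE.
```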
So, after doing that intersection, I can infer that AB determines ABD, or simplifying, AB determines D, since it is redundant to keep AB on the right side. So, AB determines D is a functional dependency which shows a violation of BCNF, and we use it to decompose. We get ABD and ABE, as I just discussed. So, the final set of tables along this path is ABC, ABD and ABE. But this is not the only possible answer. Some of you may have used C goes to D to decompose first, and then what do we get? We get CD as one of the relations; the other relation is C with the remaining attributes, which are A, B and E (D is gone now), so we get ABCE. Now, because we already have AB goes to C and AB is not a super key, we have to decompose again. We get ABC, and AB with the remaining attribute, which is E. So, we get CD first, and then ABC and ABE. That is the final set along this path: ABC, ABE and CD.

So, I think this is a good point to take a short break and get feedback from the centres, if they have any doubts, before we move to the second part of normalization, which involves canonical covers and 3NF. Let me go to the A-VIEW screen; if you have questions, please get on to A-VIEW and let me know.

Sir, the question is on the topic of generalization and specialization. You had said that we can follow different models when converting to relations: in the first model, only the primary key of the higher-level entity is kept in the lower-level relations, or alternatively all the attributes can be inherited by the lower-level relations. In most cases, the ID, that is, the student number, is not what is asked for in queries. A typical query is: find the name of the student who is the topper. For the name to be found, we require a join operation.
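The whole BCNF decomposition procedure, using the subset-closure test just described, can be sketched as a short recursive function. The representation is the same illustrative one as in the earlier snippets; note that because this sketch scans subsets smallest-first, it happens to find C goes to D as the first violation, and so it reproduces the second of the two equally valid answers:

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_decompose(rel, fds):
    """Split rel on any projected dependency X -> (Y - X) that violates BCNF."""
    rel = frozenset(rel)
    for size in range(1, len(rel)):
        for xs in combinations(sorted(rel), size):
            x = frozenset(xs)
            y = closure(x, fds) & rel      # X -> Y holds on rel
            if y != x and y != rel:        # non-trivial, and X is not a super key
                return bcnf_decompose(y, fds) + bcnf_decompose((rel - y) | x, fds)
    return [rel]                           # no violation: rel is in BCNF

# FDs: AB -> C, C -> D on R(A,B,C,D,E)
fds = [(frozenset("AB"), frozenset("C")),
       (frozenset("C"), frozenset("D"))]
for r in bcnf_decompose("ABCDE", fds):
    print("".join(sorted(r)))              # CD, ABC, ABE
```

Choosing a different violating dependency first (for example AB goes to C) would give the other answer, ABC, ABD, ABE, exactly as in the discussion above.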
So, in this case, is it possible to have both the ID primary key and the name included in the relation itself, so that the join is avoided, as a model number 3? So, let me use the whiteboard. The question was about specialization. We have person; below person, we have student and employee. Person has the attributes ID and name; I should have put them in a box, so this is a bit of patchwork. Down here, student has credits and employee has salary. This was the design at the ER level, and when we converted it to tables, we gave two options. One option is to have the tables person(ID, name), student(ID, credits) and employee(ID, salary). The second option is to have the tables person(ID, name), student(ID, name, credits) and employee(ID, name, salary). I hope you are able to read this on the screen.

So, these are the two options, and as we discussed earlier, if you take the second option and you want to find out both the credits and the name of a student, it is all there in one table, whereas with the first option you are forced to join student with person to get the name and the credits together. So, there is a trade-off here in terms of efficiency for such queries. Now, the point is, suppose you can have persons who are neither student nor employee, you can have students, and you can have employees, but students cannot also be employees. Then the second design will not have redundancy and you can use it safely. But if you use this design in a situation where a student can also be an employee, that is where redundancy comes in: you have the same ID occurring in the student table and in the employee table, the name gets repeated in both, and they had better be consistent, otherwise you have an inconsistent database state.
So, the point I was making with respect to normalization was that BCNF does not have very much to say about such duplication across multiple relations. If you look at the functional dependencies on each of these relations, in fact the only functional dependency is that ID determines all the other attributes. That is it, and ID is a super key for each of these relations. So, functional dependency theory does not tell you anything about redundancy across relations, and here the redundancy is indeed across two relations. The point I was making was that functional dependency theory cannot help you detect such issues.

The question which was raised was: in spite of the duplication, should we keep this design because it speeds up certain queries? Part A of the answer is that most applications I know do not really need that kind of performance; you can always do a join. Part B of the answer is that if you are using an object-relational system, you can actually declare that student is a subtable of person. The syntax looks something like this (there is also some type specification which has to be done, which I am going to omit): create table student, with say credits int, under person. If you do this, then the database may implement either of these two designs internally, and if it implements the second one, it is its job to make sure that things are consistent between the different relations. What happens is that you can access student.name, and it is inherited automatically from person above. So, object-relational databases may save you some trouble here. But there is a catch: at least as currently implemented, as far as I know, all object-relational databases assume that any particular entity has what is called a most specific type, and the object-relational extensions to SQL also make this assumption.
So, what does this mean? It means that I can have a person who is a student, or a person who is an employee, but nobody can be a student and an employee at the same time. The database system will not allow someone to be both, unless I create another subtype below these, maybe called teaching assistant. But that requires multiple inheritance, and multiple inheritance is typically not supported, so you really cannot do that. So, the moral of the story is that although table inheritance avoids some of these issues, the duplication that arises because somebody is both a student and an employee cannot be modeled in current database systems. That does not mean it is theoretically impossible; it just means it is not currently supported. If tomorrow it is supported, by all means use it, and then internally the system may keep just one copy above, or it may duplicate, but it will deal with that; it is not your headache. The benefit to you as a programmer is that you can directly use student.name and not worry about which representation is used internally; regardless, student.name and student.credits will work in your query. I hope that answered your question; back to you.

Sir, is it possible that in that case only the ID and the name from person are put in the student and the employee tables? Over to you, sir.

I think what you are asking is: if person had multiple attributes, name, address and so on, is it possible to replicate just name in the student and employee tables, but not the other attributes? That is a good question. I do not see any reason why you cannot do that. So, there are intermediate points between these two options when you have multiple attributes. But whatever you do, if somebody is both a student and an employee, there will be duplication of data.
Furthermore, if you have such a representation, you are forced to store the same tuple in person and in student. With the second scheme here, if somebody is a student, I do not need to create a person record at all; all the data is already in student. With your scheme, I would have to create both a person record and a student record, and immediately there will be duplication. So, even when everybody has a most specific type, your suggestion will have duplication. There are probably a few scenarios where performance demands it, but let me tell you that most applications today really do not need the kind of performance boost you get from denormalization; it is not worth it. Most of the enterprise-level applications I know run perfectly happily on a single machine. All of IIT's academic applications run on one machine; they could very well run on a regular desktop machine with a single-core CPU and would run fine, because the needs are not that great. So, performance tuning by denormalization is not really important today. I hope that answers your question; back to you if you have any follow-up. I will take maybe one more question.

Sir, actually I have got two tables, one is employee and another one is department. In the employee relation there are employee name, salary and department ID, and department has got department ID and manager ID, where the manager ID refers to the employee ID of the employee relation. Now, how do I insert the first tuple of this particular relationship? Thank you.

This question is related to day 3's topic, and we actually had a very similar example there. For those of you who did not understand the question, let me repeat it. Let us say that each employee tuple has a foreign key reference to a department, and furthermore each department has a foreign key reference to a manager, which is actually back in the employee table.
So, you have employee referencing department, and department's manager referencing employee. You can actually have a cycle: a person P is in department D, but P is also the manager of department D. So, the person record points to the department record, and the department record references back to the same person record. If you recall, this is very similar to the spouse example I gave, where you have a married-person relation and each person has a foreign key reference to the spouse, who is also a married person. The husband's record refers to the wife, the wife's record refers to the husband; it is a cycle.

If you go back to that day's lecture, I gave several alternatives for handling this situation. One of them was to set the references to null, if at all possible, and then update them later: set the references to null, insert the employee, insert the department, then update the references. If that is not possible, then you have to tell the SQL system to wait until the end of the transaction to check the integrity constraints. That means, first of all, you have to have a transaction. I gave you some syntax for it based on begin atomic and end, but as I said, most systems do not support that directly. Later in this course we will see how to run transactions in Postgres; you can try it even now. You say begin semicolon, then all the following statements are part of the same transaction, and then you say either commit or rollback. So, here you can say begin, then defer integrity constraint checking for that foreign key constraint, or for both the foreign key constraints, then do the two inserts, then commit. When the transaction commits, it checks the two foreign key constraints; at this point both the tuples are in place and the constraints are not violated.
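You can try this out directly: SQLite, which ships with Python, also supports deferrable foreign key constraints, so here is a small sketch of the idea (the table and column names are invented for the example; in Postgres the mechanism is similar, with DEFERRABLE constraints and SET CONSTRAINTS ... DEFERRED):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when this is on

# Mutually referencing tables: employee -> department -> employee.
# DEFERRABLE INITIALLY DEFERRED postpones the check to commit time.
conn.execute("""CREATE TABLE department (
                  dept_id    INTEGER PRIMARY KEY,
                  manager_id INTEGER REFERENCES employee(emp_id)
                             DEFERRABLE INITIALLY DEFERRED)""")
conn.execute("""CREATE TABLE employee (
                  emp_id  INTEGER PRIMARY KEY,
                  name    TEXT,
                  dept_id INTEGER REFERENCES department(dept_id)
                          DEFERRABLE INITIALLY DEFERRED)""")

# The two inserts form a cycle: P is in department 10, and P manages it.
# Neither insert could satisfy its foreign key on its own.
conn.execute("INSERT INTO employee VALUES (1, 'P', 10)")
conn.execute("INSERT INTO department VALUES (10, 1)")
conn.commit()   # both constraints are checked here, and both now hold
```

If either row were missing at commit time, the commit itself would fail with an integrity error, which is exactly the deferred-checking behaviour described above.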
So, you are okay. I hope that answers your question. We have one more question at this point, from Raja Ram Bapu, Islampur.

Sir, my question is regarding the third normal form. Suppose that R is a relation, alpha goes to beta is a non-trivial functional dependency, alpha is not a super key, and beta minus alpha is null. In that case, will the relation R be in third normal form or not?

So, you are saying beta minus alpha is the empty set, is that your question? First of all, if beta minus alpha is null, by which I assume you mean that beta has no attributes which are not in alpha, then alpha goes to beta is a trivial functional dependency. So, that case is already handled, and it is the same whether we are talking about BCNF or 3NF. The only interesting case is when beta minus alpha has at least one attribute and alpha is not a super key. That is the third condition for third normal form, where we insist that each of the attributes in beta minus alpha must be in at least one candidate key. There may be different candidate keys, but each of those attributes has to be in some candidate key. So, the case of beta minus alpha being the empty set is trivial. Does that answer your question?

Let me at this point move back to the last part, which was third normal form. The third normal form question again uses the single relation ABCDE, but now the set of dependencies is different. The first set is, as you can see, A goes to CD, B goes to DE and C goes to D; those are the three functional dependencies. For third normal form, using our algorithm, the first step is to compute a canonical cover. How do we do this? We have to look for extraneous attributes and also apply the union rule. Is anything extraneous here?
Taking the very first one, A goes to CD: is C extraneous here? If you remove C, there is no way to derive it using what is left, so C is not extraneous. How about D? If I remove D, what is left is A goes to C, but I also have C goes to D, and from these two I can infer A goes to D. So, D is actually extraneous on the right-hand side, and we remove it. What do we get? The set A goes to C, B goes to DE and C goes to D. Is anything extraneous any more? We can go through and check each of them using the standard rule, and we will find that nothing more is extraneous. Can we do any unioning? No, the left-hand sides are all different. So this is the canonical cover, and the next step is to create a table for each of these dependencies. Is any one of them contained in another? No, so we cannot delete anything; we have to create these three tables.

The next step is to see if any of these tables contains a candidate key. How do we do that? We can take each of the attribute sets AC, BDE and CD, compute the attribute closure, and check. If you do so, you will find that none of them includes all five attributes. What we did here is check what could be a candidate key: neither A nor B is on the right-hand side of any functional dependency, therefore A and B have to be in any candidate key. Now, what is AB+? It includes C, D and E; therefore AB+ is everything, and AB is a super key. It is also a candidate key, the only one here, and we added it. So, the final set of tables is AC, BDE, CD and AB, and the candidate key, as we just discussed, is AB.

Now, the next set is again somewhat similar, but we have more dependencies here: A goes to B, B goes to C, A goes to C, D goes to E, and so on. What do we do here? Let us try to use the union rule first. We have A goes to B and A goes to C.
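The extraneous-attribute test just performed by hand is itself just a closure computation: drop the attribute from the right-hand side and check whether the modified set of dependencies still implies it. A small sketch, using the same illustrative representation as before:

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def rhs_extraneous(lhs, rhs, attr, fds):
    """Is attr extraneous on the RHS of lhs -> rhs? Remove it and see
    whether the modified dependency set still implies lhs -> attr."""
    rest = [(l, r) for l, r in fds if (l, r) != (lhs, rhs)]
    rest.append((lhs, rhs - {attr}))
    return attr in closure(lhs, rest)

# FDs: A -> CD, B -> DE, C -> D
fds = [(frozenset("A"), frozenset("CD")),
       (frozenset("B"), frozenset("DE")),
       (frozenset("C"), frozenset("D"))]

print(rhs_extraneous(frozenset("A"), frozenset("CD"), "D", fds))  # True: A->C, C->D
print(rhs_extraneous(frozenset("A"), frozenset("CD"), "C", fds))  # False
```

This confirms the hand computation: D is extraneous in A goes to CD, but C is not.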
If we apply the union rule, we get A goes to BC. Now, let us try to remove C from this and see if it is extraneous. In fact, it is, because we have A goes to B and B goes to C, from which we can easily infer A goes to C; so there is no need to keep C. We could also have done this directly: take A goes to C and check whether C is extraneous on the right-hand side. What do we need to do? See if we can infer it back from the remaining functional dependencies, and indeed we can, from A goes to B and B goes to C. If we remove C from A goes to C, what is left is A goes to nothing, which is a trivial functional dependency, and we can just throw it out; we do not need to retain it. Either way, what we have landed up with is A goes to B, B goes to C, and then the other three.

Now, is anything else extraneous? By eyeballing this, we see that AD goes to E and D goes to E look very similar. It should be clear that D goes to E is a stronger statement than AD goes to E, so intuitively we could delete the weaker one. That intuition is correct, but the direct procedure for the canonical cover has no rule which says just drop a dependency because a stronger one is available, even though we could infer it. Following the procedure, we check whether A is redundant in AD goes to E. When is it redundant? If we can infer D goes to E using all the given dependencies, and in fact D goes to E is already present. So, directly we can say A is extraneous. If we delete A, we get D goes to E, but that is already present, so we remove the duplicate. What else do we have? Two dependencies with B on the left: B goes to C and B goes to E, which we can union to get B goes to CE. At the end of all that, what do we have? A goes to B, B goes to CE, and D goes to E. Is anything extraneous here?
You can try to delete each attribute and check; you will find that nothing more is extraneous, so we are done. We will create tables for AB, BCE and DE. Can we delete any of these tables because it is contained in another? No, nothing can be deleted. Is there a candidate key in one of these tables? Again, if you take the closure of each, you will soon see that the closure of AB does not include D, the closure of BCE does not include A, and the closure of DE does not include A. So, none of these contains a super key. Using our usual trick, we find that A and D do not appear on any right-hand side; therefore A and D have to be in any candidate key. Compute AD+: sure enough, it is a super key. Therefore, we add AD as the fourth relation, and that is it. This is the 3NF schema, and the candidate key is AD. We are done with that particular question.

Now, the fourth question was: what are the functional dependencies here? I think you should all have got this, because these are basically from the tables we already have. We just threw a few extra attributes from other relations into this, and we made one other small change: we got rid of section ID, because we do not care about the section; we assume there is only one section for a course in a semester. So, what are the functional dependencies you would expect based on your knowledge of an academic setting? You can easily figure out that ID must determine name, and if a person is registered for a course, they can only get one grade for that course; you cannot get two different grades for the same course. Well, strictly speaking, that is not quite true with the way IIT used to manage things. In IIT we had, and still have for certain categories, a grade called FF, which means you can take a re-exam and then you may get a passing grade, and the transcript would show that you got an FF and then a passing grade.
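The 3NF synthesis step just completed (one table per dependency in the cover, then add a key relation if no table contains a candidate key) can be sketched as follows; the cover and the choice of AD as the key are the ones worked out above, and the representation is the same illustrative one used earlier:

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Canonical cover derived above: A -> B, B -> CE, D -> E
cover = [(frozenset("A"), frozenset("B")),
         (frozenset("B"), frozenset("CE")),
         (frozenset("D"), frozenset("E"))]
all_attrs = set("ABCDE")

# One table per dependency in the cover.
tables = [lhs | rhs for lhs, rhs in cover]            # AB, BCE, DE

# If no table contains a candidate key, add one. AD is the candidate key
# found above: A and D appear on no right-hand side, and AD+ is everything.
if not any(closure(t, cover) == all_attrs for t in tables):
    tables.append(frozenset("AD"))

print(["".join(sorted(t)) for t in tables])           # ['AB', 'BCE', 'DE', 'AD']
```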
So, apparently you got two different grades for the same course in the same semester, which is a kind of weird way to do things; we have actually changed it. In any case, assuming that does not happen, we certainly have that ID, course ID and semester determine grade, and similarly for instructor and department. Decomposing this is again very straightforward; I am going to skip the details, and you can read them yourself. The last part was something for you to do on your own; each of you may have a different solution, so go ahead and discuss it with your coordinators.

So, that ends the tutorial part. I am ready to take a few questions on the tutorial, but I also want to spend about 10 minutes covering the last few slides which I did not get a chance to cover in the morning; I ran out of time. So, let me start with questions, but only questions about the solutions to today's tutorial, nothing else. If you have questions, please indicate on A-VIEW.

Sir, is the super key always the candidate key? I mean, do both the keys have the same value, or what is the difference?

No, a super key may or may not be a candidate key. As an example, take a student relation where ID is a super key; ID is also a candidate key. However, ID and name together also uniquely identify a student, which is the condition for being a super key. So, (ID, name) is a super key, but it is not a candidate key, because it is not minimal. Why is it not minimal? If I delete the attribute name, what is left is ID, and as we just saw, that is still a super key. So, (ID, name) is not minimal: something smaller than it is also a super key. Therefore (ID, name) is a super key but not a candidate key. I hope that answered your question; if you have a follow-up, ask. Okay.
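The minimality test in that answer is again just a closure check. A tiny sketch with a hypothetical student relation (the attribute names are invented for the example):

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Hypothetical student relation: id determines everything else.
fds = [(frozenset(["id"]), frozenset(["name", "dept"]))]
all_attrs = {"id", "name", "dept"}

def is_superkey(attrs):
    return closure(attrs, fds) == all_attrs

print(is_superkey({"id", "name"}))  # True: a super key ...
print(is_superkey({"id"}))          # True: ... but a proper subset is one too,
                                    # so {id, name} is not a candidate key
print(is_superkey({"name"}))        # False: {id} by itself is minimal
```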
Looks like nobody else has questions, so let me take some time to finish up my talk; at the end of it, I will have time for a few more questions. Where I left off, I had covered everything about BCNF and 3NF and how to reason with functional dependencies, closures, canonical covers and so forth. At this point I want to get back to a few other design issues.

First of all, I have already mentioned this point a few times: we assume that a given schema is there, and we then proceed to decompose. This schema could have been generated from an ER design; that is typically what people do. The database theoreticians who first came up with normal forms, though, assumed that there is a single relation with all the attributes, called the universal relation. Or people may have done a totally ad hoc design without ER modeling, come up with some relations, and then normalized. Note that normalization will not help you deal with problems in the initial design which prevent you from representing certain pieces of information; all normalization does is look for redundancy and make sure redundancy will not occur.

Now, when you do the ER diagram carefully, things should be okay and no further decomposition should be required, but there can easily be situations where decomposition is needed. For example, suppose we do not model department as an entity; we do not care about the details of departments. What we do instead is put an attribute department name and an attribute building into the employee entity, and let us suppose that every department is in one building, as we have been assuming. So now department is not an entity, and there is no question of creating an entity and a relationship; department name is just a string, and building is just a string too. We might very well have made both of these attributes of the employee entity. Suppose we did that and then converted it to relations; now let us see what the functional dependencies are.
We know that an employee should have only one department; that is why we made it an attribute. We also know an employee should have only one building; that is why we made it a single-valued attribute. But suppose we know that the building must be the department's building. In other words, a person is in a department, a department is in a building, and that is the constraint for our enterprise; let us assume it is going to be enforced. It is not a very good example, but let us say it will continue to hold and will be enforced. Then we really should not be storing department name and building in the same schema. Even though we did not model department as an entity, our relational schema now violates BCNF: department name functionally determines building, but in the employee table department name is not a super key. So, we have just found the reason why it is not in BCNF, and we decompose: we pull department name and building out, and the employee relation keeps just the department name, with no building. This is standard; we have seen many examples of things like this. But the point I want to make is, even though you went through the ER process, you may still land up with things which require further decomposition. Typically this does not happen; these are relatively rare cases. Functional dependencies from non-key attributes of relationships are also possible, just like with entities as we saw, but that is also rare. Another kind of situation is one we saw with relationship sets: we said the primary key of a relationship set consists of the primary keys of the entity sets it relates. But we also saw an example where, after doing the design, everything was proper, and then there was one more constraint: a student cannot be in two sections in the same semester for the same course. With that constraint, we actually found a new functional dependency. In this case it did not lead us to decompose, but it certainly changed the primary key of that table.
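The employee example can be checked mechanically too. This is a minimal sketch (attribute names like emp_id are my own for illustration) that detects the BCNF violation dept_name determines building and performs the standard decomposition step:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# employee(emp_id, dept_name, building), with the extra enterprise
# constraint that the building must be the department's building.
attrs = {"emp_id", "dept_name", "building"}
fds = [({"emp_id"}, {"dept_name", "building"}),
       ({"dept_name"}, {"building"})]

# A nontrivial FD lhs -> rhs violates BCNF if lhs is not a super key.
violations = [(lhs, rhs) for lhs, rhs in fds
              if not rhs <= lhs and closure(lhs, fds) != attrs]
lhs, rhs = violations[0]

# Standard decomposition: pull out (lhs U rhs); keep R minus (rhs - lhs).
r1 = lhs | rhs
r2 = attrs - (rhs - lhs)
print(sorted(r1))  # ['building', 'dept_name']
print(sorted(r2))  # ['dept_name', 'emp_id']
```

The result is exactly the decomposition described above: a (dept_name, building) relation, and an employee relation that keeps only the department name.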
In other cases, the decomposition may happen. Now, for the part which I think is most interesting from a real-world perspective: the entire design process we have done up to now, including the ER and relational models, has been done with a point-in-time view of relations. What do I mean by this? We have been using a lot of functional dependencies. We say that course ID functionally determines the title of the course. Unfortunately, in the real world, things are not so simple. Right now, a course ID can have only one title. But next semester, the person in charge may decide that this title is no longer an accurate depiction of what the course is about, and then we change the title. So, if you look across time, the same course ID was associated with two different titles. Our database schema did not let us store our history: if we change the title, we just update the course relation, and that is it; we have lost the history about the old title. This can be a huge problem. Suppose we want to generate a transcript for a student who took the course five years ago, and five years ago the same course ID had a completely different name. If we generate a transcript for this person by using the current name for that course ID, that is ridiculous. Say the course numbered 317, which is now databases, was maybe graphics five years ago, for whatever reason. These kinds of changes are rare, but occasionally it can happen that the same course number is reused for something completely different. If we generate that person's transcript now, we are saying that they took a course in database systems, although they actually took a course in graphics, which is very wrong. So, what we have not modeled is the fact that course ID functionally determines the title of the course at a point in time, but not across time. This is a very important issue; a lot of database designs completely ignore this fact, and sometimes they get into trouble.
In IIT, we got into trouble for exactly the kind of reason I just told you about, some years ago. So, what was the solution? Well, people realized there was going to be a problem. One solution which was adopted was to keep a history relation. What they did is, whenever the title is updated, they stick a record in a course history relation which says: between this year and this year, 317 was graphics, and currently it is databases. So, if we generate a transcript, we have to go to the course history relation, explicitly find what the title was at that point in time, and join based on that. The join is based on course ID and on time, to find the correct title for the course. That is a lot more work for the programmer. Wouldn't it be nice if the takes relation had a time point? For a university, time does not mean seconds and hours and minutes and so on; just the semester is good enough. During a semester, we cannot create a new title for the same course; it can change next semester, but this semester it has to be fixed. Now, suppose we record that in this semester the title was this, and in this semester the person took this course. The semesters have to match when we do the join. If we can automate that, we are modeling history automatically in the database. Such databases are called temporal databases, which actually track at what time a fact was true, or more realistically, for what interval of time. So, the title of course 317 became database systems in 2006 and continues till now; 317 had the title graphics from 2000 to 2005, and that is the history. There is no separate relation; there is just one course relation, and internally the database is keeping track of all the history. When we do a join, the times are automatically taken into account to fetch the correct version.
This is the ideal situation, but unfortunately databases today do not really support temporal features very well. There have been temporal extensions proposed to SQL, but they have not really been adopted widely so far; in fact, it is not even a part of the standard yet. So, what do we do? We realize there is a problem. As part of the database design process, we may first do a first cut ignoring time, but then you have to go back and ask: what about changes over time? How do we model them? What do we do? And then modify the relational schema appropriately. You might ask, why not do it at the ER stage itself? Again, there have been a few proposals to extend ER modeling with temporal features, but they have not been very successful. So, the best solution we have at this point is first to do point-in-time modeling, ignoring history, in the ER model, convert it to relations, then look at those relations and see how they have to be modified to take time into account. A little bit of terminology here to understand this. Temporal data is any data which has an associated time interval during which it is valid. A snapshot is the value of the data at a particular point in time. So, I can take a snapshot of the database as of right now; the database may get updated, but my snapshot is still whatever it was at this point in time. Now, a temporal database will actually store all the values across time. So, at any point I can go and say: give me a snapshot as of 21st December 2006, 5:20 p.m., and it can return me a copy of the database at that point if I want. That is going to be a very large thing to return, so actually you would not do that. More typically, you will say: here is a query, evaluate it on the snapshot at this time, and see what we get. That is one kind of thing you can do; I will show you some other kinds of things we can do.
But before that, here are a few more examples of functional dependencies which hold in a snapshot but not across time. Here is an employee, or a person, with an ID and then a street and a city address. Of course, people change addresses over time, so across time the ID does not functionally determine street and city. At a point in time, an employee or person is required to have only one address of record, and so the functional dependency holds in a snapshot but not across time. To model this situation, we say that a temporal functional dependency X →τ Y (written with a little Greek tau on top of the arrow) holds on schema R if the ordinary functional dependency X → Y holds on all snapshots, for all legal instances of R. So, at a point in time a person can have only one name, one street, one city and so on; that is a temporal functional dependency. Now we can use this notion of temporal functional dependency to normalize relations: if you are looking at a snapshot, we can just normalize based on the temporal functional dependency. But once you add time, you will realize that you really had to include time for the functional dependency to hold. If you had a full-fledged temporal database which managed time internally, you could get away with that, but like I said, no database supports it internally today. So, what do you do? Instead, database designers end up adding fields, like in the course history relation I told you about: it will have a start date and an end date. That is very typical, and you can even put it in the main course relation. What you can do here is, instead of (course ID, course title), I will have (course ID, course title, start, end), where start and end are timestamps. That means that in this period of time, the course ID had this course title.
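The snapshot definition above can be written out directly. This is a sketch under invented data (person IDs and cities are made up): each row carries a validity interval, a snapshot keeps only the rows valid at one instant, and the temporal FD ID →τ city is checked by verifying the plain FD on every snapshot.

```python
# Each row: (person_id, city, start, end) with valid interval [start, end).
rows = [
    ("p1", "Mumbai",  2000, 2010),
    ("p1", "Delhi",   2010, 9999),  # address changed in 2010
    ("p2", "Chennai", 2005, 9999),
]

def snapshot(rows, t):
    """Rows valid at time t, with the interval columns projected away."""
    return [(pid, city) for pid, city, s, e in rows if s <= t < e]

def fd_holds(pairs):
    """Plain FD id -> city on a snapshot: no id maps to two cities."""
    seen = {}
    for pid, city in pairs:
        if seen.setdefault(pid, city) != city:
            return False
    return True

# id ->tau city: the plain FD must hold on *every* snapshot, even
# though across time p1 is associated with two different cities.
print(all(fd_holds(snapshot(rows, t)) for t in range(2000, 2020)))  # True
```

Note that the full relation, viewed across time, violates the plain FD (p1 has two cities), which is exactly why the interval columns had to be added.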
There is more meaning here: this also records the period during which the course ID existed. If a particular course ID occurs with, let us say, start time 2006 and end time 2010, and there is no other tuple for it, that means the course ID did not exist before 2006 and does not exist after 2010. There is some more meaning hidden here as well. Since we had the functional dependency making course ID a super key, if you have two different tuples with the same course ID, their start-end intervals cannot overlap. If they overlap, there is a point in time at which both intervals are valid for the same course ID; that means you have two separate tuples for the same course ID at that instant, which means the primary key declaration is violated. So, that should not happen. The problem is that we can no longer declare course ID alone as a primary key, because there will be multiple records with the same course ID. If you try to use SQL constraints, you might say that course ID, start and end together form a primary key. Unfortunately, that does not work either. Think of a situation where one tuple has start July 2006 and end May 2010, and another tuple with the same course ID starts in January 2010 and ends in December 2010. Now the two starts and the two ends are all distinct. If you treat (course ID, start, end) as the primary key, there is no violation; if you treat (course ID, start) as the primary key, there is no violation. But if you look at the intervals, they overlap between January 2010 and May 2010: both tuples were active in that period, and that is a violation which cannot be detected using SQL constructs currently. So, it is hard to enforce this kind of dependency, which says that valid times should not overlap, unless the database supports it internally.
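To see concretely why the key declaration is not enough, here is a sketch in SQLite (the schema and YYYYMM-style integer timestamps are invented for illustration). The composite primary key is satisfied by both tuples, so the overlap has to be found with an explicit query instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE course (
    course_id TEXT, title TEXT, start_ym INT, end_ym INT,
    PRIMARY KEY (course_id, start_ym, end_ym));
-- July 2006 - May 2010, and Jan 2010 - Dec 2010: the key accepts both
-- rows, because all four timestamps are distinct...
INSERT INTO course VALUES ('317', 'Graphics',  200607, 201005);
INSERT INTO course VALUES ('317', 'Databases', 201001, 201012);
""")

# ...but the validity intervals overlap between Jan and May 2010,
# which the primary key cannot detect.  An explicit self-join finds it:
cur.execute("""
SELECT a.course_id
FROM course a JOIN course b
  ON a.course_id = b.course_id
 AND a.start_ym < b.start_ym      -- consider each pair once
 AND b.start_ym <= a.end_ym       -- b begins before a has ended
""")
print(cur.fetchall())  # [('317',)] -> an overlap exists
```

In practice such a check has to be run as an application-level validation or a trigger, since the constraint itself cannot be stated declaratively in standard SQL.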
Another issue is that foreign key references may be to the current version of the data or to the data at a point in time. For example, if somebody is an employee in a department and the department's budget changes, then when you look up the department budget for this person, it should clearly always be with respect to the current point in time. In contrast, when a student has a grade and a registration in the takes relation, the course was taken at some point in history, at the time when the course was taken, not the current time. So, there are two possibilities, and you have to figure out which one it is: if it is the current point in time, there is no need for a time associated with the reference; if it was at some point in history, you need a time point associated with it. Luckily, in the takes relation we already have the time point, which is the semester and the year, so that information is already there. What is missing is that the course relation does not have history. If you include the history in the course relation, then we can actually look up the correct record. How do you look up the correct record? We have the semester and year in the takes relation, and in the course relation, which we have just extended, we have the start and end time points. So, we look for a record in the course relation with the same course ID whose start-end interval contains the year and semester when this person took the course. The join is on course ID, plus the condition that the semester is contained in the valid time period for that course record. Your querying also has to take time into account. So, really, if you want to take time into account, you have to modify what you store, what you query and so forth. There have been proposals to simplify this job by creating the concept of a temporal database, where time is a basic thing included in every tuple.
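The two kinds of foreign-key lookup can be contrasted in a small SQLite sketch (table names and the 9999 "still current" sentinel are my own conventions for illustration): a current-version lookup ignores history, while a point-in-time lookup must select the row whose interval contains the time of interest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE department (
    dept_name TEXT, budget INT, start_year INT, end_year INT);
INSERT INTO department VALUES
    ('CS', 100000, 2000, 2010),
    ('CS', 250000, 2010, 9999);   -- 9999 marks the current row
""")

# Current-version reference: an employee's department budget should
# always resolve to the row that is valid *now*.
cur.execute("""SELECT budget FROM department
               WHERE dept_name = 'CS' AND end_year = 9999""")
print(cur.fetchone())  # (250000,)

# Point-in-time reference: what the budget was in 2005, the way a
# takes tuple must resolve the course title as of its semester.
cur.execute("""SELECT budget FROM department
               WHERE dept_name = 'CS'
                 AND 2005 >= start_year AND 2005 < end_year""")
print(cur.fetchone())  # (100000,)
```

The schema is the same in both cases; it is the query that decides whether the reference is to the current version or to history.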
Queries also implicitly take this into account. But as of now, you have to code it on your own. So, whenever you do a design, first do the design without worrying about time, but make sure you go back, see what happens when things change across time, and make sure you have taken care of it. So, with that I will stop and take any remaining questions. Sir, in our university we are following Elmasri and Navathe, and as per their definition, third normal form is based on transitive functional dependencies. But in your discussion today you did not mention transitive dependencies at all. So, can you clarify? Thank you for asking that question; it is a very valid question. There are in fact two ways of defining third normal form, both of which are equivalent, but they look totally different. One definition is used in some textbooks like Elmasri and Navathe, and is based on the notion of transitive dependencies; the other definition is what we covered here. In fact, the two are the same. How do you know they are the same, and how are they related? I am not going to cover it right now, but there is an exercise in our book which discusses this issue further. So, do look up the exercises; I believe it is a solved exercise, so the solutions for that should also be on the web. I will have to check this, and if it is not, I am happy to make it available through the Moodle site. Let me check that offline and get back to you. So, if you are using Elmasri and Navathe, whatever we do is the same; the 3NF decomposition algorithm does not really change, because the definitions are equivalent. Back to you. So, one more question: how to insert null values for a datetime type in SQL Server? I think you had asked me this question before. Normally, in SQL syntax, you just use NULL, not the string 'NULL' in quotes, but a bare NULL in the insert statement.
So, insert into the table with a values clause, and wherever you want to put a null, just say NULL; that ought to work. Now, if SQL Server is not accepting null for a datetime column, first check whether that column has been declared NOT NULL. If it has not been declared NOT NULL, it should accept it; I do not see why it would not work. If it is not working, please send me an example and we will follow up on that offline. As far as I can tell, there are no more questions. So, thank you very much, and that ends today's session.
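The advice above can be checked quickly in any SQL system. Here is a minimal sketch using SQLite rather than SQL Server (the table and column names are invented), showing a bare NULL, not the string 'NULL', inserted into a nullable column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# joined_on is nullable; had it been declared NOT NULL, the insert
# below would fail with a constraint error.
cur.execute("CREATE TABLE emp (id TEXT PRIMARY KEY, joined_on TEXT)")
cur.execute("INSERT INTO emp VALUES ('e1', NULL)")  # bare NULL, no quotes
cur.execute("SELECT joined_on IS NULL FROM emp WHERE id = 'e1'")
print(cur.fetchone())  # (1,) -> the stored value really is NULL
```

The same bare-NULL syntax applies in SQL Server's INSERT statement; the quoted string 'NULL' would instead store a four-character text value.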