 So, we have already discussed this issue of why we do not explicitly solve the course ID because it would be redundant. So, we will move on. So, here is a ER diagram for the university enterprise way modeling. Now, again this is modeling only a small part of the university, it is a toy. It is not a final thing, but it is a model some parts of it fairly comprehensively. So, what are the different entities we have? We have departments, students, instructors, courses, sections. Then, we have classrooms, we have time slot which is useful for the following reason. This could have been an attribute which says this course meets at this time. However, it is useful to model it as an entity because then we can say that here is a time slot ID which meets at these times and the course meets in this time slot ID. That is at least how IIT Bombay models it and many other places do the same thing. They may use different names for the time slot. For example, IIT Bombay uses a name slot 1, slot 2, slot 3. Many places use something like MWF in the US. This is very common MWF 8 to 9. So, 8 AM to 9 AM Monday Wednesday Friday, but that is they give a name for that. So, that would be the identifier for the time slot. So, those are the entities and what are the relationships? We are assuming that every instructor is associated with the department and we have added some constraints also for this department relationship. This is total. Every instructor must have a department and it is also many to one. So, every instructor has at most one department. So, it is exactly one department. Now, this of course rules out an instructor being in multiple departments and some universities do allow that, but usually what they do is they require a primary department. You are a member of this. You may also be an adjunct in another department. So, that is a common situation. Mostly people do not say you are equally there in two departments. So, if you want to model an adjunct, then you can have another relationship which is adjunct as opposed to the primary affiliation. Similarly, for students we have this is a double line and there is an arrow. So, it is the same thing. A student is associated with exactly one department. Again some universities allow multiple majors for a student in which case you would have to modify this schema. Then we have a course. Now, a course is also associated with a department. Again, course is associated with exactly one department although in some cases there are cross listed courses. These courses, the same course can be taken either as a CS course or as an EE course because both the departments need it. We are not modeling that situation here. We would just treat it as two separate courses. And a course has an ID, title, credits. It has prerequisites. We have already seen this. Similarly, department has a name which is also the primary key. Now, yesterday somebody was asking me why do you use department name as identifier here because it is a long string and it can cause confusion. A more realistic thing would have used some form of identifier, ID rather than a name. It could be an integer. It could be an abbreviation like CS for comm sign as opposed to comm dot psi which we use here. Again, the reason we just stuck to department name was to the more attributes we had the wider tables become and the more complex queries become. So, we have kept the schema a little unrealistically simple so that it is easier to understand and write queries against it. But in a more realistic schema, certainly we would have had an ID and a department name separately. So, then instructor of course the relationship would not have changed but the relations we create out of it would look different. They would have a department ID instead of a department name. That would be the difference. We have the advisor relationship as before. What else is here? This course to section week diagram actually has a dash here but at this scale the dash is not showing up but it is there in the figure. If you zoomed in on it, you would see it. So, these are the discriminators as before and this section must meet somewhere and the assumption here is that it meets in exactly one classroom. So, this is again an assumption which is usually true but occasionally you may say that this course meets here on this day and there on this other day. That cannot be modeled by this. So, there are some assumptions we are making. If you remove this arrow, we could certainly allow the section to meet in more than one classroom but then we would not know which time it meets in which classroom. So, then we will have to have another maybe an entity which is time and then that could be a ternary relationship perhaps. So, there are different things we could do but we have it simple. And again we have each section meets in a particular time slot ID but the time slot ID actually has a multivalued attribute which indicates when all this particular time slot meet. So, for example, there is a day start and end time. So, time slot 1 for example, on Monday 8.30 to 9.30, Tuesday 9.30 to 10.30 and Thursday 10.30 to 11.30. So, that is that models that situation. What else? We have students taking a course. They do not actually take a course. What they register for is a section of the course. So, that relationship is between this week entity section and student and we have an attribute grade associated with it. Of course, initially there is no grade when they register. So, that would have to be null and the grade is assigned at the end of the semester. No, it is only a course. It is teachers is not an identifying relationship although it totally participates. What this indicates is that every section must be taught by at least one instructor that is a constraint here. I do not think there is any rule which can stop us, week entity becoming owned by two strong entities. You can certainly, you can have multiple. So, one is this week entity which is identified by this and by another one. So, let us think of an example of this kind of situation. No, child actually has identity independent of the mother and the father. Even if you do not know who the mother and the father is, you can identify the child uniquely. So, that does not quite work. No, that would not, a role would be optional. So, that whereas this is existence. So, but mother child or child parent could be roles, a person entity. Yeah. So, a person entity may be related to another person by, actually there are two separate relationships, mother and father. So, this, I am trying to think of an example. Every section is assigned some different classroom. So, in that section you do not have the identifier. So, here in this section we do not have an identity key. So, that section class is also a week. Section is weak. Section is the only week entity in this particular diagram. Right. So, that section class is also weak in the relationship. This one. No, no. Relationship can link to a week entity. That is not a problem. But eventually, how do you identify a section? To identify it you will have to have course ID, section ID, semester here. So, that is implicitly part of the primary key. Okay. So, you have to copy the primary key. And the issue here was that if particular week entity, sorry, is identified by two separate relationships, then the issue is that the primary key. So, let us call this k1 and k2. This is the primary key here. Then the primary key for this would be k1, k2 and whatever discriminator. Let us say there are two discriminators. D1, D2. So, for this, this is the real primary key for this entity set. In fact, you can have other situations. So, the example which was mentioned earlier. Let us think of this as course. This is section now. Let us forget about this part. Let us ignore it. Now, if you merge two institutes, then there is an institute entity. And now this is no longer a strong entity because the same ID may occur in multiple places. So, then a particular course would be identified by the institute. And now this itself becomes a discriminator. And here institute has, so its primary key is institute ID. So, now in this case, what is the primary key of section? Well, the primary key of course itself is insured ID. And in this case, this would be course ID. Let us rename k2 to course ID. So, the primary key for this would be insured ID, course ID, and then whatever else here. For the section, it is section ID, some as the year. So, you may actually have to complete a path here to get a primary key for this weak entity. So, the question is, why do not we simply put time slot as a multi-valued attribute here? That is a good question. It is a design decision. You could have done it that way. We chose to do it this way because we have time slot ID. And it may be useful to check, for example, if two courses are running in the same time slot. You can use the time slot ID instead of comparing the day start time and time. It is easier to check if there is a clash. Assuming the time slots do not clash. It is enough to check that if a person is registered for two course sections, it is enough to check that their time slot IDs are different. But yes, you could equally well have stuck the multi-valued attribute directly in there. And then a little bit more work in the query will get you the same thing. So, it is a design decision which could have gone either way. You also lose reuse of the time slot information at multiple places. Reusing is lost again if you use as a multi-valued attribute. Yes, if you do multi-valued attribute, the fact that in that particular time slot ID, whichever course you use that ID has exactly the same sort of things. So, that is repeated in every section. During normalization? Provided you had a time slot ID. Then you would catch it during normalization. Right. If you discarded time slot ID and only had a multi-valued attribute, which is day start and end time, then you would not even catch it at normalization time. In fact, what will happen there is if there is no notion of time slots at all. The university says, well, each course can decide when it wants to run completely independently. Of course, that would result in chaos. That is a very bad idea for a university. But if you decided to allow that, then, in fact, there is no functional dependency here. And normalization would not help in that case. So, the fact that we decided that time slot should be structured properly to minimize clashes. Every university does this. So, because we have done that, we model time slot as an entity. And if we didn't do that, we had time slot ID and day start time in here. Then when you do normalization, you realize that there is a redundancy. And that you can solve by coming back here and splitting it up like this. So, instead of just normalizing this issue as asked before, we could leave the ER diagram as is and just do normalization. Or we could come back and do it right, which makes sense. Any questions? So, the next topic is, of course, reducing ER diagrams to relational schemas. So, a strong entity set can be converted very simply to a schema, a relational schema with exactly the same attributes, and the primary key being a primary key of the relation. This is the first cut. Again, this is going to be refined, as you are no doubt familiar. A strong entity set includes a column for the primary key of the identifying strong entity set. So, again, we are assuming that this is a strong entity set. If you have a sequence like this, you have to copy the whole thing to get a key for this. So, the primary key for this would be insured ID, course ID. And so, the primary key of this thing, identifying entity set, would also get copied here. So, that is straightforward. We also have to deal with relationship sets. So, in this case, advisor is modeled as a relation, student ID, instructor ID. Now, in this case, this is a many to one. Actually, there is a glitch here. Since this is many to one, what would the primary key be here? It is many from the student side to one on the instructor side. So, the primary key here would be just student ID. The many side will be the primary key. This should not be there. If this were many to many, then the combination would be the primary key. Now, there can be some redundancy in this schema. We can actually simplify it. So, here is relationship between instructor and department. So, using the scheme, which we just saw, we will have a relation for instructor, a relation for department, and one more for instructor department. Now, in this case, we know that instructor has only one department, at most one. And in fact, it has exactly one. So, as again, you are familiar. Instead of creating this relation separately, we might as well merge this relation with this relation. The primary key of this would have been the ID of the instructor because it is many to one. The primary key for this would also be ID. We might as well merge them instead of keeping them separate. It does not really help. And so, what we end up with is department name becomes an attribute of instructor. So, that is the schema we have been using. So, although we started with the relationship, the relationship vanished when we converted to the relational schema. Similarly, student department, department name got copied into student and it vanished. And for advisor, if this were in fact many to one, we could have merged advisor into student. We did not do it in our schema because we did not make this assumption. Well, what we have done here is that we have not said that student advisor is total. So, now we could merge advisor as an attribute of student, the instructor ID in here. The issue will be that it will be null for all students who do not have an advisor. So, we do not want to unnecessarily create null values in the database. Therefore, we keep the relationship separate. So, we are going to do this merging only if it is many to one and total. If it is just many to one, we are not going to merge it. So, this was the point about total. So, if the participation is partial on the many side, replacing a schema by an extra attribute could result in null value. So, we do not do it. Finally, the schema corresponding to this identifying relationship here from the weak entity set to the identifying strong entity set is guaranteed to be redundant when we convert it to a relational schema. Why? Because when we create a schema for this guy, we have to copy the primary key from this side in here. So, that is already part of this. So, for the section relation, course ID has to be present in the section relation by our previous construction set here. Course ID is part of this. So, now, what will be the attributes of the relationship section course we created? It is going to be the primary key from this side and from this side, which is course ID. And on this side, course ID, section ID, that is the same thing. It is redundant. So, it is equivalent to have pulled this into this side. So, just like we merged certain relationships into the entity, the identifying thing will be merged into this automatically because of this step. So, we do not even have to bother to explicitly get rid of it with that redundancy here. So, that was for the simple attributes. What about the more complex composite and multi-valued attributes? What do we do? So, composite attributes can be dealt with by simply flattening them. For example, here, we converted into a relation with ID, first name, middle initial, last name. If there is any ambiguity, if the same thing appears here also, for example, then we would have to be careful and call it name underscore first name, name underscore middle initial so that it is clear where it, there is no duplication. Address, everything got flattened. Composite attributes can very easily be flattened, if you wish. Of course, that means you are losing some flexibility. If you had an object relational system which actually allowed composite attributes in a relation, then the benefit is that we can refer to instructor.name and automatically get all three components. You do not have to write a query which separately says first name, middle initial, last name. So, if the system supports it, we could avoid this flattening and keep the same structure. If it does not, then we do this. Now, what about the multi-valued attribute here? You will notice it does not appear over here for number. That has to be kept separate because it is not atomic. There is a question from my side. You said that you can flatten the structure by just separating out first name, middle name, last name or something of that sort. But then, does it mean that while designing an ER model, you should have in mind what kind of queries will be there for my database? So, the point of the flattening is that this flattening happens only because the relational model does not support structured things. So, when you are doing the ER model, you do not want to worry about whether your final database supports it or not. It is better to have something which is conceptually meaningful. The name is conceptually meaningful and it also is conceptually meaningful to break up the name into parts. So, you should be doing it that way when you are doing the ER model, not worry about the implementation. But now, we have to worry about the implementation and if it does not support composite attributes, we flatten. That is basically a separation of concerns. There is a higher level conceptual thing at the ER modeling and then we get down to nitty gritty when we convert it. There is actually other way of saying it. If you are going to question the structure of that composite attribute, in the sense you are going to have a query on the structure of the attribute, then it makes sense to flatten it. Otherwise, just represent it as a simple attribute which is name of maybe 50 characters. So, you are saying combine these first name, middle initial and then we do a single string. Exactly. Yeah, you can do that only if you are sure that you will never need to query the independent parts. So, we are not making that assumption. So, we are keeping them separate. Here, of course, it makes sense to keep it separate. But there may be some situations where you are not sure whether you should look at the structure of that name or whatever is that composite attribute. But you could have done it also at the ER diagram itself. Instead of breaking up the structure there, you could have just made it a flat name and then the mapping will be straightforward. So, if you were sure you didn't want the parts of it, you might as well have done it in the ER notation. So, is it revealed at some point when you actually execute the queries that you need to have a composite attribute? You mean? Something of that. So, I am not sure what you mean by that. So, when you run the queries for your database. So, does at any point of time you have any way of evaluating that these are the frequent queries. So, let me go back and represent the structure of this attribute in the sense first name, middle name, last name kind of thing. In the initial design you just kept these as three separate things. So, do we analyze these things? Most queries want all three together. Therefore, you should have combined them by creating a structure. Yes, that is also possible. Although typically the queries are going to reflect the semantic notions and that semantics is already available when you did the ER diagram. Even though you never looked at the queries. So, I am not sure you would look at the queries and come back, but it is possible in some situations. So, just to generalize thing, you cannot look at the query and go back and refine the ER model or in relation model. Typically that is. Yeah, that flattening can be done at the higher level itself or can be done in a relation model. That independent depends on the view of it. Yeah, typically you should not depend on the query because the queries may get added or removed. Any kind of queries can be done. Yes. So, what about multi-valued attributes? Again, the representation is straightforward. We create a new relation containing the primary key of the entity. If it were a weak entity, its primary key is really a composite key. So, we need its primary key followed by the attribute here, whatever was the parts of the multi-valued attribute. In this case, phone number was the multi-valued attribute. So, we get that. Now, if you apply this to time slot ID, what are we going to get? This is a multi-valued attribute. So, we are going to get relation time slot with just time slot ID. You will get two relations. I have not shown it here, but you are going to get two relations. One is time slot with time slot ID and no other attribute because the only remaining attribute is multi-valued. And then, we will get a time slot, let us say times, give it some name, with this thing. Time slot ID, day, start time and end time. So, this relation, we would actually get two relations, time slot and then this one, which we would have to give some other name. Now, if you look at this pair of relation, this guy does not have anything, it just has a single time slot. It could still be useful. Why is it useful? Because section has a time slot ID. So, when we create the relation corresponding to section, time slot ID is going to be in there. And in the relational schema, you would have a foreign key from time slot ID to this time slot. Because here, that is a primary key. So, you could have a foreign key into this relation. In addition, you have one more relation, which is time slot times with all of these time slot ID. And its primary key is really a combination, day, start time and end time. Note that end time is not part of the primary key. It would be a super key. But I think it's safe to assume that if it starts at a particular time on a particular day, it cannot end at two different times. It's semantically meaningless. Therefore, end time is not part of the primary key there. And this would be a foreign key referencing that. And similarly, from section, there would be a time slot ID, which is in turn a foreign key referencing that. That would be the relational schema. Now, in our design, we chose to do away with that intermediate relation. It doesn't serve too much of purpose. The only benefit it gives is we can have the foreign key. But we chose to remove one relation and keep this. So the price we pay is that this can no longer be a foreign key anywhere because time slot ID is not the primary key of this. The primary key involves multiple attributes. So we just chose this to illustrate what would happen if you got rid of that relation. And we reduced the number of relations. So we lost the ability to have a foreign key. But queries are simple enough because we can directly join section with our time slot. Now, we renamed this relation. Since we got rid of that relation, we renamed this relation itself to time slot. So that is the university schema which we have. So the question is, when we are doing the ER diagram, do we have to worry about the queries? Typically, no. We are trying to model an enterprise. What data do we need to store? Queries are a separate concern which we will worry about later. We can't worry about all of them at once. And typically, your query is not going to change your model. Typically, there are some exceptions which actually don't happen at the ER level but happen at the relational level where you may decide to keep something separate or merge some schemas from an efficiency viewpoint taking into account what are common queries. So that part of normalization for performance takes queries into account. But usually at the ER level, we try not to do that because it just confuses the picture. We are almost out of time for this session. We have another 10 minutes. I will quickly go over these other two aspects of which specialization generalization is very important. Aggregation is less important. So what we have is specialization generalization. Again, you are not familiar with this. It occurs in object-oriented design all the time in heritance. So I won't spend too much time on this. But this is actually a very, very useful feature in ER modeling. We have already saw an example yesterday. I think you had mentioned it about this library where you have multiple users, students and faculty and maybe external users. So in this case, we have a person who is an employee or a student and an employee could be instructor or secretary. First, this notation is modeled after the UML notation, which is basically identical using these triangles. The only difference is that the constraints on this in UML have to be stated explicitly. We adopt this thing where if we have a common thing which then bifurcates like this, then that is a disjoint thing, then specialization. Whereas if we do this directly, then it could be overlapping. So which means an employee could also be a student. Other than that, it's just standard. In UML or if you want to avoid confusion, so we could explicitly say total specialization. So there are two other notions. There's a notion of total or partial. Total means that this guy has to belong to one of these two. You cannot have a person who is neither an employee nor a student. Every person must be one of these two. Partial on the other hand means that you can have a person who is neither. So in our notation, if you don't say total here, the default is partial. So you can have a person who is not either a student or an instructor. But if you do say total, then it has to be total. And if you want to make it explicit that this is disjoint or overlapping, you could explicitly say disjoint over here. So that can make it unambiguous with respect to the standard UML notation. Can I ask you one question? If you are saying that that participation is that the hierarchy is total. It is total. Means E1 is completely classified into either E2 or E3 or both of them. So if such situation is there, then are you... Because practically we come across the situations where by creating the hierarchy, we are actually looking at exploiting the extendibility of this ER model. So if you are very sure then only you can go ahead and say that it is total. But then the future extensions are closed actually. You are right. So any constraints which you add like this, that itself is a design process. Should you have this constraint or not? And it has a very major impact for the following reason. Take a simpler constraint which is... Is this many to many or many to one? Now if you said that this is many to one, you are allowing the relational design to pull an attribute into a relation. So a student has a department name. Now tomorrow if your requirement changes and you say, well I changed my decision, I am going to allow a student to be equally into departments. Now you are in trouble. You have to go and change the relational schema which means you have to change all the queries. It is a huge impact on the application. So you should not throw in constraints just like that. Because even if they hold today, if you feel that they may not hold tomorrow, it may change. You should probably not throw in the constraint at the schema level. You may throw in the constraint at the application level. Maybe the application can check that this does not happen. Or you may create a schema which is flexible in the relational schema, which allows the constraint to be enforced or not. And then you add the constraint explicitly on the relational schema. But you can drop it later. If that no longer holds, you can drop the constraint without making any other change in the relational schema. So that is very important. This is something which people tend to neglect. They will throw in any constraint they want which happens to hold. And then hardware the schema so tightly that change later becomes very, very hard. So you have to future proof the design in this sense. Yeah, specialization. So this is the ER diagram. How do you convert it to tables? Again, you are probably familiar with this. Assuming that the specialization is not total, we would create a table for each of these. But you have a choice of what attributes these tables contain. Well, this is the constraints. So let's finish the constraints first. So this constraint on whether entities can be members of a lower level thing could be condition or user defined. This is not used very much. We'll skip it. This says whether it's disjoint specialization or overlapping. We have just seen an example. And then we have total versus partial. Again, we saw the example. So now taking these constraints into account, how do we do the relational schema from this? So here we had person, student, and employee. So in this method one, each of these has the primary key which is inherited from the top one. So note that ID is the primary key which appears here. It does not appear down here. So in a specialization, you should not be copying the primary key down because you're implicitly inheriting all the attributes from above. So you should not again list it here. That's a mistake. So given that you're inheriting everything, that doesn't mean you have to copy everything down into the schema. So one option is to copy only the primary key down. And if you have a student, there would also be a row in the person table to record that student's name, street, and city. And similarly for an employee, there would be a record there. This makes a lot of sense in particular if the specialization is overlapping. If something is a student and an employee, there's no problem at all here. You would keep their name, street, and city once here. And then you would also have extra rows which says that this person is a student and this person is an employee. But then when you want to get information, you have to do a join. That is the drawback. The other option is you copy all the attributes down. So name, street, city have been copied into both student and employee. This makes more sense if the specialization is non-overlapping. So you cannot have somebody as a student and an employee. Then this is okay. So each entity would either be a person, a student, or an employee. And further on, more, if the specialization is total, every person must be a student or an employee, you can even get rid of this table, person table. Now there are some ways of doing this. If you had something which could relate to either a student or an employee, there's a relationship to this. You could create a relationship to person. That's what you would do. But if you get rid of the person table, what do you refer to? And furthermore, with this representation, if somebody is a student, we will not represent them as person. There will only be a record in the student table, not a record in the person table. So from that viewpoint, this representation makes more sense. Because you can have a foreign key referencing this table. So in the library case, you would have a foreign key referencing the person table. And for every student and employee, that would be a row in the person table. So this is actually a cleaner design. And you would do this only if you are sure that certain things will not be required. So that wraps up pretty much our ER thing. And that's it for this chapter.