 Good morning. Shall we get started? So, hope all of you had a chance yesterday to work on those SQL queries and maybe during the end of the lab session today we can discuss those also. Today, we are going to move on to ER modeling. Now, you will notice by its absence that we have not really covered relational algebra and relational calculator here. In the final, you know, December offering we may cover relational algebra, but time is fairly short here and my assumption is that most of you would already know that material. So, what is the goal of ER modeling? Why do we even bother to do it? If you take somebody who has not worked on designing any database or schema and ask them to go ahead and design something and they know about tables, their first thing that they do is start designing tables. They start designing the attributes, the primary keys, the foreign keys, they go straight into it. The problem with this is that you do not get a big picture of the system. What all does the system do? What are the different things in the system? You want the big picture before you dive into the details and what the ER modeling approach does is it lets you view things at a higher level of abstraction before you get into the details. Since we use relational data pieces, eventually we have to create the relations, but let us not be in a big hurry to do that. So, what we want is a model of whatever enterprise we are trying to create a database for and the question is how do we model this? As most of you are already aware, the ER approach models things in that enterprise in two ways. One is as entities and the other is relationships and sometimes there is a little bit of confusion about whether something should be an entity or a relationship, but most of the times it is clear. Using this diagrammatic notation in particular, one of the big benefits of the ER modeling is the diagrammatic notation. We can give a high level view of the enterprise and then convert it later. Now, we are going to study two things in the first lecture today. One is the concepts of ER modeling which are uniform across all the different notations. The second is we are going to look at ER diagram notation. We are going to use a specific notation, which if you have ever used either the database system concepts book or one of the other widely used books like Navathi or Ramakrishna. You will notice that the diagrammatic notation we use in this sixth edition of the database concepts and in the lecture today is a little bit different from the earlier one. So, you might ask why this switch? The reason is ER models in the original chain notation matches what textbooks have been using for many years. But in the industry, people have moved on to other notations. In particular, UML, the universal modeling or unified modeling language is very widely used and UML also has a class diagram, which is actually very close to ER diagrams and that uses a different notation and most of the tools on the market do use UML. So, we decided that we might as well move to reflect reality in the market. That is one reason. The other reason is that the original notation was a little more cumbersome. It occupied more screen space to represent some amount of information and the UML ish notation that we use is more compact. Now, why did not we use UML itself? UML does not really support ER modeling per se. It is close, but not the same and the benefits of ER modeling are still significant, which is why we do not use UML itself, but we use a notation very close to UML. So, that is a brief overview of what we will be doing in the first session, introducing you to the concepts very quickly, which you have not of familiar with and then the notation, which may be a little bit new. In the second session today, after the tea break, we will have a tutorial, which involves doing some designs hands on. So, we will let you spend a little bit of time working on the design and then we will discuss the design issues and we will do this for at least two maybe if time permits, we will do more than that. So, because ER modeling is very hands on exercise, unless you do it, if you just see the slides it does not help. So, it is really nice to actually work on some real may be not fully real, but realistic examples. So, let us get started this is an overview of this chapter. So, as I said already a database can be modeled as a collection of entities and then relationships between entities and as you know that know and entity generally reflects something, which is could be concrete could be abstract, but it is something for example, a person car or even a course. A course like this one is an abstract entity, it does not have any physical existence, except for the fact that several documents say that there is a course, you are physically here, but the course is an abstract thing, but it is still an entity, because it has attributes it can be described, although it is not concrete it is an entity. And then you have attributes for entities as we just discussed, people have names, courses have titles and so on. Now, an entity is a single thing and single object, an entity set is a set of entities of the same type, which have the same properties. So, from our university design here are two entity sets, one is instructor and the other is student, we have simplified it from the thing, which you have been using to conserve space, here we just have an ID and a name, the student ID and a student name, we do not have anything else here. Of course, in the real design there are a lot more attributes. Then we have relationships between entities, again a relationship is a connection between a specific pair or set of entities. So, in this case here is a student identified by this ID and here is an instructor with this ID. Now, these two are related in some way and in this case the relationship is that this person is the advisor of this student. So, that is the relationship. And just like we had entity sets, we have relationship sets, which are sets of relationships of the same type, that is between the same set of entities and having the same meaning. Now, two people can be related in different ways, in this case there is an advisor relationship, maybe there is also some other relationship, parent of something else, someone may be both a parent and an advisor, but they are two different relationship sets, because they mean different things. Although, the relationships are between the same pair of entities, the relationships belong to different relationship sets, because their meaning is different. So, we can show this diagrammatically for the case of a binary relationship by drawing lines between the associated entities. So, each of these lines here is a relationship and the set of lines is a relationship set advisor in this case. Any questions? And again, as you know, not only can entities have attributes, relationships can also have attributes. For example, here this is this records, when was the last time perhaps this student met this advisor, students are supposed to meet advisors. So, we can record that time as a attribute of the relationship. You have a microphone. Diagram on this slide and previous slide. Yeah, they are not ER diagrams. No, no, these are instance diagrams. Yes. So, why not same approach is used in previous case also, because attribute could be null, like, now what I use is that this approach represents relationship instance by some small square. Okay. So, why do this? That would have been consistent. I mean, why is there a specific This is not a formal notation. This is just something to help with intuition. It is not a formally defined notation. So, No, but it helps in understanding the. Yes, there are different approaches. But previous one and this could be the same. So, this notation has not changed from the previous edition. We have not changed. No, no, Edison, I am not saying previous slide. Okay. So, here we do not have any attribute for the relationship. Yes, we do not have attributes. Here we could be used for previous setup. You mean with the oval there? Yes. Yes. Yes, that is this is inconsistent between. You are right. We could change that certainly. I hope you understood the point he was making and I agree with him. Okay. So, binary relationships are the most common, which we can represent using lines. And relationships between multiple entity sets, which is more than binary, ternary, quaternary are relatively rare. They do exist. We will see some examples. But you have to think twice before you create a non-binary relationship to know if you are doing the right thing, because most of the time it is not. Most of the time the right thing is to use binary. We will see examples later. Now, attributes in our example so far have been simple. They are just values. But ER models in general allow attributes to have structure. So, you can distinguish between simple and composite attributes. There is also a notion of a domain for an attribute, which is a set of permitted values. We are not actually going to focus on this in the ER modeling. When we work on the relational schema, then we need to worry about the type system and the set of allowed values. Here we are not going to worry so much about it. Okay. Coming back, we have simple and composite, single versus multi-valued and derived attributes. So, I think we have an example here. Name, split as first name, middle initial, last name is a composite attribute. So, name is composite. Similarly, address has two levels. Street, city, state, postal code, maybe a country also here. And then street address itself is broken up into several pieces. Now, in the US, this kind of breakup makes a lot of sense because addresses are very structured. In India, it is a lot harder to break up addresses this way. But certainly, names ought to be broken up. And we often don't break up names for various reasons. For example, when we admit students in IIT, we take their names as it is in their school-leaving certificate. And some of them have their first name, middle name, last name. Others have last name, first name, middle name. Others have last name, middle name, first name. So, it's a mess. So, it would make more sense to structure it this way. We actually do that subsequently. We ask them to tell us which part is which. So, that's important. Then, there are cardinality constraints which play a very important role in both in the ER modeling and in how we convert the ER model to the relational model. Again, you are not familiar with this. So, I'll just go quickly through this. This is an example of one-to-one mapping between these two entity sets. So, this relationship set maps each thing here to at most one thing here and vice versa. So, you'll notice that A4 is not mapped to anything, which is okay. Unless the mapping is total from A, in which case everything here has to be mapped. Or if it's total from B, everything from B has to be mapped. In this case, we are allowed to, in general, we are allowed to have things which are not mapped. Then, there is one-to-many where one thing here can be mapped to multiple things there. But on the other side, each of these can be mapped to only one or at most one. It can be mapped to zero also. And correspondingly, many-to-one, which is essentially symmetric, you flip the two sides, one-to-many becomes many-to-one. There's no difference, really. So, we could save a little bit of space, but it's traditional. This notation one-to-many, many-to-one predates entity modeling. And it depends on how you are viewing it. If you think of it as function, if it's a regular function, a function could be many-to-one. If you have a function which returns a set of values, then it could be even one-to-many or many-to-many. So, depending on the application, it may help you to think of one entity being on the left and the other on the right. That may help in understanding what is going on. But really, it is symmetric. As far as year modeling and even the relational design is concerned, it doesn't matter which is left and which is right. So, we could drop one of these potential. And lastly, many-to-many, which allows each one here to be mapped potentially to more than one on the other side and vice-versa. Any comments from your teaching? Sorry, what are the questions? A is mapping A to B, one-to-many. So, this is an example, one-to-many. And B to A, again, one-to-many. So, B to A, this B1 is mapped to many here. So, both sides, if there is one-to-many relationship, can we say that it's a many-to-many? You will not say this is one-to-many. This one will be called many-to-many. Because the situation is what you described. One element here may be mapped to many here. And similarly, one element here may be mapped to many. But it is called many-to-many. One-to-many, okay, take this one. Many-to-many is if one element here can be mapped to at most one on this side. Similarly, one-to-many. Because at the time of explanation with the student, we normally take one example where this sort of thing is there. That is, from both way it is one-to-many. Or one-way one-to-many and other-way many-to-many. Then we can say that that relationship is a many-to-many. So, is it right or for the explanation purpose it is right? It is correct in the sense that if you take a particular entity here in many-to-many. If you take one entity here, the idea is that it can be mapped to many. Similarly, here it happens on both sides. One element can be mapped to many. So, it is absolutely right. That is perfectly correct explanation. Okay, good. Any other such question? Let us make this interactive. Otherwise, this is going to be boring because I am probably repeating things which you have taught many times. Sir, suppose there are three entities A, B, C. A is one-to-many, A-to-B. And from B to C, suppose many-to-many. Let me write that here. With the three entities A, B, C. From A to B, one-to-many. Okay, one-to-many. And from B to C, many-to-one. Okay. With the transitivity A, from A to C, can we say it is one-to-one? So, the question is can one thing here be mapped to more than one? From A to C. So, this one is one-to-many. One-to-many. So, here is an example. Okay. So, far everything is okay. On this side, it is one-to-many. On this side, it is many-to-one. But suppose I want a direct mapping problem. But if you do the transitivity, this A is mapped to two different C's. But that mapping from A to C is one-to-one? No, it is not. Because if you see this example here, one A entity has been mapped to two different C entities, transitively. Transitively. So, if you compose a one-to-many with a many-to-one, that could be in general, it could be one-to-many. One-to-many. One-to-many. Because from here you will map to, let me just check this. Certainly, this example shows that one can be mapped to many. And then again... You have also one element here, mapped to multiple elements there. So, you can also have... So, this is one-to-many. So, you cannot have this mapped here. So, if there is another element here, it must be mapped to something else. But on this side, it is many-to-one. So, you can map this here. This is legal. But now, if you do transitively, A is... So, let us call this A1, A2, C1, let us call this C2. So, if you do the transitive thing, A1, A2, C1, C2. If you do the transitive, A1 is mapped to C1. A1 is also mapped to C2. A2 is mapped to C1. So, what can we say here? It is many-to-many. So, if you compose it this way, it could continue to be many-to-many. On the other hand, if you take a one-to-many, and this is also one-to-many. If you also compose one-to-many with one-to-many, then you will get one-to-many. And conversely, many-to-one, many-to-one will also be many-to-one. But I think you cannot bypass B. It always has to be routed through B. Well, in the model, it will go through B. The question was, if you take the transitive thing and say that this is related to that, and make it that relationship. Does that make some sense, transitivity in this context? Think of it as a query. If this particular thing over here could be thought of as a query result or a view. But query... Not as a base relation, but as a view relationship. So, conceptually, there is no problem. Do we do this cardinality business while answering the query? It's actually used in query optimization. Cardinality is used in there. So, it may not be very useful in general for design, but conceptually, there is nothing wrong in thinking about it like this. So, the notion of keys, again, is well-known. A super key is any set of attributes which uniquely identifies an entity for the case of entity. Similarly, for relationships, a candidate key is minimal, and primary key is a candidate key which you have chosen to be a representative thing, and uniquely chosen. You choose it. You, the designer chooses it. So, all this is standard stuff. Now, for relationship sets, again, we need a notion of key, super key, candidate key, primary key. The question is what set of attributes would form a key? And in the ER representation we use, we are going to insist that the combination of primary keys of the participating entity set forms a super key of a relationship set. What does this mean? Where is the diagram? Here is an example. We have instructor and student, and ID is the primary key of student. Similarly, ID, student ID is the primary key of student. Now, for this relationship set, what do we mean by saying that the combination of this and this, let us call it instructor.id and student.id. That should form a primary key for this. So, what does that imply? It implies that any pair of entities, in the case of binary, in the case of binary, a pair of entities can have only one relationship in a given relationship set. Now, why does that matter? Because if you, let us go back to the set. Yeah, here. So, here, every pair of entities here, instructor, student, has only one relationship. And on that relationship, we have added a date when that student last met that advisor. Now, supposing you want to say that, supposing I want to model not just the last time the student met, but every time the student met the advisor. I want to track all the meetings. Now, it may be tempting to say that, let us create one relationship. Let us take Crick and Tanaka. Here is one relationship with this date. Let us add one more relationship with a different date. Now, that is going to mess up things. First of all, this notion of cardinality, you know, one to many, one to one will get messed up if we create multiple things. Everything, well, you could still get back to it by doing duplicates. But the key thing is that if you have two separate ones in these, then this primary key plus this primary key will no longer uniquely identify this. Now, you could potentially have a version of ER modeling which allows that. However, in our notation, at least, which follows the Chen notation, we are not going to allow that. So, we are going to insist that a pair of relationships, in general, any set of relationships have only one relationship in a particular relationship set. And if you want to model this notion of multiple meeting times, there is a way out, though. And the way out is to have a multi-valued attribute. So, a set of meeting times, it is a single attribute, but it is a multi-valued attribute containing a set of meeting times. So, that's a better way of modeling it than by creating separate relationships. But the identification of that instance, relationship instance becomes difficult that instead of having one value, it brings in multiple values. If you have multiple separate relationships in the same set, then you can't identify it. But if you have a single relationship whose attribute has a set value, that is okay, that's not a problem. There's a difference between these two. So, in this notation, in the ER notation, it's chosen to go with this model where you can uniquely identify a relationship because there's only one relationship, but its attributes can take on. I have some specific question I will ask later. One question I hear is these terms, candidate key, super key, and the key, primary key, are they taken from relational model or they are in ER model? It comes from the relational model. That certainly predated the ER model. So, all of these came from that, but they make sense even in the ER. So, we keep on using all terms here also? No, not all, only the ones which are relevant. So, do we use primary key or do we use super key here? All these three terms are valid here also. They have the same meaning. There's no difference in the conceptually, they have the same meaning. Instead of identifying a record or a tuple, you are identifying an entity or a relationship, but the purpose is exactly the same, so the same terms apply. Now, here's a common mistake which people make. Supposing we have two entity sets. One is instructor, the other is department, and we have a relationship which relates instructor and department. Which department does this instructor belong to? Now, it's very common when people do a design to do the following. The instructor will have certain attributes, ID, name, salary and so on. In the relational schema which we have been using, there is also an attribute called department name. So, people tend to put this also in the ER diagram as an attribute of the entity instructor. Now, what is the problem with this? The problem is there is redundancy. So, you have this relationship which is already recording that an instructor is related to department, and you have a value here which presumably stores a key from the other one, that is department name identifies the department, but that is not made explicit in the diagram. Implicit in the name. I called it department name, and based on that, you have to figure out that this is really something which refers to department. Now, if you drew a schema diagram, we have seen schema diagrams also, which are relational. You take the relations and then draw lines representing foreign keys and so on. That is a schema diagram. That is different. There, you are showing the entire set of attributes of a relation. We have not yet come to a relation. We are modeling an entity, and we do not want to stick in attributes. A relationship should not be represented by an attribute at this stage. Any relationship should be explicitly shown as a relationship, and if you stick in an attribute department name, you are turning a relationship into an attribute, which will happen later perhaps when we create a relational schema, but it should not happen at this point. So, this causes a lot of confusion. Almost everybody makes this mistake. But as teachers, you have also seen many students making this mistake. When we go to the December workshop, it is important to stress this, so that the next set of teachers will carry this message. Any questions about this? That is the basic, simple set of concepts. Now, let us move on to the diagrammatic notation. I will introduce you to this UML inspired notation. So, in this notation, we show the entities as boxes. Relationships continue to be diamonds, although there are some small differences, which we will see. The name of the entity is on top here, and then the attributes are listed below here, and the primary key is underlined. So, this is, if you are familiar with the boxes and ovals notation, instead of having one oval per attribute, we are just listing them one below the other. So, it conserves space. Now, this relationship is identical from the earlier notation, which has long been used. So, then there is an issue of how do we represent composite multi-valued attributes? Again, there is no standard for this. We are creating standards, sort of. And what we do is, we have a composite attribute. We are just using indentation to show what are the parts of the main attribute. So, name has first name, middle, initial, last name. We could have done it differently. We chose to do this. Similarly, address has street. Street itself has street number, street name, apartment number, and so on. And down here is an example of a multi-valued attribute, phone number. So, again, we needed some notation for it. So, we need to indicate that it is multi-value. And last is a derived attribute, age, which is really a method, which does not take any parameters in this case. And it is just shown using brackets as usual, parenthesis as usual. Any questions on this? Now, all of this can again be done using standard tools. We just have to indent attributes and add curly brackets. The tool will not interpret any of this. It is part of the name. But we can interpret it semantically. The next question is, how do we show attributes of relationship? Earlier, we used ovals linked to the diamond. Here, we used the UML convention, which is to show a dotted line to a box, which contains the attributes. Exactly how UML shows attributes. It relates to what you have already talked about, multi-value. If the student has or instructed this, there could be different duration of the advisership. Different meetings. So, how do you model here? So, to repeat the question, we have a student, we have an advisor, which is an instructor, and we have the relationship advisor between these. So, the question is, let's say that a particular instructor was an advisor for this student between this pair of dates and this pair of dates and so on. So, there could be multiple time periods during which this person... One way to do it is to have... This is different from what we used. In our example, we used the last meeting time. Now, what we are doing is slightly different. We are trying to record during which periods of time was this person and advisor for this student. So, start... This is a new diagram. This is different. I am using the same things, but don't confuse it with the last meeting time. That is different. So, start time and end time. But as we discussed, the same advisor could become an advisor again after some time. So, here is one way of representing that. So, that is the set of attributes. This should be a dashed line. So, that this is an attribute. This is the set of attributes of advisor. And in this case, it's a multi-valued attribute which has two parts, start time and end time. So, actually in our notation, if you want to have multiple of these, instead of making it very wide, the way we would actually show it is start time, end time. So, we could even give a name to this attribute if we want. So, we could say that here is another way of modeling this. Move this up. So, we could say valid time start end. This is for one interval. Now, if you can have multiple intervals, we enclose that in parances. So, this is another way of representing the same information. Does that answer your question? So, in that case, what becomes the primary key for relationship instance? The primary key for this relationship instance remains student ID, instructor ID, because there is only one instance between a particular student and advisor. What if this same instructor happens to be advisor again? For the same student? So, we will have, that's why this is an attribute. But that unique identification of the instance is not... There is... No, there will be only one instance between a particular student and a particular even though the instructor was the advisor, let's say in 2009 and again in 2011, let's say. There were two periods of time when this person was the advisor. So, there would be two values here. It's multi-valued. So, there will be two values. So, it's multi-valued, maybe, and then similarly 1st Jan 2011 to 31st December 2011. Maybe in some of your earlier addition or in some... Ramakrishna's book, I found that you create the third entity there and have ternary relationship. Is that the... In this case... You could do that. If you model duration as an entity, which is probably not very clean, that is a workaround. In general, you can always turn anything into an entity. But... At some point... This actually gets... So, this actually gets to one of the issues in ER modeling that sometimes there is no one single answer. There are many ways of doing it and which way do you take? Sometimes there is no clear way. Sometimes it's clear, sometimes it's not. And in fact, another weird thing which happens is sometimes there are many ways of representing it in ER notation. But when you convert it to relations, they all converge back to the same thing. So, you might say it is pointless. If we convert into relations, we need primary key then. Yes. So, primary key issue still remains. Even if that is multi-value attribute, we need to create different relations for that. What will happen is when we convert it to relations, this start time could become part of the primary key. So, you would have student ID, instructor ID. So, you would have to convert it to the ER model. Because we are making that as a primary key in our relation. We could. We don't want to completely change everything from the original model. Not really, but making the entity or maybe the weak entity something like that. Weak entity will help in certain situations, certainly. But here it won't help you. We'll come to weak entities later. Okay. Thank you. One is what happens to our this new ER modeling notation. And second is how to map it to the relational model. So, when I look at the ER model, it appears as if the start time, end time and the relation of that with the student ID and the instructor ID is like multi multi-value dependency as we identify in the while doing the normalization. It is related to that. And secondly, if I look at the mapping of this to the relational model, then probably the Raghurama Krishna's suggestion that you should take out that start time, end time and create another relation, say duration or something, that makes sense. Then you are able to identify the primary key and so that will make probably. You could do it that way. There are no single answer here. There are many ways of doing it. I mean, not every possible way is correct, but there are many correct ways of doing it. So, what I mean to say is, you cannot represent everything that is there in the relational model. That should not be the goal of ER modeling. There are certain things in the ER model which you are unable to directly represent in the relational model and vice versa. There are a few things which you cannot directly, in particular constraints sometimes are a little difficult to map from one to the other. But otherwise, other than constraints, I don't think there's a... The multi-value dependency is one such thing that is multi-valued attributes. So, the mapping becomes more complex if the relational model doesn't support it. If you had an object relational model with supported sites or mighty sites, then you could actually map that directly, if you wished. What is goal of ER model? Like I said at the beginning, the goal is to have a high level view of the enterprise. If you don't do a diagrammatic notation, you give a set of create table with schema, that's a very low level view. A diagrammatic view helps to get a big picture. But it includes everything, supposed to include everything which ultimately has to be mapped to relational model. It includes other details. No, it doesn't actually because none of the ER notations takes, for example, type into account. Completely. So, it is meant to be a higher level abstraction which hides some details, but gives you the big picture. The goal is to show what all is there in the model, the big picture eliminating the details, so that you can in one diagram or in a few sets of diagrams, you can get the big picture of what all is being modeled without looking at the details. Of course, you can't throw away all details. So, it's a decision on what details you show, what you don't. So, in this particular case, we have eliminated types and a few other things like that. Those are the details. Only type, I guess. Yeah, I think, yeah, it's basically types. And the references are implicit here, not explicit. Yes. So, the relationships are shown, actually they're explicit as relationships. So, there are no foreign keys at this point because there are no attributes which would link that up. So, that comes later. In fact, it's a little cleaner because the moment you have an attribute which is a foreign key that forces you to have a many to one kind of relationship. If you want to do many to many, then you have to have a set of values. So, the relationship is a cleaner way which lets you do an initial design without worrying about cardinality constraints and then you can add the cardinality constraints to the same diagram and from that you can decide on the final schema. If you directly try to do the schema, then you have to worry about the cardinality constraints right at the initial design point. So, it lets you sequence these steps, start without worrying about it, then keep on adding detail till you have a reasonably complete picture. This is a part of top-down approach for something like that. But what happened to the diagram after normalization like we split up the table and it is... That is going to come later. So, normally the steps will be first get an ER model to get the high level view, then convert it to tables and then normalization turns out to be just it turns out that most of the common cases where you would have messed up with normalization would actually be handled if you do the ER diagram but there may be a few cases which you missed. So, normalization is more of making sure you did not do anything and if required decomposing. But there may be changes in our high level design right after normalization. So, most of the time when you if you do an ER model properly the normalization step will not actually need to do anything. It will look at the relations and say yes everything is okay. Typically that is what will happen. It is not always going to happen but so the point is that normalization becomes a verification exercise at that point, not an exercise in further decomposing. And most of the time if you have to decompose you could probably have gone back to the ER diagram and done it differently and fixed a mistake in it which required the decomposition. Is it required like if we decompose the tables do we have to come back and change it? You don't have to but if you want to have a straight mapping between the ER and the final schema yes it would be a good idea to if it was a mistake in the ER diagram it's better to go and fix it there. I have some actually comments to make regarding this one. If you look at say database or any software that you are developing this is just a specialized software that we are developing right. So it has certain process to follow. So what we should be doing actually considering that we may have to go and iterate whatever we have done or whatever we have left actually go back and fix those problems and then proceed kind of the thing. So we need to iterate you to kind of the model right. So that probably demands that after the normalization you go back and change your ER model also to make everything look consistent. Because there will be other people who will be looking at your document and making sense at some point down the line. So that I think is really very good point. You are absolutely right and it's a very good connection to the overall software design model where you iterate and fix it rather than just let it be and let the leave the two inconsistent. You are right. So if the question is why do we bother about multivalued and so on when anyway you are going to split it up. See conceptually maybe something is a multivalued so this is closer to the conceptual model of what you want and the relational schema which you come up with is an implementation which you have made certain design choices and translated it. If somebody who is new to it looks at this, this will make a lot more it will be a lot clearer to them to see this than to see a schema where you have relation which has multiple tuples corresponding to this. So that's exactly what I mean by saying this is a higher level representation. It's easier to look at this diagram and understand what it's going on than to look at the relational schema and understand what is going on. That's exactly why we do ER modeling. Otherwise if it were not higher level we wouldn't bother doing it perhaps. Although it does help in coming up with the correct design with minimal violation of normal form and so on. So we are back here. Part of the notation again you must all be familiar with. A relationship can be linked to the same entity set more than once and then the different aspects here are modeled using roll. So roll names. So here is an instance where you have a course identified by course ID. It has a title and it has credits and a course can also have prerequisites. So prerequisite is a relationship between two courses. It says that one of the courses identified by this roll, pre-rec ID is a prerequisite for the other course identified by course ID. So this is a simple example of where roles are important. Any questions on this? So now let's get on to cardinality constraints the many to one, one to one and so on. Again there are many different notations for this. There is no standardization in the ER community. For the sake of continuity from the previous edition we have and also continuing the same ER notation we have chosen to go with the arrow notation. It's easier for people to understand actually. There are other notations which use cardinality constraint of the form zero dot dot something, one dot dot star zero dot dot one, one dot dot one and so on. So there are other ways of doing it but we have chosen to stick with the arrow notation. So in this case the advisor relationship between instructor and student has arrow in both directions which means it is one to one because that may not be very realistic because we have many students per instructor but this is just a toy example. Now this is a more realistic case where the arrow points this way. So what is that arrow signify? You can think of it as a functional thing that is given a student the instructor is functionally determined which is the same as saying it's many to one. So for a student there is a unique instructor. If you remember the functional dependence notation there is an arrow. This arrow kind of reflects that arrow or the functional dependency. So that's one of the reasons we chose to stick with this notation. So that is in this case it is this one is one to many looking at it this way or looking at it this way it's many to one. The default if you don't specify any constraint is of course many to many. Now this diagram actually shows two things. We are going to postpone worrying about this double diamond. Again you are probably familiar with this but let's not discuss it right now but let's focus on this part the double line again this notation is the same as what has been used in the past. What this denotes is that this guy the section entity this is actually a weak entity but we will look at the weak entity issues later what this says is that the participation of this entity set in this relationship set is total meaning every single section must participate in one sec course relationship. There is also an arrow here so that means that every one of these can participate in only one. In other words every section must have an associated course because of the double line and the arrow implies that it has at most one so this says it has at least one and this shows it has at most one associated course. So you can also express cardinally constraints using this notation so here on this link we are saying one dot dot one which means that a student must participate in at least one and at most one instance of advisor. On this side we have zero dot dot star which means an instructor may not participate in advisor instructor may not be an advisor and star meaning the instructor can be an advisor to many students. Now similar notation also used in UML unfortunately the location of these two is flipped in UML causing confusion so what I mean is in UML this will come here and this will come there I don't know why they did it that way considering it came after this is standard ER notation that is rather non-standard so it's probably safer not to use this because it causes confusion between UML and ER the double line and arrow notation doesn't have that problem although this can express constraints which that cannot exist for example an instructor can be an advisor for at most 20 students we could say zero dot dot 20 here which you cannot express using the arrow notation. So now let's get to ternary relationships at least in the university schema we had a little bit of trouble coming up with a very good example of something which is really ternary and you'll see this example is a little contrived but so what is this example we have an instructor we have a student and then we have projects now a project typically for our students is at least at the B B tech level it's just something they do in a course or as part of their degree requirement but here maybe a project has an existence of its own beyond that so maybe it's a project which multiple students participate in or multiple instructors participate in so think of it as a large project it's a group effort so a project has a meaning of its own so it's meaningful to represent as an entity set it's important to represent as an entity set similarly student and instructor or of course entity sets so what this says is a student participates in this project as an instructor guiding that student so whenever you have a ternary relationship you should ask what if we broke this up into binary relationships so here what if we had student project and a student instructor as two separate binary relationships the issue is that an instructor may be involved in multiple projects so maybe there are two projects and on one of the projects two projects in which both the instructor and the student participates but for project one this student is advised by instructor one for project two the same student is advised by instructor two and now both instructor one and instructor two participate in both the projects so if we split this as a pair of binary relationships we may know that instructor one is the instructor for student one we know student one participated in project one and project two and similarly instructor one and instructor two guided the student but we no longer know that instructor one guided student one for project one but instructor two did not guide this student for the same project we will lose that info if we create a binary thing we lose this so in terms of normalization theory we have a lossy joint decomposition so when you take a ternary and split it into two binaries it's a decomposition and in this case you lose information if you split it so this example really requires a ternary if you split it you cannot represent it but there are certain other situations where it is perfectly fine to split it in which case you should be splitting it any questions on this so this slide says let's split it we have a complete session after this for doing designs so we could I'll give you the choice would you like to do a design at this point and then continue we can use this second session for covering the remaining slide so shall we split it that way so let's take a word would you like to split the activity in the December workshop probably and finish this whole thing and then the second session you will be conducting locally so perhaps we should stick to that if you want to see how we will do it in December but if you wish to do it differently now we can fine so let's stick with the December pattern and finish up the remaining constructs so there's this notion of weak entity set which again seems a little artificial but it actually has a lot of sense so what does this mean if you were asked to define a weak entity set you would say that an entity set that does not have a primary key is referred to as a weak entity set but then you are the designer who has to put a primary key in the entity set this is not a very good definition from the viewpoint of a designer so better way to think of this that there is an entity set whose existence depends on the existence of another entity set so this is really subsidiary to the other one this thing makes a lot more sense so we have several examples here in the university example we have a notion of a course CS 101 you know bio 103 whatever these are course IDs and there is a syllabus defined for the course and so on but every semester courses are offered to students so that is a course offering now you cannot offer a course unless the course itself exists in the database so the course offering is existence dependent on the course itself so that is a good example of something which should be a weak entity set now in a notation we get a section a section is basically an offering of a course and each existence depends on the course existing and therefore it's a weak entity set now that brings back the question what about primary key so why don't we throw in the primary key and we will come to that in the next slide also now in the notation you know because it has to have another entity set which on which it is dependent that's called identifying entity set we need to know for a given weak entity set which is the identifying entity set and we are going to use a double-diamond notation to say that this relationship links it to the identifying entity set in general you may actually have something which is existence dependent on two things so you may have two identifying relationship sets although that is less common typically it would just be one identifying entity set so now let's use the diagram so here is a section here is a course and this double-diamond indicates that section is identified by course and how do we know which direction is it why is it not course is identified by section because of this arrow which indicates that section has a unique course with it so this arrow indicates that identification goes this way that's one clue in fact there are multiple clues the main clue though is that this is a weak entity how do you know it's a weak entity set because we do not have a primary key so primary key would be underlined so in this case there is no primary key instead we have dashed underline which indicates that these attributes are a discriminator now what does this mean so you will notice that section does not have a course ID here this is important the course ID is implicit in this relationship which this section has to participate this is total so the section has to participate in this and that relationship tells you which course ID this was for therefore you should not put the course ID here if you put it here you could make this whole thing the primary key course ID, section ID, semester could be the primary key but if you put in course ID here we have the problem that the relationship is there and the course ID is again representing the same thing so that is duplication to avoid this duplication we do not put course ID here and the moment we do not put course ID here there is no primary key therefore it is a weak entity set so that comes back to the original the starting definition of what is a weak entity set now of course there may be many sections associated with a single course we have to distinguish them or discriminate them so the discriminator attributes have to be present and they are shown with a dashed underline this will be it is many from this side to one on this side if you look at it here it is one to many so now in our design what do we have here we have a semester year and a section ID so what this means is that this same course may be offered more than once in a particular semester and a year if so there has to be a section ID which uniquely identifies our section and this is a fairly common requirement pretty much any university which has multiple sections has to deal with this any questions my question is like by using year data model we are modeling the real situations so if we work with some real applications we collect the requirements then very first step in this modeling is to identify the entity sets so we start identifying first of all strong entity sets that's a straight way procedure because we are capable of identifying the situations in which some primary key is available so we directly identify strong entity set but what's the proper way of identifying the weak entity sets that's a very good question so typically what you would do here is identify both of these entities initially without knowing whether they are strong or weak entities later when you start refining the thing you realize that this entity set depends on the existence of the other entity set and therefore we will model it as a weak entity so it's a two stage process in the first phase you don't want to worry about this you just say that yes there is a in the case of the university thing we know there is something called a section which should be an entity we know there is something which is a course so at this point both are just entities we don't know whether they are strong or weak later we will in my analysis during teaching of this course to the students and solving the various case studies to the various students in the lab I identify that it's very easy to identify the strong entity set so first of all we list all the strong entity set and model them appropriately with the designing of primary keys but when we go into the analysis of every strong entity set we come to know that the redundancy problem is there in storing the entities within that entity set which is a strong one we need to reduce down that redundancy level by separately modeling a weak entity set so further we need to make a deeper analysis in every strong entity set and then we discover the weak entity set because weak entity set is meant for the strong entity set on which it is existence dependent yes so I think we need to specify the clear cut steps of modeling the entity sets that when we are going to specify the strong entity set and when we are going to specify the weak entity set that's a very valid point so one way is you know you could identify something as strong entity for sure and leave out weak entities and then add them later but it's probably easier to just identify all possible entity sets without worrying whether they are strong or weak in the first phase and then make the decision on stronger weak in the second phase that's probably cleaner maybe her observation comes from like the section could be modeled as a attribute multi-value attribute of a course so at later point of time she discovered that instead of making the multi-value attribute it needs to be modeled as weak entity yes that is possible another situation comes to my mind like say we are collecting the courses I as a user some data I am collecting courses of all IITs so I will say IT CS110 from IIT movement so if I collect courses from all IITs then even course itself is a weak entity owned by the institute ID that's certainly an interesting way of looking at it yes you could even maybe in the context or maybe in the enterprise scale of the yes you are absolutely right that's a nice example so if you are combining these things then whatever set of attributes was enough to uniquely identify it is no longer enough so then the two possible approaches are to add the institute ID to every one of these in the year diagram itself but maybe it is better to identify the institute as an entity and then each of these now becomes a weak thing in the institute that is probably a cleaner way so that's a very nice example absolutely list all the strong entity set then you analyze the situation of modeling that is strong entity set in deeper way and then think about the kind of redundancy level you are having in that and then you go for modeling or decide the weak entity set yes so that should be listed in our lecture notes definitely yeah so sometimes it's hard to say that you know you should do it exactly like this because the whole process is kind of iterative so sometimes you know you are not sure you make a decision then you realize no that was wrong and then you go back so it is a evolving process yeah so you can have some general guidelines but eventually each time you do this there are going to be there is going to be some uncertainty which there are ways of resolving the uncertainty but you may actually do a first cut thing and then look at it and say ok no this is not quite right I should redo this in particular this point which was mentioned should this be an attribute or should it be an entity that decision is sometimes not a local decision that is you know I may say that you know address is simply an attribute now if I am building a GIS system geographical information system address probably should not be an attribute you want a notion of an entity which is a city an entity which is a street maybe even an entity which is a building and then address becomes a relationship of a person to a building or a flat or whatever so what you should do depends on what else you are modeling so it is an iterative process you may first do this and then say oh but I really need to do this also and therefore now it should be a relationship not an attribute in particular for an institute like IIT we actually need to model what are the buildings what are the flats because we have an allocation system similarly for hostels and rooms so now the allocation now becomes a relationship not an attribute whereas here we have assumed that addresses are attributes so it depends on what else we are modeling in the database thank you sir