 Now, let us get back to ER modeling, I think all of you are familiar with the basic principles. So, let me skim over those very quickly and then get into actual, this is meant to be a tutorial. So, we will work out certain examples and I will invite you to suggest domains for doing an ER design activity. So, you can start thinking about it while we cover the basic features. We have any interesting domain which you want to suggest, we can try doing a design online for it, that is the whole goal of this thing. So, first of all, all of you know that ER modeling consists of trying to understand the world as a set of entities and then relationships between these entities. This is a very powerful model, the relational model is a low level model, it says what data are we actually storing in the database, a tabular view of the data, it is very powerful, very nice. But when you want to model an enterprise, you do not want to jump straight down into the actual relations which you need, you could do that, but it actually helps a lot to not jump straight into the details, but stay at a somewhat higher level initially to get the big picture and then dive into the details, relational design. So, the ER modeling is traditionally used as a first step in the exercise of database design. You do some modeling in the using the entities and relationship these concepts, then there are diagrams which is a nice notation to summarize a fairly large design within a single page and have it make sense. And then there is the conversion of this ER design into relations which a database can use. They have actually been proposals to have databases which can directly work off the ER design without having to do the conversion, for whatever reason they have not truly taken off so far, so the relational model still more or less dominates. However, there are some things which are intermediate which people are using these days, maybe later I will mention those briefly. So, all of you know what an entity is in an abstract, it is something could be a physical thing, it could be an abstract concept like this code, it is not a physical thing, but it is conceptually a thing, so that is an entity and entity sets are a set of, so each course is an entity and entity set is a set of all courses maybe. Are we going to use this notation entity, entity set loosely referring to either the set of entities or individual entity from the context you can make out. So, you have some instructor set and student set, the notion of keys again you are familiar with that, and then the relationships and relationship set, again formally a relationship is a individual relationship between a particular entity and another entity, if it is binary or between a set of entities, if it is n-ary, and a relationship set is a set of relationships of the same type, between the same entity type, entity sets, again you are familiar with all of this, I am going to skip the details, so this is a binary relationship between instructor and student, which might presumably be an advisor relationship, so that name of the relationship is shown here, and this is just conceptually shown as a set of lines, so this is not formal diagrams, this is just an informal notation. So, the first step is to identify what are the entities and relationships in whatever thing you are trying to design, after that you can get into the detailed design, but the very first step is to identify what are all the entities that you want to model, and the second step is what are the relationships between these entities. So, in this university model which we are working on, we have been looking at the relational schemas for the university model already, we are moving back to the entity modeling. So, if you look historically you know many years back, courses would start from entity relationship modeling as one of the early activities, after some introductory material to give an overview, they would say the first step is to do entity relationship modeling, then you come with a relational design, you do normalization, then SQL, and this is how many projects proceed, but courses were also taught in the same sequence earlier on, and textbooks were organized in the same sequence earlier on, but it became fairly clear that today the need for courses is to have a strong associated lab, and while you are doing entity relationship modeling you cannot do much in the lab, this is a more hands on pen and paper activity primarily. So, we flipped it in the previous edition of a book, we flipped the ordering of the chapters, so that we start with a very simple introduction to the relational model, jump into SQL, and students can get hands on, they understand it bottom up, and then when you come to this point, they actually have the maturity to understand why this is important, otherwise students would come in and say why are we doing all this entity, it is boring, it is nonsense, it is obvious stuff, you know, this is not interesting, but once they have done the lower level and they get a better appreciation for what is the higher level, so we flipped it and I think other textbooks have also followed this today. So, now let us come back to entity relationship modeling, these are ER diagrams and the standard notation is to have boxes for entities and diamonds for relationships, now within this there are a lot of different notations, this is just one of the many possible notations, in earlier editions of the book we used a different notation, that was at one point a standard for ER modeling at DE1FX or something, but that is long, long ago and these days there is no particular standard for basic ER modeling, so everyone does their own thing, on the other hand there is something called UML, how many of you are aware of UML, most of you, so UML is today widely used in the industry, it is a diagramming notation for many things, UML class diagrams are this part of UML which deals with entities and relationships, so the UML notation because it has been kind of standardized, there are many tools for it, it is widely used in industry today, so what we decided sometime back is we will migrate our ER notation closer to UML notation because that is something which people have to learn anyway, it is widely used, so in earlier notation we had this instructor entity let us say and what are the attributes of the instructor entity here, ID, name and salary, if you use any earlier edition of a book or if you use other textbooks even today, they all use a different notation, what is that notation, they have little ovals around with the entities, so that oval notation is nice in a way that is historical but it is not compact, it takes up more space, this notation is a lot more compact and when you have more complex things, the ability to stick a little bit more in a page and yet have it clear what is going on, ovals are a distraction and UML never had it and so in the current notation that we use, we no longer use the oval, it is a little easier to do this. Relationships continue to be diamonds and in fact UML also supports the same notation for relationship, now the notation that we use is not strictly speaking UML because UML, so far what is here is UML, UML has other stuff, in UML these are class diagrams, it is implicit that these are classes, so there must be methods, there is other annotations about public, private and so on which we are not even using, we are not bothered about it at this point in the diagramming notation, that can come later, we can add methods later on if you wish but not at this stage and there is a notion of a primary key which we underline here, again in the UML class diagram this is not focused on, they do not have a corresponding notation, so it looks like UML class diagrams but it is not exactly the same. The next thing is that we already saw entities have attributes, if you have done here modeling you know that relationships also have attributes, this is a very fake toy example which says that you have instructors, you have students, there is a relationship advisor, we may want to keep track of when the student met the advisor and so we add an attribute the last time the student met the advisor, it is fake nobody will do it this way but just as a simple example and notation wise in the new notation what do we do, we put those attributes in a box and indicate that these are attributes of the relationship by connecting it to the relationship with a dash or dotted line, so those are attributes of the relation, this is not another entity, this is not an entity set, entity sets are connected by solid line, these are attributes. Again in the old notation you would just draw ovals around the relationship and the attributes are there, so now attributes can have a different type, there are multi-valued attributes, single-valued attributes, there are simple attributes and composite attributes, what is the composite attribute one which has sub parts, so again in the old notation, oval notation there were several things in there, two ovals within each other and oval with branching ovals going out, all those notations are there. In UML there is no corresponding notation for these concepts, so in our book we added some notation for this, derived attributes were already there, they are methods, this is already there in UML, so we have just used that notation, so this is a conceptual view, name has first name, middle name last name, this is not the diagram view, just a conceptual view and then it can have multiple levels, address can have street, street itself can have street number, name and apartment number, so in our notation we do this by indentation, again you can do all this using UML diagramming tools, although UML notation itself does not make, understand any of these steps, but we can use the same tools for building these diagrams and interpret this as we please, so in particular here we have name and then indented within it first name, middle initial, we have first name, middle name, initial and last name, the indentation shows that these are fields of the name attributes, so then you can have multiple levels of this thing, address has street and within street street number, so the indentation shows you the composite attributes. Now multi-valued attributes we just enclose in curly brackets, so indicates that they can take on a set of values and the derived attribute is essentially a method, so that is shown with parenthesis here, the actual arguments are not shown here, but they can be added, so that is a quick summary of the new notation, some more of it will come up with a little bit. Now in any relationship modeling, the issue of relationships binary or n-ary, it turns out that most relationships are naturally binary, but not all, so this is a very important decision that you make when you come up with a relationship, it is important to consider whether it is n-ary for n greater than 2 or it is binary, you have to think it over and how do you make this decision, let us see some examples which will help us in understanding this process. So there are couple of examples here of things which are not truly binary, so supposing we have students working on research projects under the guidance of an instructor, now this is slightly contrived, so what does this mean? There is a university, there are many research projects going on, each research project may have multiple faculty involved in it and a student might even be involved in multiple research projects, but when a student is working on a research project, that project has multiple faculty, but one faculty is in charge of that student for that project, you get the scenario? So a student may be working on project one under faculty member one, he may be working on project two under faculty member three, both these faculty members may be involved in both these projects, so if you say a student is working on this project, you cannot identify which faculty member is guiding that student, because there are two faculty members associated with the project, which one is guiding this student on this project, you do not know. If you say a student is working with this faculty member, each faculty member is working on many projects, which project is this student working on? You cannot identify it, so if you create these binary relationships student, faculty or student project, it does not give you enough information to say who is guiding that student on that particular project, this is an example of a ternary relationship. Now what are examples of relationships, which are not ternary, but might appear to be ternary at first cut? Let us say that we have a person, mother and father, is it a ternary relationship, does it have to be a ternary relationship? So you have three people involved in a relationship, one plays the role of a mother, one plays the role of a father, the third is the child. Should this be a ternary relationship or is it binary? Parent child is a binary relationship, which is sufficient along with the gender of the person, it is not whether it is a mother or father, if you need that distinction. So a binary is sufficient. Now what is wrong if you use a ternary relationship here, can it lead to trouble? Can you put constraints? Let us say a person has only one mother, one father. Actually these days that constraint is getting violated, you hear a lot in the news about it. So maybe we should update our design for this, but let us assume for the sake of simplicity that there is only one mother and one father, not two mothers or two fathers. Can you specify this as a constraint with a ternary relationship? So for that first we have to get into constraints. So I will come to constraints on binary relationships, then we will come back to ternary. But let us take a simpler case, forget the constraints for the moment. Let us suppose that for a particular individual, I know the mother, but I do not know the father. How can I represent this? If I am only using a ternary relationship, how on earth do I represent this? So I want to create a ternary relationship, then I have to have a relationship with a null entity. That is not a concept which is there in the ER model. It is a ternary relationship, there must be three objects involved. If I do not know the father, I cannot even represent the fact that I know this person's mother using a ternary relationship. Is this clear? With binary relationship in this case, I can represent information which I cannot represent using the ternary relationship, that partial knowledge. So it can work both ways. In the previous example, there were things which I could not represent using binary which I can represent using ternary. Now it is the other way. There is something I can represent using binary which I cannot represent using ternary. So these are the kinds of things which you have to think through before deciding whether it is ternary or binary. It does turn out that it is possible to create fake entities to transform n-ary into binary. But this is a modeling artifact, an entity which we create artificially. I will come to that in a moment. But this is the ternary relationship project guide between instructor, student and a project. So I already answered this particular quiz question. You are not doing quiz questions at this point, go ahead and read it. It is just reiterating what I said verbally. So what is the answer? Two binaries, better here. So what is wrong with the other? Ternary relationship we just discussed. Problem is, we cannot represent this partial information. What about four? An entity set with attributes person, father and mother. What is wrong with this? You can do it. You can create attributes. What is wrong with this? We have duplicate values, redundant values, part of the story. No, there are three attributes. Who is the person? Who is the father? Who is the mother? So you keep three names. So an entity set person, where they say an ID and then their name and their father's name and their mother's name, which can be null if you have partial information. No, that is not an issue. So each person, I am keeping two attributes, father and mother. The names of their fathers and mothers. They are attributes. So the point here is that we are modeling persons as entities. The father and mother are entities. If they are entities, there should be an entity set for that and a relationship. That is the correct way of doing it. Occasionally, you might do it if in your model, you choose not to model father and mother as entities. They are not people in your database. The problem may be that you need an ID and I do not have the IDs of the father and mother. I am just keeping it for record keeping purposes. Then I might just store an attribute, father's name, mother's name. In fact, this is probably more realistic in most institutions. You do not keep track of who are the entities corresponding to the mother and father of each student or faculty member. But your employer records probably keep names of these people. Then attributes are acceptable. But if you are actually keeping those entities in the database, for example, student advisor, if you keep the ID or the name of the advisor as an attribute, you are not recording the fact that this is a relationship. Now what can go wrong? I can store any ID in there. There is no constraint. I can store any name. Now is that person an instructor or not? I cannot enforce anything on that. But the moment I say that there is a relationship advisor between a student and an instructor, I know that the student has to be related to an instructor in the database, not to some random name which may or may not exist or an ID which may or may not exist. It is a relationship. And when I convert it to a relational model, I will convert it to some relation where there will be appropriate foreign key constraints. We will generate those automatically. So what we are doing is instead of getting down to the nitty gritty of saying I am storing an attribute, I have identified the entities. I know that instructor entities, students entities. I should never create an attribute in student saying instructor name or the other way. It should always be a relationship. The moment you have an ID of entity set appearing as an attribute in some other entity set, there is a problem. It should be a relationship. There is another problem with putting an instructor as an attribute of student, an advisor as an attribute. What if a student has two advisors? You cannot record that if you have a single valued attribute. But if you model it as a relationship, you can choose what constraints to put in the relationship diagram and will come up with constraints just coming up. So you should never use an attribute where there is already an entity set which you should be connecting to by a relationship. Now number three, an entity set parent linked by three relationships to person, father and mother. This is sometimes done as a way to avoid ternary relationship. But in this case, it is totally unnecessary. Binary is the better solution than family. So now let us come to the constraints. You are all familiar with this one to one. This diagram says a little bit more. First one to one is each element in one set is connected to one element. Then is it at least one or at most one or exactly one? No. So the default is at most one. You see in this diagram here, A4 is not connected to anything in B. It can also go the other way. So if you insist that everything be connected to exactly one, you can do it in the notation. I will come to that. This is what is called a total relationship. But one to one does not imply exactly one. So at most one. Similarly, one to many. So one on the left-hand side can be connected to many on the right-hand side. But again you can have something on the left which is unconnected. One on the right which is unconnected. Many to one is the inverse. Many on the left, one on the right. So many things on the left can be connected to one thing on the right. And finally, many to many is unconstrained. Anything can be connected to any number of things on the other. Including none. In all these cases, including none is an option. Okay. So how do you show this? This is the arrow notation that we use. Again this is been part of the ER notation. This particular notation is not exactly there in the UML notation. But UML does have a different way of satisfying cardinality which is actually very confusing. In my opinion, they goofed up on how they represent cardinality constraint. So we stick to our notation. So one to one is arrows on both sides. So the arrow from instructor to student says that an instructor has at most one student. The arrow from student to instructor pointing this way says that a student has at most one instructor. So this arrow notation is consistent with the functional dependency notation which we will look at tomorrow. You would already know about it. So you say that something here uniquely identifies something on the other side. So it is the same concept here. The arrows have the more or less the same meaning. And then this is one to many says a student can have a unique instructor. But an instructor can have many students who they are advising. And finally no arrows mean there are no constraints. A student can have many instructors, instructor can have many students. Now what about the exactly one? Sorry this was a bug in the slide. There should have been an arrow many to one that way and many to many. Now there is another notion of participation of actually this part has two different things. The double diamond I will come to later. Ignore it for now. But what I want to show here is that the participation of section in course is total. What does that mean? Section must have an associated course. I cannot have a section which does not have an associated course. How do you note it in this diagram notation? A double line here. This means this entity side must participate in this relationship side. So every section must have a course. Now this double line does not prevent a section from having two courses which is not realistic. Every section is for a course. That can be modeled by having an arrow here. So this says that a section has to be associated with a course and at most one course because of this arrow. And the relationship between section and course is sec course. Now remember in the schema diagram which we saw yesterday section had an attribute called course ID. It is not there now. What happened? The course ID is modeled by this relationship. You should not have course ID as an attribute. We discussed it just a few minutes back. If you put course ID as an attribute what is the implied? We do not know that it is a relationship to a course. It is just some number there. But the moment you model it as a relationship we know that section is associated with a course. Later when we create a relational scheme out of it we will not only add a course ID in this case to section we will also add a warranty. So that we know that the course ID there matches the actual course. It is not arbitrary course ID. So we have a notion of total participation versus partial participation. Total participation double line meaning every entity in that set must participate in that way. At least one instance of the relationship where it participates. Partial means it is optional. So the cardinality limit so far we used arrows. There is another notation which is used here which indicates how many times an entity can participate in that relationship. So what is shown here? Instructor participates an advisor 0 dot dot star. What does that mean? 0 means the instructor need not be an advisor. Star means the instructor can be advisable. Any number of students. I can put a limit. If I want to enforce 10 I can put 10 in there. On the other side I am saying student participates 1 dot dot 1. So the student must participate in the advisor. Every student must have an advisor and must have at most one advisor. So 1 dot dot 1. So this is actually fairly intuitive and easy to understand notation. It is also the converse of the UML notation which actually flips the cardinality constraints across in which kind of might make sense for binary but for non-binary it is really messed up. But even for this this is lot cleaner. So we will stick to this not the UML notation which is flipped. The quiz question is straight forward. Skip that here. Now there is an issue of keys for entities and the keys are identified and shown with an underline. How do you identify what is the key for an entity set? It depends on the enterprise you are modeling. So what should be a key for students? Should it be a roll number? Should it be Aadhaar ID? Could be Aadhaar in the future. But what if a student doesn't have Aadhaar? Do you get rid of the student? No. So then you may have to create your own ID which may be based on Aadhaar where available or some other thing which is which you create. So generally that's what organizations do. So you have to create an identifier for every entity and use that and in the diagram you underline it. It cannot be name because names have conflicts so generally you create an ID. Now what about for relationship sets? Now relationship sets implicitly have a super key which is the combination of the primary keys of the participating entity set. Now this has some implication. When we are saying this we are saying that if you identify such say binary relationships, if you identify a student and an instructor they can be related at most once by the advisor relationship. Now what does this mean? It doesn't make sense to have two different advisor relationships between the same student and instructor. You might be tempted to do this in the following situation. So we had this notion of the time when the student last met the instructor. So at last meeting student, instructor we had last meeting. So now supposing I say that I want to keep the last several meeting times not just one. Now what do I do? Can I create multiple relationships between the student and advisor each relationship having its own meeting time? No. For a given student a given instructor they are allowed to have only one relationship of the type advisor. They may have a different relationship. Maybe there is some other thing not just advisor maybe I don't know what some other kind of relationship which links a student to an advisor but it will have a different name. If you are looking at the advisor relationship a given student and a given instructor can only have one relationship of this type that's the restriction that we are following. And now if you have last meet you want to create multiple meeting you can do that by putting a curly bracket which says this is a multi-valued attribute. So that's how you deal with last meeting. You can create a multi-valued attribute. You should not create several different instances. And this problem particularly comes up when after translating the thing to the relational model then you decide to add it and if you don't update the ER model it can be inconsistent. In general this is a problem. People come up with a nice ER model translate it to relational model then they make changes things are inconsistent. So for the student projects we insist that they go back and update the ER model if they change the relational model. Okay so coming back the key this combination of the primary keys is a super key. Why is it a super key? Not a candidate key. So what's the difference between super key? Super key is enough to uniquely identify but it may have extra stuff. If you take cardinality constraints the student has at most one advisor let's say that's a cardinality constraint. Then why do you need this combination student ID and instructor ID? Just the student ID is enough to uniquely identify the relationship. So one cannot have two advisors only one. So student ID by itself is a candidate key. Instructor ID is redundant when we are identifying a relationship. The student cannot have relationships with two different instructors. So then the candidate key is student ID and a primary key is one choice. One of the candidate keys is chosen. There's only one choice here. So the primary key would be student ID. So you can have primary keys chosen for relationships which are a subset of the primary keys or the participating sites. So when why does this matter again when we convert from ER to relational this will obviously have an impact. When you choose the primary key for the advisor relationship you would if you had the integrity constraint that a student can have at most one advisor you'd choose student ID as the primary key. So now you may sometimes have redundant attributes depending on how you do this modeling. So let's say that we had an instructor and this is something which I told you a little bit earlier. If I put an attribute called department name I'm losing track of the fact that this there is a relationship to a department entity. If I have a relationship and an attribute it is duplicated. So I should avoid that. Some people mention duplication. So if I have both there is duplication. If I have only the attribute not the relationship I'm losing information. So in general you should you would prefer the relationship rather than the attribute if there is actually a corresponding entity on the other side. But of course when we convert back to table the department name would come back as an attribute of the instructor. So it's important to when you're doing design to see whether you're looking at a schema diagram or an ER diagram. In the schema diagram department name would be there as an attribute. In the ER diagram it wouldn't be there as an attribute that would instead be a relationship. You're all hopefully familiar with the notion of roles. I'll try to wrap up the basic notation soon. But the notion of roles was used for prerequisite. Which course is the prerequisite of which course? Because there are two relations. So prerec relates course with core. So you have to identify that which is the course and which is the prerequisite. So we add labels on these edges which are called roles. So that brings us to end of part one of this thing. In part two we will look at other ER notation. But before doing that let's do some exercises. Let's try to do some ER modeling on the flight. Somebody wants to give suggestions? University we have already done. Let's do something else. I will show you the full university thing coming up. But I want to do something different. IPL? IPL. That's a good one. IPL actually is pretty complex and it raises a lot of interesting questions. Complex enough without even considering Deccan Chronicle and so on. And Kiran Lalith Modi and so on. Even just the basic modeling is very complex. So IPL it is. So what are the entities in IPL? Let's start with those. Teams. So entities. Player. Franchise is the team. Team owner? Team owner. Team owner. Okay. Owner. Okay. Stadium. Coaches. Okay. And schedule. That's is it an entity? What is it? There are certain things which are a little more tricky. So let's put it there. We have to deal with it somehow. Empire. Okay. We can't deal with all of these in the limited time we have. So we will pick some subset and go into more detail. Spectators. I don't think we will model it in this thing. Maybe you want to model that spectator who comes in, you know, all the body paint. But the rest of the spectators we don't care about. So let's leave it out. Commentators. Okay. So there are many more things. Let's stop here as far as entities are concerned. Score. Yeah. So what is score? Score. Is it an entity? Is it a relationship? Let's make a call. But this is something that is important. Result. Yeah. So statistics record. So let's keep that on the side. So there are many more things. It's very complex. Let's try to at least do this much. Okay. So now for each entity, there will be some set of attributes. Let's completely ignore that for the moment. Okay. What are the attributes? Let's assume that each has some kind of ID. We can do it for maybe one thing. So team. What do you think would be the attributes of team? Name. Which may be the primary key. What else? Owner. Should owner be an attribute? Owner ID? Should owner ID be a, supposing we have owner as an entity, which we are modeling. Would you put owner here? No. In fact, there is several issues with this. First of all, even if we don't model owner as an entity here, teams have multiple owners. Partial owner. So you can't just throw a single attribute here. Not only are they a partial owner, they have maybe a stake in it. So maybe you want a separate thing which links teams to the owners with the what stake do they have. So you can have a relationship with an attribute which is percentage of owning. So that's a relationship. City. Yeah. Again, assuming we are not modeling city as an entity. If we do, then they shouldn't be here. So that's a decision we have to make. We have not put cities on this side. So it's okay. We'll put city as an attribute there. Captain. Is it an attribute? So captain, if you put it as an attribute, you're losing track of the fact that the captain should be a player. Captain should not be an attribute. It should be a relationship. In fact, that raises some other interesting issues. So let's put the relationships on a new page. I'll keep this to the side. So you can still see this here. Let's put down some relationships. Team to player. So I'm going to write it like this for now. We'll do the diagram later. For binary relationship, at least this is clear enough. Team player. Team coach. Actually, there are some interesting things. Team coach. Team match. Anything on? Team owner. Yeah. Match. Match stadium. Empire. How about match player? Do we need it or do we not? So I see people disagreeing on this. On what basis would we make this decision, whether to include a player in match relationship or not to do it? So player is in the team. The team is in the match. Isn't that enough? Those who agree, those who agree are not following IPL cricket very closely. You know that the team has many players, not all of them play in a match. So if you need to know which players played in a match, I could have a match player. Should this be necessarily binary? Let's say, this is not real notation. Match team and player. Now why would you maybe want to do this? Should it be a ternary or should it be binary? We know match team. We know team player. So is this going to convey extra information which is needed, which you would lose otherwise? Yes sir. What? I mean with which team these players would associate it. I mean. So supposing we have, we already have the team player relationship. So we know which player is in which team. We have a match player relationship. We know which player is in which match. Yes sir. Now we don't need to link it to team because we can look up the team of that player from there. Yes sir. So I am going to call this IPL 2013. I am not going to call it IPL because if I go across years, there are more complications. Across years, what is the complication? The players used to team. There may be in multiple teams. Then I am, if for a particular match, how do I know which team that player was in? Actually I can figure it out if I keep track of the time. If I know what time the date, year the match was in and then I track which year the player was in which team, then I can work this back. So it's not really required strictly speaking. So I remove team from here and a binary is sufficient for now. Yeah match result. Match player. I want to know what was the score of the player. Should this be a new relationship or can it be an attribute of a relationship? So for IPL, so this can have, again there is not notation, so score maybe how many runs they got and then maybe over maiden run, we get that statistics could be kept there. But this is still missing some information. How did the person get out? Who bolded this person? Who's caught? Who's stumped? Whatever. So who affected the run out? These are more complex. So this way I said IPL is complex. So it's something very simple, not just IPL. Any team sport has the same issues and in fact at some point you will realize that the problem is the year model itself gets horrendously complex and it's actually easier to give up and model it in a relational form for statistics for example. But let's stick to this at this point. There are many, many more relationships. But is there any very important one which we have missed? Yeah, we can have any, stadium is there. Any other important thing? So all these statistics which I just told you is one important thing which we need to add. Anything else? Let's stick to this for the moment. So that's a good point. So at what level do we want to model entity? A ball is an entity, an over is an entity. Do we want to keep track of it? A ball was in which over? Who boiled that over? So it's a question of how deep you want to get into it. Yeah, it's going to be complex step by step because we want to see the relationship for everything. So there is a pragmatic aspect of how deep do you get into it? If you are a true sports fan, you know, I think you are because you raised this, then all this level of complexity is obviously important. If you are like the rest of us who at most watches the match and sees the net result and who hit how many runs, got how many wickets, the over by over starting. Yeah, that's a valid point. The point, the meta point I want to make here is what you want to model depends on the end goal of what you are doing. If your goal is to model what gets published in newspapers as statistics, this is enough. Ball by ball is not published. But if you are quick in for one of those other sites, which actually has ball by ball details of what happened, they may want to record that. Ball by ball. And if you are a coach who wants to record what happened ball by ball for players from other teams, you may actually be able to do some data mining on it and decide that this person is vulnerable to balls which were pitched on the offside, boiled by a pace bowler as opposed to a spinner. So this kind of data mining, it turns out is very important. So this was pioneered long ago. One of the early motivating examples of data mining was in the NBA, the national basketball also in the US. So there were teams which actually hired data mining specialists who took all these play by play statistics and used it to decide how to target players from other teams. So they had a strategy for each team, each player. And it actually helped them significantly. Afterwards, many others started doing this. And I'm sure IPL teams do this today. I mean they don't talk about it. We don't hear about it. But I'm pretty sure some or all teams are doing this to some extent. Okay, so now let's get back. Let's look at, so we looked at this entity, team. And now let's look at some of the relationships with team. Team with, let's say player. Can you see this on your screen? Team with player. So we need to create a player entity. What would be the attributes of player? Some ID will be required. Name. Cost. Okay. This is for IPL2013. The cost varies across years. Country. Okay. Again, we're not modeling country as an entity. Then it's okay to put it here. If you do want to model it as an entity, it will become a relationship. You may want that because you want to do something else with the country. But if you don't, this is good enough. So, yeah. Okay. Now how many runs and wickets? These are statistics. Should we keep those as attributes of this person? Probably not. These things may change. Should they be attributes? Should they be derived attributes? So these are statistics are usually computed based on something else. You could model it as a derived attribute, which is defined in relational terms, you might create it as a view, which is an aggregate on the individual statistics of a match. Okay. So we'll leave it out from here for the moment. Anything else? So in fact, are these kind of statistics? It's usually easier to leave it out of this and then model it in the relational part rather than take it as part of the entity there. So derived attributes will mostly keep out here. Anything else? Yeah. So coming back player, I think this is good enough for now. Yes, sir. So now there's a relationship team player. How to distinguish the functional requirements and data requirements critically. Functional and data. Yeah. I mean, most of the times when we are say designing, there are chances that we may jumble up. Yeah. Let's just leave that out for the moment. Let's keep things simple. So I'm not sure what you mean by functional versus data. Yeah. Like adding, deleting something. I mean, if you want to track who added what. So you want a history of what happened. Yeah. So there are several aspects which we have ignored now. One is the temporal aspect so far. Okay. So for example, we assume a player is in a team that's here. But for what time was this player in the team? They may have been in IPL 1, 2, but not in 3. IPL 3 maybe was in some other team. In IPL 4, he may again be with this team. So there is a temporal aspect which we are not modeling. And then you have other aspects such as if an update was done, who did the update audit logs and so on. So we could tag all that in the ER model. But let's keep it out for simplicity. It gets pretty hairy. So this is a high level model. So we may omit some details here and add the details later at the relational modeling level. So captain. Now, should we have a relationship captain or should it be? Could it be a role in here? So what do we mean by role? So there are several ways of doing it. For the team player combination, we could have a multi-valued attribute, roles maybe. That's one way to do it. And the roles could be captain, wicketkeeper, bowler, batsman, etc. You can have multiple roles, captain and wicketkeeper. So that's one way to model it. Another way to model it is to have separate relationship. Say captain could be a relationship. So we'd leave it out of here and say captain. This might make sense. If you want to get a list of captains, you don't have to worry about the role here and keep the captain relationship separate. Now there is some redundancy here. Can a person be captain of team A and a player in team B? Not allowed, right? It turns out that in IPL, this doesn't make sense. But let's say in university, we have departments, we have instructors. Faculty member is a member of a department. Now can a faculty member be the head of a department without being a member of the department? It depends on the organization. In IIT Bombay, it happens. We have had many times. Professor Fatak was temporary head of school of management for a while. He was never a faculty member in school of management. His background is not management. But school of management needed a head at some point when they didn't have enough faculty member. So he agreed to head the school for some time. So in that case, it makes sense to allow the head to be someone who is not in the team. But for IPL, it probably doesn't make sense. So how do you avoid this redundancy? We will come to that. There's a notion of aggregation which is coming up. When we come to it, we will come back to this example. So we're motivated already. Aggregate, that's one part. Let's take maybe one other relationship. And okay, we had score. How do you model score? So we had over here, player and match. So let me start a new sheet. By the way, as a way of notation standard, when I show a player entity once, I will list the entity. But I'm splitting my diagram across many pages. It cannot all fit in one page. So in subsequent pages, I'll show the entity set like this without showing the attributes. I will not, if I repeat it, it's going to be redundant. It can be inconsistent. So I will just show this. So I can have a player match. I can call it player match, or I can give it a different meaningful name. Say played in, that's up to you which one you choose. So that's a relationship. This player played in this match. Now you might want to model scores as attributes of this. So what are the attributes that you might want to have here? Runs, maybe the other ones. Over, yeah, sorry, this is for, maybe I'll have balls faced here. So this is as a batsman. And then as a bowler, I may have overs made in the usual. Made in is not very useful in IPL, but it does happen sometimes. Runs, wickets, force. So batsman, force, sixes. So you can add on many more things there. So these can all be modeled as attributes of this relationship. But now let me complicate life a bit. I want to track who got this person out in this match. How do I do that? Is the person battered in this match? Who is the person who got this person out? Should these be relationships? Should it be a ternary relationship between match player and another player? You could do it. In this match, this player was bowled by this person. In this match, this player was stumped by this person. So you have separate relation, bowled by, stumped by, and so on. It gets pretty hairy after a while. So there are, there is another alternate. You could do it that way, creating separate relationship for each thing. Or you can have fake entities, new entities corresponding to an event. Let's say dismissal is an entity. So that's an alternative. So I choose to model dismissal as an entity. A dismissal might involve several people, right? So there is player. A dismissal has an associated batsman. A batsman is the one who is dismissed. A dismissal has a bowler, typically. These are optional. I am not making it mandatory. But when I come to constraints, a dismissal has to have a batsman. You cannot dismiss somebody who is not batting. So how will I say that a dismissal must have a batsman? A total participation, a double line there, says a dismissal must have a batsman. A dismissal cannot be associated with two batsmen. Only one batsman is there in a particular dismissal. How will I model that? A dismissal can have zero bowlers. So I will not impose a total thing there. But it can have at most one bowler. Now what else? There is a thumb, maybe. That's one way. There is also caught. And then there are other alternatives, run out and so on. Can you be stumped by two players? Typically, the stumped is the wicketkeeper. Can you be caught by two players? No. Can you be run out by two players? Sometimes that happens. Somebody throws a ball, catches. So maybe run out by. There are no constraints there which allow multiple people to be involved in a particular run out. So what have we done? If we had modeled these as relationships, we would have gone mad. There are so many things connecting this. And many of them are optional. So the key step here was to realize that a dismissal should become an entity. You wouldn't have thought of it as an entity. But the moment you model it as an entity, a lot of things get simplified. And a lot of things which would have been an NRE relationship which was humongous and meaningless, become clearly binary relationships. So whenever you have an NRE relationship, think again. Should you create a new entity and create binary relationships? The answer is typically yes. In this case, it was definitely yes. There are other situations where you do want to keep an NRE, but it is rare. In general, you are much better off breaking into binary by creating a new entity. So is this issue clear? Now when you convert this back to a relational model later on, you might say that I have a table which has batsman bold by stumped by caught by as different attributes, which might be null. Null means that this particular dismissal did not have anyone associated with stump. But if you have run out by, then it's a problem. You can't have multi-valued attribute in the standard relational model. As I know, NRE relationship cannot be compensated with set of binary relationships. If it is truly an NRE relationship, there are many situations which are not really NRE and it makes sense to break it up. But let me also say that any NRE relationship can be converted to binary in this mode. We created a fake entity. This is actually a standard way of converting NRE relationships into binary. The scenario, there is a bowler, batsman and a match. Yeah, so dismissal match I didn't show. So yeah, that's important. I don't have space here. You added one more binary relationship, which is not necessarily... Yeah, but then again that parameter is missing, like who bold and who batted. No, it's all there. This particular dismissal, this is an entity. It's a new fake entity. It is associated with a batsman. So if you take a particular dismissal entity, there is a batsman. It's total. So that's not optional. Whereas these other things are optional. A dismissal match is also total. It must be associated with exactly one match. So that has to be total. So there are constraints which we need to add. So we have to think these through. So yes, you're right. I forgot to do that. And when you review the ER diagram like this, people will come up and say, this information is missing, add it. So yes, your point is well taken that I missed these and I had to add it. But the other point I want to make is that it's a lot cleaner to do it with binary. If you try to model this as ternary NRE, it is hopeless. It's wrong, in fact, in this case. Yeah, any other points? Okay. Good. I think that was a good exercise. Brought out a lot of interesting issues.