 the next few slides are how to convert specialization to schema. So, let me finish that and then I will have plenty of time for questions. So, we have actually finished the basic slides. I have a couple of extra slides which I will put up, but we will have plenty of time for questions and discussion. So, let me just wrap this part up, how to represent specialization in schema. So, we need a relation for each schema in a specialization. So, let us look at a specialization, person, employee, student. Let us just focus on this part of it. Let us ignore instructor secretary further down. So, what we are going to do here is one of the method is to have well both the methods have one relation for person, one for student, one for employee, but the attributes vary in the two methods. In the first method, the actually in both the methods, the top level one has all the attributes. So, if you go back to the schema, the attributes here were id name address. So, the person's schema has id name address. What about employee? It has only salve, student has only total credits here, but if you look at the schema here, person has all the attributes, student has total credits, but it has one extra attribute, which is the id of the person, you know student is also a person, which person that is stored here as an extra attribute. So, this is inherited from person, but note that we are not inheriting name street city. Yes, the student also has a name street city, but we are not storing it here, we are going to store it there. So, for this student id, there will be a record in the person relation containing the name street and city. Similarly, for every employee, there is a salve and we store the id here in this table, but we do not store the name street city, because it is there above. So, what is the benefit of this? Let us compare it with the alternative and then I will come back to the benefit. The alternative is to inherit all the attributes from above. So, person is the same, student now instead of just id name, it inherited street city total credit also, total credit is local, but name street city are inherited. Employee, the local one will salve and name street city are also inherited. So, what is the difference between these two schemas? There are extra attributes here. So, what is the benefit? What is the drawback for each of these? So, first of all, supposing a person can be a student and an employee. Supposing we allow that and we do, in fact in IIT, we do allow that. What is going to happen? The name street and city are going to be duplicated here and here. And if we additionally keep the person as a person also, the name street city will be duplicated three times. But if that person is disallowed, a person can be a student or an employee, but not both. What does this mean? The specialization is non-overlapping. Then it is not so bad. What we will do is if a person is a student, we will throw the information in the student relation, but we will not store it in the person relation. So, name street city is stored only once in the student relation. If you have somebody else who is a person, but is not a student and is not an employee either, then that person will be stored only here in person. So, each entity is stored in only one of these couples. There is no duplication. So, this is okay as long as the specialization is non-overlapping. In contrast, this variant never has any duplication. Supposing we have an overlapping thing, does it matter? We can have a record in student with that ID, a record in person with that ID, but there is no duplication information. Name street city is only in one place, total credit only in one place. The same person can also be an employee. So, I will be in one place. So, this schema is better in the sense there is no redundancy. But there is a cost. If I want to find out, if I want to print out all the details of the student, what do I have to do? I have to join this relation with that relation. So, the price we pay is extra join and run time cost and the query complexity to do all of this. So, there is a tradeoff between these two schemas depending on the specialization being overlapping or not, you would choose one of these two. So, the remaining slides after this are basically a summary of the notation. I also have a slide on aggregation. I will come back to this after taking some questions. I have been talking nonstop for a long time and would like to take some questions now. Let me take some of the questions which came on chat. One of the questions is what is the disadvantage of ER module? So, I will take this question to mean, why do not use an alternative to ER? What is the disadvantage of ER? So, what is the alternative? It is you could go and do directly relational modeling, skipping the ER step. The advantage of doing that would be the disadvantage of ER and the disadvantage is this ER takes one more step, we create a model and then we have to convert it again to a relational model. However, it is well worth it. If you ever try to do a relational model directly skipping the ER step, that is when you will appreciate how useful ER modeling is. So, the advantages over a way the disadvantages greatly. Can you please explain discriminator attribute? That is a good question. In case you missed it, a discriminator attribute is just like a primary key except that it is not totally unique. It is only unique in the context of the identifying entity. So, to take our example with section, the section ID we treated as a discriminator attribute or take, well we had several other ones, but the point is the discriminator attributes of section were section ID, year and semester. Now, if we have section ID 1 and 2 for a particular course in a particular year sem, the same section ID may be reused by another course in the same year sem. That is not a problem. The section ID does not uniquely identified, but in the context of a particular course, remember section is identified by a course. So, in the context of a particular course, the section ID has to be unique. It does not have to be globally unique. Such an attribute would qualify as a discriminator, although it would not qualify as a super key of primary key. One of the questions is difference between existence dependency and weak entity. Now, existence dependency is a concept which says that something cannot exist without the presence of something else. So, in our context, say in Moodle, can you have a post which exists without having a user who posted it? It does not make sense. So, we could say that post is existence dependent on user and similarly, it could be existence dependent on course also. A post without a course may be disallowed or may be it is allowed. Now, we could have chosen an year design where a post has a globally unique ID and it is a strong entity or we could have chosen a different design where a post is identified by first the user ID of the user who posted it and then some other ID which is unique within that user. That is a design decision. The existence dependency is independent of the design decision of whether to treat it as a weak or a strong entity. So, I hope that clarifies that question. KMEA, Kerala again. Sir, can you explain discriminator attributes in your attributes? Discriminator attributes are just like a primary key. A primary key has to uniquely identify an entity globally. A discriminator is like a primary key, but within the context of the identifying entity. So, what do I mean by that? I say I just gave you an example. Let me take the, you know, the Moodle example as one more example. We said that an assignment is part of a course. If we treat assignment as a weak entity identified by course, the assignment ID can be a discriminator, meaning I can give an assignment ID of 1, 2, 3, 4 for different assignments of a course. Now, another course may also have assignments labeled 1, 2, 3, 4. So, the assignment ID by itself is not unique. It is repeats across courses, but within the context of a course the assignment ID 1, 2, 3, 4 is enough to identify a particular assignment. Therefore, we will say that the assignment ID in this particular design is a discriminator. And if you want a full primary key for the assignment, it would be course ID comma assignment ID. So, I would say this is a particular course ID and within that this is the assignment ID. The two things together will form a primary key and globally identify this particular assignment. Does that make it clear? Thank you. And one more question. So, in the year diagram you have represented in reply to. So, we are having the input as well as output to the post at post entity. So, is it possible to have such a relationship for the same entity? Yes, it is. The two posts. Yes, absolutely. So, these are relationship which have two connections to the same entity so this is one example. So, our goal was to keep track of you know first of all what is a post? You are saying something. Now, many posts are in reply. Somebody has posted something and then you click on the reply button. So, in reply to identify which posts you are replying to and we decided to model it as a relation from post to itself. Now, we saw another example of this which is course prerequisites. Course is an entity. It is a relationship between course and itself. Which course is a prerequisite of which course? So, that is another example where a relationship has two occurrences of the same entity set and in all such cases we have to give a role label on the edge. So, that we can distinguish between the two instances. One is for example, for prerequisite. One is what is the course idea and the other is which is the prerequisite for this course. If you flipped it you might think that CS 101 is a prerequisite for say CS 201 or CS 201 is a prerequisite for 101 you would not know. So, you need the directionality and that is where the labels come in to make it clear which is a prerequisite for which one. Does that answer your question? Sir, in the assignment we are already having the idea as primary key. In the starting after that we have make that idea as discriminator type. So, is there any rules to make the clear vision in the starting itself while framing the entity sets? Any rules, standard rules? That is a very good question. The question is how do you decide between these two alternatives? We could have made it a strong entity set by making idea globally unique or we could have made it a weak entity set by having a discriminator. Now, there is no hard and fast rules, but there are some things you might want to take into consideration. So, first of all if there was something which was externally meaningful which you could use as a discriminator then maybe it makes sense to use a weak entity set. For in our section example we have to have an identifier for the section. If we have a section we have to like in a school you have to call it a section b section c section 9 a 9 b 8 a 8 b. So, we already have a meaningful identifier externally visible and therefore we chose to use that and make it a discriminator and use it that way. But if you have do not have anything externally meaningful. So, in our Moodle example I D is not externally meaningful when you create a post you can create an ID internally there no external meaning to that ID. So, pretty much in this example in the Moodle example a weak entity is probably a bad idea an assignment might as well have been made a strong sorry not for assignment for a for a post for a post making it a weak entity it was an option it could be identified by the user. But we chose not to make it a weak entity making it a strong entity was probably the right decision. For an assignment we chose to make it a weak entity was that the right decision well that is less clear because within a course you do want to have an identifier for assignment 1 2 3. So, there is already an externally visible meaningful thing, but there are drawbacks to doing that using that also because sometimes people want to change the number for the assignment. In fact, we have done that in Moodle. So, we create an assignment then we reorder it and change the number what was assignment 10 now should become assignment 3 because we moved it up. If we had used assignment number as a you know part of the key for the assignment and made it a weak entity set as I had shown then we would have to do some renumbering and so on it is it is a pain we have to change the value of key for that thing is something refers to it we have to cascade the update to everything. So, let me put it on the screen let me share the screen. So, given that even though the ID is meaningful externally if it may change then may be a better of creating an internal ID which never changes and that is the original version here with the strong entity set. You look at ID here this was the weak one and the original one was the strong version. So, the strong version may be is better in this context than the weak version because then we do not have to modify it ever. So, if the IDs may get modified it may be better to create an artificial ID. The IDs are not going to change like the section ID ABC is not going to change the person may move from section A to section B, but the ID of a section by itself you know A section cannot suddenly become C section tomorrow. So, there it may make sense to create the ID as a discriminator and the class as may be a strong entity and the section is within the class and so forth. Does that answer your question? Thank you sir. Thank you for selecting. While we move to another center let me take some questions. So, the first question is can you give a diagram for condition defined and user defined. So, is there notation for condition and user defined? Now, we do not have it as part of the diagram notation, but in UML you can always add on to the diagram notation by adding text in the diagram annotations. So, we can just annotate it as condition defined and put the condition. So, it is not a formal part of the diagram notation, but you can always add on to it. Well, what is the point of the year notation? It is to help people understand the schema. If you add to some notation and it is clear what it means, no harm in that. The second part of the question is weak entity is not drawn as a double rectangle y. So, this relates to the Chen notation in the notation we used in our book earlier, where a weak entity is shown with a double rectangle and in our new notation a weak entity is shown with a single rectangle y. So, part of the reason is that UML did not have a notion of a double rectangle and we decided anyway a weak entity will not have a primary key. So, there is no attribute underline. On the other hand it should have a discriminator if at all and so, if there is no primary key automatically it becomes a weak entity. So, we decided not to have a double box notation, but yes other notations do use a double box to make it explicit. Thanks for waiting and back to you Shivaji Kolhapur, please ask your question. In the year diagram of Moodle, what is the discriminator of relationship between user and course? At a single point of time one user can participate in one course, but a user can participate in multiple courses at different times. Is it many to one or many to many? So, let me answer that question in two ways. First of all a user can be in many courses at the same time. So, it is definitely many to many. Many users can be in a course, a user can be in many courses. So, every subject that I am registered to there is a separate course in Moodle and I am in each of those. So, it is definitely many to many, there is no other one to many or many to one constraint. However, another way of interpreting your question is let us say that a person has a forget the Moodle case, but let us take something else. If a person is in a department, in our model we said that a instructor must be in a single department, a student must be in a single department, but branch changes do occur. Students have branch change sometimes, instructors also move from one department to another. So, now if you look across time, user may be in multiple departments, student or instructor may be in multiple departments. At a point in time they are in one department. So, how can we model this? The answer is like I said ER models are not actually very well suited to modeling temporal information. So, there is no clean way of doing this. They are modeling information at a point in time. There have been some proposals to extended, but nothing which has been widely accepted. The only extensions for temporal data are in the context of relational data where you take a relational schema and then you decide how to extend it temporarily. So, as practical solution is to take ER diagram for data at a point in time, you work out the relational model from that and then you go back and look at the relational model and see how things change with time and modify it to deal with the temporal aspect. But this might result in some changes which for example, if you fold a department name into instructor, if you decide that temporarily this no longer makes sense, we want to keep track at different points in time, then there are two solutions. One solution is to keep the schema as is that is it represents the current department of the instructor, but add an extra table which keeps track of historical information. So, this is some approach which many people follow. They add an you know where we had instructor department relationship. Now, because across time there may be many different departments, we will create a new relation instructor department which has instructor ID, department name and a time stamp associated and it can have multiple rows to represent historical information. So, that is a good question how to deal with changes which happen across time and I do not have a very strong answer for it, but there are practical answers and this is what people use in practice. You could also go the way of temporal databases where every single fact has a time stamp associated with it and then you do the whole design that way. We have not covered that at this point, but I will maybe talk a little bit about it as part of relational design tomorrow and there the ideas every couple in the relational database has an associated valid time which is at what all times it is valid and then there is a notion of functional dependencies which are based on time and we can use that to decide on a final schema. So, we will talk a little bit about that tomorrow. We have a follow up question, please go ahead if you have any further questions. When the row is inserted into the table, whether any time stamp information is stored because yesterday there was one query to delete all the records which are more than 10 years old. So, suppose I want to write the queries where I want to retrieve all the rows which I inserted yesterday or which are more than some n years old. So, can I write such queries based on time? That is a good question. You have a lot of time related questions. I am happy you brought these up. So, the specific question yesterday was for information more than 10 years old and there we had registration information and the goal was to forget about takes and teachers and so on where the year is more than 10 years old. So, the time there is explicit in the year attribute, but there are many situations where the time is not in the explicit in the relation, but is stored as a time stamp in a couple. So, it is pretty much all databases allow you to declare a attribute of type time stamp which means whenever sorry not the regular SQL time stamp, but auto generated time stamp which means when the row is inserted the current system time stamp is used as the value for that auto generated time stamp. If you do this the rows have an extra field called time stamp and you can use that you can actually access that from SQL. The specific name for that attribute depends on the database, but you can I mean. So, some databases have an implicit time stamp attribute, some allow you to define your own time stamp attribute and say that it is an auto generated value which is generated from the current time stamp. In either case you have access to it in your query and then you can say that delete things where the time stamp is less than more than 10 years old. That was not required in yesterday's assignment, but yes it is something you could do. Okay, we have questions from Seryar Amani Amin. Hello sir, I am Haja from Priya Amani Amin University. Why? I got two questions. What is the difference between ECA relationship in the ER diagram and one to one relationship? So, ISA is a completely different concept from relationship although you in some sense they are connected, but ISA says that this entity is also an instance of this other entity. So, instructor is a person, a student is a person. The student is not related to a person, that may be a different thing. Maybe a person is married to another person, that is a relationship. But ISA says that these are the same entities, not a relationship, this is really the same entity which is could be both a student and a person, that is what ISA means. So, you are specializing persons to students and employees and so on, but they still remain the same entity. It is not two different entities, it is a same entity. Does that answer your question? Yes sir, how foreign entity is represented in ER diagram? Is it possible or not? The question is how is the foreign entity represented in ER diagram? In fact, it is implicit in the ER diagram through relationship. So, this is a good question because I forgot to mention this. When we convert an ER diagram into a relational model, I talked about what are the primary keys that you create. I forgot to tell you about what foreign keys you create and those foreign keys come implicitly from the primary keys, sorry from the relationships that are created here. So, let us use the slides and go back. So, here is an example of advisor. If you see that we took the relationship advisor and created a relation called advisor with two attributes, SID and IID, but note that these are relationship have edges to entities. Now, each of these as a result each attribute of this relation which is created from the relationship, each of these will become foreign key. So, SID is a foreign key to student, IID is a foreign key to instructor. So, these foreign key constraints are added to the table to generate. Similarly, even if we fold the, we had a student-department relationship and we decided to fold that into student by making department name and attribute of student, but because it came from the relationship, department name is automatically going to be made a foreign key referencing department. So, these are the extra constraints we add to the relational schema. In the ER diagram itself, foreign keys are kind of implicit in the relationships that we create. So, there is no need for a separate foreign key. If it, if an attribute, you know if you had an attribute which you thought should be a foreign key referencing another entity set, that means you made a mistake. You should have removed that attribute and created a new relationship between the two entity sets. So, it is good to look at the attribute, see if something ought to be a foreign key. If so, remove the attribute and replace it by a relationship. When you are doing the ER diagram, when you turn it into relations, yes, the foreign keys will reappear. So, thanks for that question and back to you in. Thank you. Thank you. Thank you. I want to show you a few extra slides. The first is summary of notation which we have used, that you are familiar with. So, if you need to figure out what a notation means, the summary is very useful if you have forgotten. And then the next few things have the alternative ER notation. So, the Chen and ID 1 FX notation, both use these ovals. So, an entity set is a box, attributes are ovals. If you have composite attributes, there are ovals linked from other ovals. Multi-valued attributes are double ovals. Derived attributes are dashed ovals. A weak entity set is double box. And generalization, specialization are also triangles, but which look like this. A big triangle with ISSA written inside it. And for cardinality constraints, there are many notations. In the Chen notation, there is a star and so forth, one thing. In the ID 1 FX, there is something called the cross feet. So, many to many is shown with these are called cross feet on both sides. One to one with no cross feet. Many to one has cross feet on the many side and so forth. And then there are total and partial participation constraints. Here, the vertical line here means that the other side, that is E 1 has total participation. And the circle here, which represents 0 means that the other side E 2 here has partial participation. And in ID 1 FX, you can have a relationship drawn as a direct line, without a diamond. So, that is where those things come in. So, that is just a quick summary. I do not expect you to learn all this, but we have provided it in the slides and in the book. We have a few slides on UML, which is the Unified Modeling Language. It is a diagrammatic tool for representing many, many things. Class diagrams is one small part of UML and that is the part which we have used. Our notation is very closely related to class diagram, but there are some differences. In class diagram in UML, there are prefixes to attribute which like plus is public, minus is private and hash is protected. We have got rid of it, because we do not even look at it. And in terms of the dia tool, we make that as implementation defined, which means none of these three tags appears before the attribute means. And then there are methods, which we use as is. Otherwise, things are similar, but one thing to note is the cardinality constraints are supported in UML, but it turns out the way they write it is flipped from the way we write it. If you see here, 0 star and 0 to 1 are shown in the opposite place here. 0 to star is on this side and 0 dot dot 1 is on the other side in UML. So, in this sense, we have not followed UML. It is very confusing actually, I would say. The notation we have used for cardinality constraint is closer to the traditional E R. Then there are some more on variations between our notation in UML. It is very similar. There are very minor differences. And the last thing I want to mention couple of points. One is when do you use entity sets versus attributes? So, here we have instructor with a phone number. Here, we have another variant, where phone has become an entity and there is a relationship instructor to phone number. Which one do you use? That depends on the thing which you are modeling. If you are a phone company, phone is definitely an entity. You want to keep extra information about it. But if you are an employee database, phone number is perfectly fine to store it as an attribute and not model it as an entity. It is useless modeling it as entity. You might as well just store it as an attribute. Then I have a couple of slide here on converting non-binary relationships to binary form by creating a fake entity. I mentioned it earlier in my talk. So, let me show it diagrammatically here. Do not read all of this for now. I have a ternary relationship between A, B and C. The relationship is called R. What I do is I create a fake entity E. I have to give some ID to it. So, it is generated in internal to the system and I relate E using binary relationships to each of A, B and C. Now, how do I generate instances of this? If I have an actual relationship in the real world R, I will create a new entity corresponding to that. I have to give it some artificial key and then I will insert relationship between. So, that particular relationship related three specific entities. I am going to create three relationships, one relating a new fake entity with the original B entity, one with the original A entity and one with the original C entity. So, that would correspondingly be three relationships R A, R B and R C. If I convert it to a relational model, I will have a relation R A, relation R B and a relation R C whose first attribute would be E, primary key here artificially created and the second attribute for these would be the IIDs, the primary keys of the A, B and C respectively. So, that is how you convert non-binary to binary. There is also some material in the book on cardinality constraints on non-binary relationship, I have skipped it for lack of time and then there is some soft on aggregation. Aggregation is actually confusing, it is not easy to understand I skipped it for that reason and also it causes confusion with aggregation in UML, which has a completely different meaning UML. So, aggregation in ER models is a different topic, I do not have time to cover it here, but I want you to be aware of the fact that there is a concept called aggregation, there is a motivation for it and it avoids certain kinds of redundancy. There are a couple of slides here which illustrated and there are more details in the book if you are interested. So, I will stop there.