 is set and we can start on the final part of today's ER session. So, let us get started with the solution. So, the tutorial was to design an ER diagram for a railway system which has a few basic piece of information highly simplified and let me first give you the solution. It is also available on the Moodle side, but not yet made visible because I did not want it to be visible until everyone has had a chance to try it out. I will make it visible at the end of today's session. So, the things which we need to model are basically we start with entities and then add the relationships. So, we have listed it informally here as part of the problem specification. In general what all do you think we would need to model in a railway system at least from a passenger information view point. The actual Indian railways needs to model lots and lots more stuff, but let us just worry about the passenger side of things. So, first of all you have stations of course, you go from one station to another and then you have tracks connecting stations. This is something passengers often do not need to really worry about, but occasionally they do because of rerouting and so on. And then you have trains and schedules for the trains. Then you have passenger bookings which go from one station to another on a train. So, now what are the details which we need for each of these things? We have made some simplifying assumptions here. So, first of all if you see tracks it is a pretty complicated actually. There are often multiple tracks between any pair of stations. Sometimes a track has a platform at a station, sometimes it does not. There are in suburban sections in Mumbai for example, there are fast tracks, slow tracks, some of the stations have platforms on all four tracks. Some have it on two only, some have six platforms. Life is pretty complicated in reality, but we have avoided all that and kept it very simple. And we are assuming that there are tracks connecting stations which are specified, specific stations. And we are going to assume there is just one track for simplicity and we are not going to worry about any further details of the tracks. We are just saying these two stations are connected by a direct track with nothing in between, no stations in between. Again in reality as you know there are many, many more things. There are points which connect to tracks, there are signals, there are cabins, there are what not, level crossings and so forth. Again we are simplifying life greatly. So, these are the basic entities. And now there are schedules which record what time a train passes through. Now what should this schedule be? Should it be an entity set? Should it be a relationship set and so forth? So, let us maybe go to the white board and start working on it. But yeah, before that this is a last couple of points from the problem statement. In reality trains can start on one day and reach next day, the day after that even in some cases. But to make our life simple we are just going to assume a time, the trains reach on the same day. So, you just need a start time and end time, that is it. And furthermore trains go through a number of stations and our schedule includes stations where the trains do not stop. This is sometimes published by the railways, sometimes not. But surely the railways knows when a train is supposed to go through a station regardless of whether it stops or not. So, our train schedule is going to contain the time through each station. And as the problem statement says if the train does not halt, we are going to make time in equal to time out. If it does halt, we will make sure it is at least a one minute halt. And then there is a sequence numbering, why is this important? If you see a relation with just the time in time out and so forth, maybe you could sort it, but you would not know which are adjacent stations and so forth. So, we are going to assume that the data will contain a sequence number, the order in which a train visits each station along its route. So, that is for the track. For the passenger booking side of things, as we said we already need to know which train which is the from station to station. And then we need the usual information about which coach, seat number, passenger names and so forth. Now, if you wanted to build an ER diagram where passengers are modeled as entities, we have to create a person or passenger entity. We have made it clear here that we do not want that. The Indian railways does not really care whether you have travelled with them 5 times before or 10 times before or you are coming for the first time. Airlines on the other hand are different. They have frequent flyer programs which aim to lure back passengers who have been regularly travelling with them and give them good deals. So, that they will, the people who travel most will be loyal to you and would not switch to some other airline. Railways does not need to do that. So, in the case of such an airline system, a passenger might be an entity because they want to keep track of all the journey, the trips that a particular passenger has made. So, again for the railways we decided we do not care about it. So, that is the set up. Now, let us start working on the ER diagram. Get the whiteboard up here. The first is we need to model a station. This station is an obviously an entity and what are the attributes of it? Usually there is a station code to uniquely identify a station. There is usually a name also for it, not just usually, all stations do have name and in general there may be other information, but I think for our purpose there is not much else that we need. That is it. Next we have the tracks which go between stations. So, that is a relationship. We will come back to it. What are the other entity? We have a train which is an entity. So, what are the attributes of train? There is a train number. There is a name. Maybe we should call it train name or some such thing and for our purpose that is it. In reality again there is a lot more detail associated with the train. How many coaches does it have? So on and so forth. What are the types of coaches? Again we are going to simplify our job, but if you want to make an actual booking system you do need that information. How many second class coaches? How many AC, third AC, second AC? How many seats in each? So forth. We are ignoring all that. Next we need to have a booking. So, what is a booking? We are going to treat a booking as an entity. Now if you had a passenger and a train and so forth maybe you could treat a booking as a relationship, but it would get a little messy. Passenger may have multiple bookings on a train and as we saw between a particular pair of entities you can have only one instance of a particular relationship. So, it is actually much cleaner to treat a booking as an entity rather than as a relationship. But now what is a booking for? Is it a booking for a train on a date? Is it a booking for a particular instance? So this is a train. The train is just a number and a name. So now we can think of a particular train running on a particular date. We could think of it as a week entity which is a train instance or some such thing. So that is an option. Let us keep it simple and not worry about it, but if we did wish to do that we could create a week entity link to train which is a particular instance of a train journey. So we are not going to do that. So now what is a booking? So a booking must have some kind of ID, booking ID PNR. Let us call it a PNR because that is what the Indian Railways calls it I think. And then what else is there with a booking? It usually has passenger name. So we could treat that as a multi-valued attribute perhaps, but actually it is a little more complex. Along with passenger name, so each passenger, what all does the thing record? It has a name, it has an age, gender and a seat or birth number. So we can put all that in there. But before that let us put the other details. A booking is for a particular train. So do we have to put that as an attribute? Should train number be an attribute of booking? No, because we are model train as an entity set. There should be a relationship between booking and a train. So let us say booking train is the relationship between booking and train number. A booking is also on a particular date. So let us put the date here. And furthermore a booking need not be from the starting point to the ending point of the train. It could be from any intermediate station to any other intermediate station. It does not matter which station. So what can we do here? We can have a from relationship and a booking to. So booking from one station to another and booking sorry booking from a station and to another station. And we have treated this as two separate relationship here. Now again there are other alternative designs possible. We could have had a turn way relationship which is which associates booking with two different stations. But it really is not important here. In this case a booking has to have a from and a to station and it cannot be associated with more than one from station and more than one to station. A turn way relationship is useful when you have multiple combinations. So maybe if you could have a booking which is from one station to another and the same booking can also be associated with another pair of stations then having a turn way thing might be meaningful. But here a turn way relationship is pretty meaningless. It is much cleaner to go with binary. Do not use a turn way or higher relationship unless there is good motivation. Here there is absolutely no motivation. So we have got a booking is from station to another station. What else do we need? We need to know the names and so on of passengers. So let us make it a multi-valued attribute. Name, age, outer space here. So I am going to continue this over here. This is not proper notation. This is just for convenience. So there is a multi-valued attribute which has name, age, maybe gender. I do not know if you can read that. It is a little small. And then continuing on there is a coach. I think the Indian Railways probably assigns all seats for a particular booking on the same coach. Maybe I am not sure. But let us assume that each person in a booking could be on a different coach. So we have a coach and then we have a seat of birth. Let us just call it seat. So this is now a multi-valued attribute. So this is a continuation. So the curly bracket starts here and inside the curly bracket we have a name, gender, age, coach and seat. So that is what we have as far as the booking is concerned. The last part that we wanted to model is that a train goes through a series of stations. And we want to know for each station what is the sequence number in the path of that train. And then there is one more. There is a track relationship. So there are two relationships which we still need to model. So let us start with the track relationship. Track is the relationship from station back to station. So we can, if tracks are directional that is traffic is only allowed to go in one direction. We might have a from and a to here. But in our simplified model we are assuming there is no directionality to the track. Just that these two stations are connected by a track. And then a train goes through a series of stations. So we need a relationship between train and station. So let us say travels through or we could call it schedule. Train goes through a station. But we actually need more information. We need to know what is the sequence number. Is it the first station in the journey of the train, the second and so on. May not stop but it is the first station that it passes through. So we want a sequence number. And then what else do we need? We want a time in and a time out. So these are all attributes of this relationship travels through. So that is the complete set of attributes. Now what if a train starts at the station? Then is there a time in for that station? Maybe you could track when the rake actually reaches the station. This is something which the railway sometimes announces but usually does not. So maybe time in should be left out and let us make it null. What about for the last station along the route that it does not actually leave that station officially? Of course once it drops off passengers it may be taken off to the yard for cleaning. So there is no time out for the last station. For all of the stations there will be a time in and a time out. Now note that travels through is a relationship between train and station. And as we said a particular train entity and a particular station entity cannot be related by more than once on the same relationship. So the primary key for travels through would be station code which is the primary key for station and train number which is the primary key for train. Now this rules out trains which pass through the same station multiple times within a particular trip. I do not know if such trains exist. I would not be surprised if there are some weird routes that do this. Unlikely but not impossible. But we have ruled out such things here. I mean I do know that there are stations where the train actually temporarily leaves the station and comes back in order to reverse the rake or some such thing. So those are not really modeled here. So what we have is now reasonably complete ER diagram for the train thing which we started off. If there are any questions about this please ask. I am already seeing a few questions on chat. So I urge all of you who have questions to please ask it on chat. The first question I am seeing is how a gender and age a multivalued attribute. Actually what we have here is it is not that each is a multivalued attribute. It is actually a composite attribute which has name, age, gender, coach, seat and that whole thing is multivalued. What does that mean? Think of name, age, gender, coach, seat as one double. The multivalued attribute here is like a set of such double much like a relation nested inside of a entity here. So if a name, if passenger were an entity we could not just take name here. We would have had a relationship with passenger and then age, gender and so on would be part of passenger and then coach, seat could be part of the relationship attributes for the relationship between booking and passenger. So that is an alternative. Other questions on chat. There is a question before this which is differences between ER diagrams and UML diagrams. UML is a big thing. UML class diagrams are the part which are closely related to ER diagrams. So in our slides we do have details of comparing our ER notation with UML class diagram. I kind of rushed through that at the end of today morning session. So if you can go back to those slides and see the comparison, the notation differences between our ER notation and the UML class diagram. Now obviously if you use some other ER notation there are going to be other differences also. Ours is very close to class diagram. There are other questions on chat. Meanwhile maybe we can take some questions live. Team college boy sir. Sir how to convert the schema into relationship? In general. That is a good question. So that by converting this particular schema into a set of relations. So that would be a good exercise. So let us keep this schema in mind. So let us take station. Station is going to become a relation and then track is going to become a relation. These are all entities and relationships. Train is going to be a relation. Before that we need to add the constraints on this because that affects the relational design. I forgot to put constraints on this. So let us put the relevant one. Booking is obviously for a particular train. It cannot be for two trains. So there is an arrow there from booking to train. Now similarly booking is from one station and to one station. So we have and similarly here and moreover a booking must have a from and to station. So this is a total relationship and a booking must be for a train. So that is also total. Now how about train travels through? Is that a total relationship? Train at better travel through at least one station. In fact at least two. So we could actually put a more complex cardinality constraint here. Total means at least one, that means a train should participate in at least one instance of travel through. But in fact a train has to have a start and a stop station and maybe other stations along the way. So the minimum is two. So a train must travel through at least two stations and so let us put two and it can travel through many stations. We really do not have a bond on it. How many? We do not know. Depends. So two dot dot star. This is saying more than just total. Total means one dot dot something. Two dot dot star is a little stronger than just total relationship. And on the other side should the relationship between station and travels through, does it have to be total? Not necessarily. There could be stations which do not have any train traveling through them at least scheduled train. Apparently in Pakistan this has been happening. The railway system is in the shambles and they have stopped many, many train routes. So there are many stations with no trains going through them. Luckily I think that is not been happening in India. So we are not going to enforce a constraint that a station must participate and travel through. How about a track? Can you have a station with no track? I do not think so. So maybe we should enforce that a station must participate in track. So if it must have a track, coming in it must have a track, going out. So it is participation on both sides of track. If trackway directional would be total. Now it must have at least one but it could have more than one also. In fact it is not bounded. So we could just make it total which is equivalent of saying one dot dot star or we could make it explicit here. So let me just illustrate the other notation. I will say one dot dot star on both sides. That is the same as saying total. So those are the constraints. So I think we have all the constraints that we want and now we can start on the relational schema design. So the first step is to create a relation for each of the station, train, track and so forth. And what about this relation? Booking train, booking from booking to, are they going to be separate relations? Now remember these are many to one relationships with total participation. So in all these cases we can fold those relationships, those relations into the booking relation itself. So booking can have an attribute called from, an attribute called to and an attribute called train number. So we are going to do that. There are no separate relations for these three relationships. How about travels through? Travels through is not many to one, it is many to many. So for that we will have to have a separate relation. So now let us start listing the relation. The first one is station. Now note that this is a schema diagram. It is no longer a ER diagram. I want to point this out and I want to draw it this way because many times people get confused between these. A schema diagram is showing an actual relational schema. Therefore attributes like from to and you know train number which would have been relationships in an ER diagram get folded in and become attributes in the schema diagram. So the notation is different, do not get confused between the two. So station has a station code. That is actually going to look much like the entity and it has a name. Then you have travels through this thing is going to have unlike the relationship it is going to have the primary keys of all the participating entities in this case station and train. So train number, station code. So those are the primary key attributes. Since it is many to many both of them together will form the primary key here and then we have the remaining attributes. So train is a sequence number time in, time out. So now what about train? Train becomes a relation also. That is straight forward. There is nothing new here. Train number and name and no new attributes. Now we have a foreign key constraint. Note that since this was a relationship which became a relation, train number will be a foreign key this way and station code will be a foreign key. Now note that the arrows in the schema diagram are foreign keys. They are not relationships they are foreign keys because attribute is here and it refers to that. Then moving on booking is a relation which now has several attributes. It has a P and R. All the basic attributes are the same as before date. It has a multi valued attribute which we cannot put in line. We have to make it a separate relation. So we will hold off on the multi valued attribute. But it also has the relationships. Let me just show it to you. If you can see it now, the booking has booking to, booking from and booking train. Three relationships. Each of those is going to become an attribute of booking as we saw from to and train number and yeah, train number that is it as far as the attributes of booking go. But note that from to and train number were relationships which were folded in. So each of these is going to become a foreign key. So from is to station. Two is also to station. While train number is a foreign key referencing. Now what else is left? We have to have the multi valued attribute which is going to become a new thing. It will become P and R. It is a primary key and then all the attributes of the multi valued attribute which were name, age, gender, coach and seat. Now we have to give a name to this. So let us call it booking details. Now what are the primary keys of these things? For booking P and R is the primary key as we saw. Nothing new. For train, train number is the primary key. For booking details the P and R is not primary key by itself. You need more. So if there are two passengers with the same name on the same booking is that allowed. I kind of doubt it, but if it is allowed then you may need to distinguish them in some way. So instead of name the other option is use P and R, coach and seat as the primary key. So it is not possible for two passengers to be given the same coach and seat. So these three together will uniquely identify a particular tuple in the booking details table. And for that tuple, for that seat who is it being allocated to what is the reagent gender. So that completes the relational schema for this particular ER diagram. So the second part of today's tutorial is an ER diagram for this program. What is this program? This particular workshop which all of us are participating in. So we want to model various entities. What are the things we would model here? We do want to model participants, the people. We want to model resource centers where all of you are located. And IIT Bombay could also be a resource center, a central one, but still it is a resource center. And then we want to model who is the coordinator at each resource center. We want to model the capacity of each center and the faculty who are attending at each resource center. And for each person we also want to know which institute they are from. So there are a few entities which we have to decide on. Obviously person would be an entity here. Resource center would be an entity. What other entities do we have? We might want to model the institute to which a person belongs as an entity. Why? Because we may want to have like restrictions on you know so many participants from a particular institute and so forth. And we want to know which all colleges have participated in this program. If we just turn it into an attribute of person, you can put in any old value there and there will be chaos. So we want to model institutes as entities. So let us keeping that in mind, let us start on the diagram. So we have a person. Now how is the person identified for the purpose of workshop like this? There is usually an ID, name and any other information that the workshop may choose to collect about that person. And then there is an institute which is where they are from. So we may have a unique key for it which is again created somehow. Let us call it instiD. Then we may have a name, location and may be a few more attributes which whatever we wish to keep track of. And now we need a relationship between the two. Where is this person from? So we can say space is a little limited here. And if you can read that it says belongs to, you probably cannot read it. It is too small. That says belongs to. Can a person belong to more than one institute? No. So let us put in that constraint right away that a person belongs to only one institute. And a person must belong to an institute for the purpose of this workshop. You cannot just walk in from nowhere and participate. Therefore a person must have a total participation and belongs to. So that is one part. Then we have a resource center. Again we will have let us call it RID, resource center ID. We have a name, capacity. Now typically resource center is actually part of an institution. It is not separate. So instead of modeling it like that, instead of having resource center ID here, we could have modeled it differently also. We could have used the institute ID. So if you had at most one resource center per institute, then we could just turn this into it could even be a, what are the options we have? It could be a weak entity identified by the institute. Because it has to belong to an institute. And if we only allow one resource center per institute, there is no need even for a discriminator. That is no key and so you can just get rid of RID name. Everything can go and it simply becomes a weak entity with only one attribute capacity or other things which you want, descriptive attribute. And linked by a identifying relationship part of let us say resource center now part of the institute. So we have forced that. We could keep it independent by having a separate resource center ID. I am not sure how it is actually done for this program, whether there are separate IDs for resource centers and institutes. Then we need to know that a person attends this course at a resource center. Now all of this we have simplified by saying it is just for one course. You may want to model courses and so on, but we have kept it simple because we just had a limited time. So we may have a person attends a resource center. So for one course, the person can presumably attend only one resource center and must attend it. Well not must attend. Some people may not attend. They may be coordinators and so forth. So at most one resource center let us put an arrow that. Now we said we also want to keep track of who is the coordinator for each resource center. So let us make that another relationship coordinator between resource center and person. Now depending on whether we allow more than one coordinator we might have arrow or not. If we allow only one resource, one coordinator per resource center then we would have an arrow like that. So what else do we need? Sometimes we have a registration number for a person. We may have a person ID and the person is attaining this resource center, but we have a sequential number 1, 2, 3 within the resource center for convenience. We could certainly do with the person ID, but if you wanted a sequence number, roll number of some kind or registration number we could add that also. The registration number is some kind of confirmation that yes you have registered with us and this is a number which is given as confirmation. So that basically wraps up our year design for this particular task. Again there are many variants possible. We will not go into every one of them. So let us stop the design here and now we will take questions. We have NRA institute, Bhopal please go ahead. Sir, I have a single query. You told in 7.27, slide number 7.27 how entity, so my question is how entity set plays a role in relationship. Give another example other than that prerequisite role. The role. So we actually had several examples right in the previous one. Let me go to the white board. So here we had track which is between two stations. Now here we did not bother giving a role identifier because it is kind of symmetric. However, if you have a directional track and the railways actually does bother about this. If you have two tracks, they have different track number and they may be called something as simple as up and down. But let us say that they have a separate number and then a track is if it is directional also. This is typically how the railways operate. Tracks are direction assuming that at least two tracks. There are segments with single tracks where the track is not directional. Trains go in both directions. But typically you have two tracks at least and tracks are directed. So then track here would look like this. We have not given a role name here, but we would have to do it in that case and role names would be from and so we have role indicators along with the edges. So it is a relationship from station back to station, but we have two separate role names and when we end up creating a relation from the track relationship, the two attributes of that track relation will be from and to. So we have a direction established from one station to another station. Did I answer your question? Sir, can we add another attribute or we can say column in the same table station so that this problem can be solved? Can we add another column for this same station table? Yes sir, for the track. Adding a column to the station table is not going to help you because there may be many tracks into and out of a particular station. So the only way to model this is by having the roles for the track from and to. You cannot link it with the station in any way. Does that answer your question? That if we have a generalized entity that how can we represent its actual occurrence in the program or in the database. So a person is a generalized entity with specialization being employee and student and so forth. So if the question is how do you translate this to a relational schema, then we saw that. We saw two different ways of translating it. In both the cases we had tables for person, student and employee but the attributes in those tables were different. There is also a special case where we can do away with the person table. If we know that the specialization is total that is every person must be a student or an employee and furthermore it is non-overlapping. So everyone can be only one of employee or student can be both. Then we can get rid of the person table but otherwise we have to keep the person table along with the employee and student table. Does that answer your question or is it something else again? That means instantiating an object only from the generalized class. So if you think of it in terms of objects then you have an object which can play multiple roles. It can be a person, it can be a student, it can be an employee. Now most object oriented programming languages insist that an object must have a most specific type. That is you cannot have an object which can be both a person and an employee and a student. It can be a person and a student by virtue of being a student. It is automatically all the super classes. If it is an employee it will automatically be a person also. But it cannot be an employee and a student unless you have one more class below it which is both an employee and a student. Then the most specific class of the object would be that other thing. But that is not a fundamental restriction of the idea of object orientation. In fact some of the oldest object oriented languages allowed to take on multiple roles and to be in multiple classes without having to be in a most specific class. I do not know this was related to your question but it is a point to be noted here. When we go from this to object oriented data, we are not going to cover it here but that was an issue which people had to sort out. So I think the object oriented features in the SQL standard also went the same route and said that a particular object in object oriented SQL must belong to a most specific class. But the specialization generalization of ER modeling is more general. It allows anything to happen and this is actually the right way of doing things. Thank you sir. Government engineering college Bikaner, please go ahead. Sir, I had to query regarding this resource center problem. Actually there might be N number of workshops held at a resource center and a person can be involved in N number of workshops at a time. So how do you model these two? That is a good question. I simplified life a bit by saying this is for a single workshop but in general there can be many workshops. So how do we model that? So let us go back to the diagram and then see how to modify it. So what we have here is somebody is a coordinator for a resource center, attends a resource center and so forth. But actually they do not attend resource center. You are attending a particular workshop at a particular resource center. So what we want is something which models this that somebody is attending a workshop at a resource center. Now we have a few options for this. So maybe we can have a ternary relationship which links person, workshop and a resource center. So first of all we have to introduce a notion of workshops. This DBMS course is one such workshop and then there are other workshops. So let us add a workshop here. Do not have much space in this figure. Let us work with what we have. So we have workshop and there may be a name, date and so forth. So the coordinator's workshop is one workshop. The main workshop which is going on is another workshop. Now we could have a person who is in a workshop at a resource center. So each of these coordinator and attend could become ternary relationships. So now we could simply add that and this. So now we have ternary relationships to model this issue. But there are alternatives. We could have modeled instead of a resource center workshop. We could also have something which is a relationship between a resource center and a workshop. So what we have not modeled is the fact that a resource center is linked to a workshop independent of the coordinator. So even if we do not have a coordinator at this point, a resource center may have agreed to run a workshop. So maybe there is a relationship. We can give it some suitable name. I would not give a name right now which links a resource center to a workshop. So now this relationship can be turned into an aggregation. So I did not really cover aggregation. But aggregation in the ER modeling context takes a relationship and treats it as an entity. So this relationship between a resource center and a workshop could be treated as an entity. So then in diagrammatic notation let me start a new sheet here because this is getting too messy. We would have resource center not going to show the attributes. Just the entity set workshop. This is a center for this particular resource center is acting as a center for this workshop. And now things are related to this relationship. That is the person can participate in this. They can be a coordinator for this and so forth. So now what aggregation does is take this relationship and treat it as an entity. Diagrammatically we show this relationship with a name and identified by the other entity sets a participate and then draw a box around it. And now we can link this to person. So now a person can participate in this. Attend this person can attend this. A person can be coordinator for this. So what we have done is include the specific workshop and we have a nice example of a aggregation. Now the aggregation basically what did it do? We had ternary relationships. By taking one relationship the center for and turning it an aggregation we have now replaced those two ternary relationships by two binary relationships. And it is really easier to understand this. It is a little cleaner. In terms of the relations that get created in the end it will be pretty similar. There is not a big difference. I did not talk about how to convert aggregations to relational schema and so on. All those details are in the book slides which are online. But the final result will look pretty similar. But in terms of the ER diagram this looks a little cleaner. But it is an option. You could do the other one also. That is also fine. It is not nothing very wrong with it. So is there any follow up question on that? Sir, so we treated this resource center workshop as a single entity for the relation in this aggregated resource. Thank you sir. I think we should probably break now.