In this session, we will learn the elements of modeling. This is the part of analysis where, just as we did an exercise to understand the inventory control system, we ask what kind of information the user requires and how that information will be represented. We looked at that, right? We are now looking at the formal mechanism for modeling information system needs. This formal mechanism is called the ER model, or entity-relationship model. Before looking at information modeling, I thought we would quickly look at a familiar example from the academic information system. Remember I mentioned course list information management? You are all familiar with the course list. As I said, you all submit individual registrations, and someone has to prepare a list of all students who are taking my course, who are taking Professor Sudarshan's course, and so on. The transformation, as I said, is not trivial, and we will look at some of the problems. Then we will go over to information modeling and study the elements of the entity-relationship model.
Here is the first assignment. This is not to be submitted; it is an assignment to be discussed here. Let us call it assignment 0. Let us say we have 4,500 students, about 5,000 now. Each student registers for an average of 6 courses. Agreed? So you fill up a form in which you list, on an average, 6 courses; some 3, some 5, some 7, whatever. You submit that registration form. Assume that is the system that exists. The academic office has collected all 4,500 forms and transcribed them in some way into computer records. But now the academic office is required to produce a course list for each of the courses offered. Assume that you want to print a course list for CS 634. Assume that there are 300 courses running. What is the logic you will design for the algorithm that starts from this input data: 4,500 forms, each containing on an average 6 course codes?
From this, how will you prepare 300 course lists, each containing anywhere between 5 and 300 students? That is the problem. This is a typical inversion problem. It is not as easy as it appears. The assumption is that each form has fixed and variable information. Roll number, name, degree and branch are the fixed information. The variable information is the registration data, which is course code, course name, credits, slot, etc. And assume that all the forms are stored in a computer file. How will you produce a course list from this? Any idea? You have a computer file. Each record is a form. The form has a fixed part giving your roll number, name, etc., and a variable part which says n courses. The value of n itself is entered there; let's say 3, 4, 5, whatever number of courses you register for. And there are entries for those 3 or 4 courses. What is relevant from the point of view of preparing a course list is not the credits for the course, the slot, or whatever. It is just the course code and your roll number. So if a course code occurs in your registration form, your roll number should appear in the course list for that course code. There will likely be hundreds of other students who have the same course code, and all of them should appear in the list for that course. How do you take the data of these 4,500 forms in a file and prepare the 300 lists to be output? How would you do that? Any idea? It appears to be a simple algorithm, right?
Suppose you had a physical file of 4,500 forms. All these 4,500 forms are there, and you are told to prepare a course list for all the courses. How will you go about it manually? Don't forget that until recently at IIT Bombay the course lists were produced manually. Not very recently, but 20 years ago. And even today in many institutions, such course lists are still produced by hand.
Now, you are definitely superior in intelligence and wisdom to a clerk. So if a clerk can do that, you should be able to do that. Here is a quick recap of roughly what we all agreed. Read the file one record at a time. If any of the courses matches CS 634, print the roll number and name. That was the first solution for CS 634. In a single scan of the file, the course list for CS 634 is produced. You have read 4,500 records. However, the entire registration data for all students has to be processed, whether or not somebody has registered for that course. So run the previous algorithm 300 times, once for each course. That is what it roughly amounted to. The file will be read 300 times. How costly is this operation time-wise? That is one question. You can also do it the other way round: keep the 300 courses, and for each record you read, check it against every course. Alternatively, read the file once, storing all the data. Each record is 80 bytes, there are 4,500 students, so this many bytes. All of you are familiar with the fact that modern computers have much more memory than this. So you can avoid making the computer read and write data from files by reading all of it into memory and compiling the lists for all courses in memory. Assume even 500 students per course. You can still have enough memory allocated to each course as an array, and you can prepare the lists that way.
And that brings us to a very fundamental aspect of programming for processing information: you can never assume that all of the data will fit in memory. Never. You can only assume that one record at a time, or typically a few records, can sit inside memory. Otherwise the data file will reside on the disk. So this is a better approach. First prepare a file of records with the following structure: roll number, 8 characters; course code, 6 characters. This is very interesting. Let's get back here to see this.
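For reference, the first solution recapped above (one full scan of the registration file per course) and its cost can be sketched as follows. This is only an illustrative sketch: the `RegistrationForm` type and the function names are assumptions made for the example, and the forms are held in a Python list here purely to keep the sketch short, whereas the lecture's point is that the real file lives on disk.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RegistrationForm:
    """One record of the registration file (hypothetical layout)."""
    roll_no: str
    name: str
    course_codes: List[str]  # the variable part of the form


def course_list_by_scan(forms: List[RegistrationForm], course_code: str) -> List[Tuple[str, str]]:
    """One full scan of the file: keep every student whose form mentions the course."""
    return [(f.roll_no, f.name) for f in forms if course_code in f.course_codes]


def all_course_lists(forms: List[RegistrationForm], course_codes: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Repeat the scan once per course, so the file is read len(course_codes) times."""
    return {code: course_list_by_scan(forms, code) for code in course_codes}


def record_reads(num_forms: int, num_courses: int) -> int:
    """Total records read when the whole file is scanned once per course."""
    return num_forms * num_courses


def file_size_bytes(num_forms: int, record_bytes: int) -> int:
    """Size of the registration file if every record has the same length."""
    return num_forms * record_bytes
```

With the figures used in the lecture, `record_reads(4500, 300)` comes to 1,350,000 record reads, while `file_size_bytes(4500, 80)` is only 360,000 bytes.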
Here you have the file which we call the registration file, in which, as we said, we have a student with roll number R1 who has registered, let's say, for C1, C2, C3. Let's say another student, R2, has registered for C1, C4, C5, as an example. Now what I do is read each record of the registration file only once. I prepare a new file. In this new file I write a special kind of record. I read this: here is roll number R1 and he has registered for course C1. So I write a record R1, C1. The same student has registered for C2. I write another record where I repeat the roll number and write C2. He has registered for C3. I write R1, C3. His registration record is over. I look at the next fellow, R2. R2 has registered for C1. I write R2, C1. R2 has registered for C4. I write R2, C4. Then R2, C5. In the same way, I read this data only once. But for every record that I read, I prepare as many smaller records as required, depending on how many courses that fellow has registered for. At the end of a single scan of this file, how many such records will I have? Likely 4,500 into approximately 6, because on an average everybody registers for 6. It will not be 4500 into 3N. But it is still a large number of records.
Now I do a very fancy thing. I sort these records. Notice that this file is sorted on roll number if my original file was sorted, but that doesn't matter here. I sort these records on course code and, within course code, on roll number. The transformed file I get will have R1, C1; then R2, C1; maybe R7, C1. That means all the fellows who have taken course C1 will come first, in sorted order. After all these fellows are finished, C2 will come. Maybe R1 will come, and other fellows will come. So in a single sort of this file, I get the data reorganized course-wise. Now I read this sorted file only once. I read the first record, C1, R1. I start outputting the course list for C1. R1 belongs to C1, output the roll number. R2 belongs to C1, output.
R7 belongs to C1, output. Suppose C1 ends and C2 starts. I will announce that the course list for C1 is over and the course list for C2 is now beginning. That means, just as in a single scan of the input file I could create all the relevant information about course registration, by one sort I have converted it into a position where a single scan of this file can now help me create all the lists for all the courses. Do you see this point?
Two points I would like to make. I could sort this file only because its information has a clearly identifiable sort field, namely the course code field. In the registration file there is no such field, because a record there is not a single piece of information. There are multiple pieces. What do I sort on? Somebody else who registered for C1 may not write C1 in the first slot. He might write it in the fourth. So there is nothing that I can sort on there. I can only sort those records on roll number, which is not useful. So I break this information into pieces which are relevant for correlating the student and the course. Please note that every record here gives me an association: this student has registered for this course. Having extracted this association information, with one single sort, which will take much less time than multiple scans of any file, I get the result. Do you agree that this is a better process? Right. The reason I spend this time is that, as you will see in the subsequent discussion, information modeling actually attempts to capture this principle in the proper representation of the information that you have to handle.
[Student] Sir, when we are reading it right now, won't it make sense to immediately add it to the corresponding course rather than doing all this later on and sorting it again?
The whole point is that there is nothing like corresponding course data. There is no roll list maintained, ever. That is the whole point. The only thing that you ever need to maintain is this.
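The flatten, sort and single-scan procedure described above can be sketched in a few lines. This is a minimal illustration, not the lecturer's program: the function names are made up for the example, and the sort is done in memory with `sorted` only for brevity, whereas a real registration file on disk would call for an external sort.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

Form = Tuple[str, List[str]]  # (roll number, course codes on the form)
Pair = Tuple[str, str]        # (roll number, course code): one association


def flatten(forms: Iterable[Form]) -> Iterator[Pair]:
    """Single scan of the registration file: one small record per course on each form."""
    for roll_no, course_codes in forms:
        for code in course_codes:
            yield (roll_no, code)


def course_lists(forms: Iterable[Form]) -> Dict[str, List[str]]:
    """Sort the pairs on course code, then roll number, and scan the result once."""
    pairs = sorted(flatten(forms), key=itemgetter(1, 0))
    lists: Dict[str, List[str]] = {}
    # groupby sees a change of course code exactly where one list ends and the next begins.
    for code, group in groupby(pairs, key=itemgetter(1)):
        lists[code] = [roll_no for roll_no, _ in group]
    return lists


# The two example students from the lecture.
forms = [("R1", ["C1", "C2", "C3"]), ("R2", ["C1", "C4", "C5"])]
print(course_lists(forms))
```

Each `(roll number, course code)` pair is one association, which is why the flattened file has an obvious sort field while the original form does not.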
If you index it on roll number, you can always reconstruct the registration data for that roll number. If you index it on course code, you can always get the data for the course. So the presentation to people from different perspectives is one aspect. The storage is a different aspect. Our problem is that in our mindset, just as we have a physical registration form and a teacher has a physical course list, we presume that this is the only way in which information must be stored inside the computer. As we shall see in the modelling, that is not necessary at all. Can you not see that if I have separate information somewhere which describes the basic student, with roll number, name, hostel number, et cetera, and basic information about the 300 courses, with course code, course slot, et cetera, then the only information I require to associate a student with a course is this information? Absolutely nothing else is required to be stored. You agree? Any other information stored will be superfluous if I have the basic information for both. Right?
So let's get back to the modelling. This is a better approach, as I mentioned. Prepare a file of records with this form. Read each record of the registration file. For each course code, output one record of the above type. So on an average six records will be created per student, based on the registration. Then sort this file on course code and produce the lists one by one for the output. This is the standard conventional way of procedural programming. This is not how it is done in databases. But to see what is done in databases, it is first important to understand how modelling is done. And that is what we shall see. We will ignore this. I already mentioned that in a business information system the data is always stored in non-volatile files on some external media. And entire files are always considered too large to fit in memory. Processing is always done record by record.
Reading from and writing to media such as this needs significantly more time than reading from and writing to memory. Disk I/O is of the order of milliseconds. Memory read and write is of the order of nanoseconds; not one nanosecond, but tens or hundreds of nanoseconds. So there is a difference of thousands of times or more. It is in this context that we look at information system modelling.
Once we identify our functional requirements, that is, what the system is supposed to do (the academic office system is supposed to take care of registration), we need to know more details about the information needed and how it is represented. What should be the user interface to the information system? As a student, you should see a screen where you can enter all your courses. As a teacher, I should see all the students for my course, etc. Then there are the specifications for processing and the control of the processing sequence. Modelling, which is the crux of analysis, takes care of defining the information structure: which are the entities, and what are the attributes, that participate in the information system? What is the functional behaviour? How does the data flow and how is it processed? And what are the control structures? What events will occur? What actions are to be taken at each event? What are the state transitions? You all understand state transitions? It's a simple mathematical concept.
The entity-relationship model is a classical model of information systems. What software professionals currently use are object-oriented paradigms such as UML, the Unified Modeling Language, and so on. But this model is so fundamental and important and easy to understand that I thought we would start with it. First we define an entity. An entity is an object or thing which is relevant to our needs. A part is an entity. A supplier of parts is an entity. A plant is an entity. A transaction is an entity. And an entity has some information associated with it. Student, teacher, course, department, hostel: these are all entities. An entity set, or what in object terminology is called an object class, is a set of entities of the same type.
So all students form an entity set. The set may be named student, but the moment we name a set, it means there are 4,500 students in it. Course is an entity; there are 300 courses. There may be 10,000 parts. So the entity name actually represents a set. Obviously each entity has certain attributes. What are the attributes? They are characteristic features of any entity in a set. Every entity has many characteristic features. Take a student. What features does a student have? A student has a name. A student has a date of birth. A student also has height and weight. For academic purposes, what are the relevant things? Roll number, name. Maybe hostel number. Maybe room number. So out of the thousands of attributes a student may have, you are going to choose only those which are relevant for that information system. That's the crux of identifying the attributes that describe an entity in the context of an information system. The features that you select must play a role in the business functions; otherwise they are meaningless.
For the part entity, for example, these could be the attributes. Please note I am using abbreviations which I think make sense. P number, part number. P name, part name. P location, location in the inventory, which bin. PQTY, quantity on hand. PROP, reorder point. PROQ, reorder quantity. PO date, the part order date for supply. P sup, the supplier, etc. For the transaction entity I could say T date, transaction date. T type, transaction type. T ref, some reference number. T quantity. Notice that these names are abbreviated merely for human convenience, because we are lazy about writing full names. If you write COBOL programs, which permit variable names or attribute names of 31 characters, then you are required to write part-reorder-point. In TCS, for example, there is a rule that if you have a variable name which is less than 20 characters long, you will lose your job. Why? Because the program that you write should be understood easily by others.
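To illustrate the naming point, here is a small sketch of the part entity with the abbreviations above spelled out as full attribute names. The field types, and the choice of a Python dataclass, are assumptions made for the example; the lecture only lists the attributes.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class Part:
    """Part entity with descriptive attribute names (types are assumed)."""
    part_number: str                  # P number
    part_name: str                    # P name
    part_location: str                # P location: which bin in the inventory
    quantity_on_hand: int             # PQTY
    reorder_point: int                # PROP
    reorder_quantity: int             # PROQ
    part_order_date: Optional[date]   # PO date: when supply was ordered, if at all
    supplier: str                     # P sup


# The attribute names are the metadata; an instance carries the actual data.
attribute_names = [f.name for f in fields(Part)]
print(attribute_names)
```

The longer names cost a little typing but make the role of each attribute clear to another reader.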
Naming is therefore an important aspect of the analysis and therefore of the design. Take the student entity. Roll number, name, hostel of residence, room number, performance index as we understand it, CPI. In some of the participating colleges it might be marks obtained. Courses registered, hobbies. Why would hobbies be relevant? They may not be relevant for the academic office, but they may be relevant, for example, for a hostel sports secretary to figure out who plays hockey, who plays badminton, who sings well. So this is another point to note. What may not be relevant from an academic information system perspective may be relevant from another perspective or application in the same environment. Consequently, just as you have an attribute list relevant for the academic office, there could be an attribute list relevant for sports, an attribute list relevant for something else. And all of these taken together must constitute the attributes which are meaningful for the organization.
The entity-relationship model attempts to formalize this representation. It permits us to depict the information that we need to handle. An entity set is represented by a rectangle, and attributes are represented by ellipses attached to that rectangle. It's a very simple diagrammatic model. So this is an entity model. For example, here is a student entity. You see all these ellipses: S-roll, S-name, S-hostel, S-room. I have added S to indicate that it is student. Of course, in the academic context we are only talking about course and student, but in a larger context we would say student roll, student hostel, whatever. But this is understandable to most of you, so I have kept it this way. Notice something else also. Notice that some of these have double ellipses. Notice that one of the attributes has an underline. The underline would probably mean unique; that is very obvious.
It is the roll number, so the roll number is unique for every entity in the set, even if I have 5,000 entities. Why can't I choose S-name? There could be two people with the same name. I can't choose hostel, because in a single hostel there will be many students. Suppose, hypothetically (because that condition does not obtain in the hostels here, but maybe in some of the participating colleges it is true), that we guarantee that every student will be located in a room and no two students will be given the same room. Then, hypothetically, can the combination of hostel and room not be unique?
What could be the meaning of the double ellipses? How are they different from the single ellipses? Yes. As all of you are murmuring, these are multiple values. S-courses means a fellow may register for 6 or 10 courses. Similarly, a student might have 10 hobbies or 2 hobbies. In general, a double ellipse represents an attribute which could have multiple values, but a single ellipse represents an attribute which will have exactly one. A student will have one name, one hostel, one room and one CPI. The CPI will change, but at any given point in time the student will have only one. That is why these are single ellipses.
This is how I would represent this entity set in a table. You will agree there is a very simple, straightforward translation. Let's go back to the student: s-roll, s-name, s-hostel, s-room, etc. What you see on the top are attribute names. Notice that this is metadata, and this is actual data. 78011012 is the actual roll number value, but s-roll is metadata. And please remember that s-roll on its own does not signify complete information about the metadata. We know that it is the roll number, but whether it should be only digits or whether it can have characters, nothing is specified so far. So the model only permits you to depict pictorially what the entity is and what the associated attributes are. Much more needs to be done to describe this analysis properly.
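The translation from the entity diagram to a table can be sketched as follows: the attribute names form the header (metadata) and each student is one row (data). Apart from the roll number 78011012 quoted in the lecture, every value below is made up for illustration, and the CSV layout is just one assumed way of writing such a table to a file.

```python
import csv
import io
from typing import Sequence, Tuple

# Metadata: attribute names of the student entity set (single-valued attributes only).
ATTRIBUTES: Tuple[str, ...] = ("s_roll", "s_name", "s_hostel", "s_room")

# Data: one row per student. Only the first roll number comes from the lecture.
students = [
    ("78011012", "Student A", 4, 101),
    ("78011013", "Student B", 7, 212),
]


def to_table(attributes: Sequence[str], rows: Sequence[tuple]) -> str:
    """Render the entity set as text: a header line, then one line per entity."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(attributes)
    writer.writerows(rows)
    return buffer.getvalue()


def key_is_unique(rows: Sequence[tuple], key_column: int = 0) -> bool:
    """Check that the underlined attribute (here s_roll) differs for every row."""
    keys = [row[key_column] for row in rows]
    return len(set(keys)) == len(keys)


print(to_table(ATTRIBUTES, students))
```

Nothing in the header says whether `s_roll` must be digits only, which is exactly the gap in the metadata that the lecture points out.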
However, you can see that if I have such an entity model, I can represent the data in a table in a straightforward way. I write all the attribute names as column headers and have one row per student. I will have 4,500 rows, and consequently 4,500 records in a file is the direct representation in a data file of the entity set called student. That's very easy to understand. Of course there is a problem, for example, with the multiple-valued attributes. But let's consider the course. You will agree that a course is described by course code, course name, course credits, course students, course faculty, maybe course slot, etc. You will also agree that the course code is another artificial creation of the mind to give uniqueness to a course. You will also agree that the students registered for the course and the faculty members teaching the course could be multiple people. But you will also agree that converting this into a table form is straightforward, and consequently converting the table form into a computer file is also straightforward. So I could have a file or database table which contains information about courses, with 300-odd entries, and there could be 4,500 entries for students, and that will create and maintain all the proper information about students and courses. Barring, of course, these multiple-valued things.
Here is a problem. Continuing further, the table representation may have the C code, some other information, and C faculty. Assume here that CS 634 is being taught by Deepak Phatak and Umesh Bellur. Now there are two people here. It is the same problem that we were facing while representing the registration form. Notice that the registration form as presented to us as a user interface is one thing, but storing this information inside a digitized file is a different thing. I cannot ever sort this on faculty. If I sort this on faculty, Deepak Phatak will come at D, but Umesh Bellur will not come at any specific place here.
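The sorting problem just described can be seen in a few lines. In this sketch the faculty names for CS 634 are the ones mentioned in the lecture; the other course codes and faculty names are hypothetical, and the list-of-tuples layout is an assumption made for the example.

```python
from typing import List, Tuple

# Course table with a multi-valued faculty attribute.
courses: List[Tuple[str, List[str]]] = [
    ("CS634", ["Deepak Phatak", "Umesh Bellur"]),
    ("CS601", ["Faculty K"]),                      # hypothetical
    ("CS705", ["Faculty S", "Umesh Bellur"]),      # hypothetical
]

# Sorting on the list-valued attribute effectively orders rows by the FIRST name only.
sorted_on_faculty = sorted(courses, key=lambda row: row[1])
order = [code for code, _ in sorted_on_faculty]


def courses_taught_by(table: List[Tuple[str, List[str]]], faculty: str) -> List[str]:
    """Break the list into single-valued (faculty, course) pairs, then sort and select."""
    pairs = sorted((name, code) for code, names in table for name in names)
    return [code for name, code in pairs if name == faculty]


print(order)
print(courses_taught_by(courses, "Umesh Bellur"))
```

In the sorted table the second teacher never appears under "U"; once the list is broken into single-valued pairs, the same question is answered by an ordinary sort or lookup.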
This is where we need to look at attributes in some more detail. This is something that is not necessarily very obvious, so let's quickly go through the types of attributes that we may encounter in life. Single-valued attributes are very straightforward; you all know them. A single-valued attribute is one which will have only a single value at any time: roll number, name, hostel, etc. Derived attributes are another type; we shall see what a derived attribute is. Composite attributes are a third type. Attributes with null values are another classification, and then there are multi-valued attributes, for which we saw those double ellipses. So far you have seen and understood intuitively what a single-valued attribute is and what a multi-valued attribute is. We will see how to handle multi-valued attributes, but let's quickly look at the other types also, because we do encounter them while modeling.
Single-valued attributes, as I said, are very simple and straightforward. Roll number of a student, course name, date of birth of a person, capacity of a lecture hall, price of sugar at a shop: these are all single values. Common sense, no discussion required.
A derived attribute is an attribute whose value is derived from other independently defined attributes. For example, the age of a person. Suppose in the student entity I had associated an attribute called age. It is valid, it is relevant. I want to know how old my student is. The trouble is that if I enter the age as, say, 18, that 18 is valid at most for a duration of 1 year. It might be 18 on the day I capture that information. Tomorrow it may change. So the sanctity and longevity of that piece of information is guaranteed to be lost sooner rather than later, and that is why it is not considered correct to choose a derived value as an attribute. It is better to store the date of birth, from which the age on any date can be derived at any time. Total salary of an employee is another attribute.
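As a small illustration of the age example, the stored attribute is the date of birth and the age is computed whenever it is needed. The function name and the sample dates are assumptions made for this sketch.

```python
from datetime import date


def age_on(date_of_birth: date, on: date) -> int:
    """Age in completed years on a given day, derived from the stored date of birth."""
    years = on.year - date_of_birth.year
    # If the birthday has not yet occurred in the year of `on`, one year less is complete.
    if (on.month, on.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


dob = date(2000, 6, 15)                  # hypothetical student
print(age_on(dob, date(2018, 6, 14)))    # the day before the 18th birthday
print(age_on(dob, date(2018, 6, 15)))    # the 18th birthday itself
```

The stored value never goes stale; only the date on which the question is asked changes.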
It's an important attribute; after all, that is the salary I get, so I would like to know it. But it consists of a sum of basic pay and allowances minus deductions. It is better to store the separate attribute values and calculate the sum, because if some allowance changes, or some deduction changes, I should still be able to calculate the sum correctly at any point. So you understand derived attributes now, and you understand that it is not a good idea to store the derived value unless there are compelling reasons. In no formal database will the derived attribute alone ever be stored and represented. Even if it is required for speed of calculation, for example because you don't want to spend time calculating the total every time, you will always store the independent attributes, and additionally you may store the derived attribute, knowing fully well that this derived attribute may not be consistent at all times. If something has changed, the derived attribute must also change.
Composite attributes. Consider Ramesh Wadwani, who is going to give a big donation that we just negotiated for our bio school, and I want to store his address. What would his address look like? There will be a street part, in which there will be a house number and a street name. Then there will be a city, then a state, a postal index number, and then the country. The Americans call this PIN a ZIP, so I will also have to record somewhere in the metadata that what I call PIN is ZIP for them, whatever. And then, in future, planet and solar system, if a future donor comes from Mars, for example.
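The address components just listed can be sketched as separate single-valued fields, with the full address assembled from them on demand. The class name, field names, formatting and sample values are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Composite attribute kept as its individual components."""
    house_number: str
    street_name: str
    city: str
    state: str
    postal_code: str   # PIN in India, ZIP in the United States
    country: str

    def full(self) -> str:
        """Assemble the printable address from the components."""
        return ", ".join([
            f"{self.house_number} {self.street_name}",
            self.city,
            f"{self.state} {self.postal_code}",
            self.country,
        ])


# Hypothetical donor address, not taken from the lecture.
address = Address("12", "Main Road", "Bangalore", "Karnataka", "560001", "India")
print(address.full())
print(address.city)
```

Because the city is its own field, it stays available for sorting or counting even though the address is usually printed as one string.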
All these together constitute the address. The address itself appears as just one attribute of the donor entity: the donor has a name, the amount of donation given, and an address. But the address in turn consists of so many things. Now, there will be occasions when I would like to print the entire address and forget about it. There will be occasions when I would like to identify individual components of that address. For example, I want to find out how many donors I have from Bangalore. I would like to sort the data on city, and the city would be lost if my representation of the attribute is only address. Again, in the case of a composite attribute, it is better to put the individual components as attributes and define some kind of conglomerate attribute for the purpose of handling the composite called address. That is also clear.
Null-valued attributes. What if the value of an attribute is not known? For example, CPI is an attribute. When you join the institute, what is your CPI during the month of September? Is it zero? No, it's not zero. Certainly not. It is not known. Will it be all right to represent that unknown quantity by zero? Imagine your parents getting a report card saying zero CPI before you even give your exams. Not acceptable to you. It is not known. I will have to find a mechanism to say that I don't know this value at this particular time. There is no known mechanism in conventional programming. When you want to represent data designated as null, you use a very funny kind of internal representation. For example, at one time people used to put minus one for salary to imply that it is null. Obviously nobody can have a salary of minus one. If a name is not known, star star star star star star. A human being can figure out that star star star cannot be anybody's name. So that is some solution, but then you have to write logic in your program to handle that null value: if the value is star star star, say "sorry, I don't know the name", or something. There has to be a different mechanism. For numerical attribute values, you can't even
guarantee that an artificial value like minus one, which you have set, will not happen to be a correct value somewhere. Zero is certainly not acceptable. So there has to be a special way of handling a null attribute anywhere in information processing. Not known could be one reason. Not applicable could be another reason. For example, number of children is an attribute of the employee. There is also an attribute called marital status. Now, at least in India (I don't know about the western world), if an employee is unmarried and you ask him to fill in the number of children, he will hesitate to write even zero. He will say, "What nonsense, this question is not applicable to me." Isn't it? So when it is not applicable, again you have to put a null value. You can't put zero there. So there are situations like this where you have to handle null-valued attributes, and you had better provide for them.
What is the impact of null values in numerical calculations? Let's say these are roll numbers and these are marks obtained in an examination. There are 5 students and 5 marks. Let's say these students did not appear for the exam. If I did not have a mechanism to represent a null value here, I would probably put zero. If I did put zero, a dumb program could read these marks to calculate the average and divide the sum by 5. Will that be the correct average? Obviously, in the calculation of cumulative values such as averages, I must not count these at all: neither treat them as zero nor count them for the division. You understand the important implication. And preferably this should happen automatically. I should not have to write a program to say, "Oh, this was the artificial null value, so I should ignore it", because that will mean additional code for me to write and additional work for the machine to do. Nulls should not be counted, if possible. You all agree, therefore, that there has to be a special representation of null available in information processing applications. Such is not the requirement for the computational applications that you usually write, although if
you look at sparse matrices, if some of you have handled them, where the actual data is only in a few rows and columns, the other entries are actually null. It is not zero, it is null.
Let's look at multi-valued attributes: the courses taken by a student, the teachers teaching a course. The problem we face is that each of these becomes a list of values. The courses taken by a student is a list of, say, six courses. The problem is that the identity of an individual element of the list is blurred to the external interface. I can't sort, and I can't extract information quickly. I can't index. Only single-valued items can I index or sort. So this is a pain, and therefore it has to be modeled differently and specially. So how do we handle this in the model? We have seen the problem already. With a multi-valued attribute, how do you find all courses taught by Umesh Bellur? Let's imagine that you enter this data with C faculty as a multi-valued attribute: Deepak Phatak, Umesh Bellur; and here is Umesh Bellur again. If you sort this list on faculty, you will get D first, then probably K, then S, and then U. But if you read only this entry, which says Umesh Bellur is teaching this course, the fact that he is also teaching other courses is lost. You can imagine what would happen if you sorted the registration forms on the course list. The sorting would be only on the first course that anybody has mentioned. It is not of any consequence.
Consequently, multi-valued attributes are considered impossible to handle and are therefore never factored into any modeling. Now, that is something which is not common sense. But because you understand the implications, which are not good for the processing of that information, you will appreciate it if we take a bold stand: no matter what, I will never represent an attribute which is multi-valued in my model. All my attributes will be single-valued. If that is so, can I still represent useful information? After all, both the entities that we studied actually had multi-valued attributes. The entity model ensures that each entity in that
set can be uniquely identified. You have already seen the purpose: roll number, course code. This particular thing is called a primary key. Each entity set must have a primary key, and you typically show it by underlining. A key is a set of attributes, and that set of attributes must have unique values amongst the entities of that set. The second point we make in the model is: avoid multi-valued attributes, for the reason that we just elucidated. We will see the notion of primary key in some detail later on, when we study databases and modeling more formally. We will see that there is something called a super key and something called a candidate key. We will discuss those things later. But the primary key is a simple concept, and we will see how we avoid multi-valued attributes.
What are the primary key choices? We already discussed this. Suppose hostel and room were unique for a person. We did say that s-hostel and s-room as a combination can be a primary key. Of course there are problems with such a choice, as have already been pointed out: somebody may change the hostel or room during the stay here itself, and there cannot be uniqueness over a long period of 10 to 15 years, because somebody else will occupy that room. The point that I am trying to make is that in this case it is not correct to choose this, but it is possible to have a unique key which is not a single attribute but a combination of attributes. In such a case, in our model we depict it by underlining all the attributes which form such a composite primary key. So if in my model I had written s-roll just like that, s-name just like that, and s-hostel and s-room underlined, then it is understood that s-hostel plus s-room together form a primary key. Of course it is not very material for our choice here, for the reason that we already discussed. But the table representation is very simple. Now we are saying each table has a primary key. We are saying it has no repeated, that is, no
So the column heads have become the metadata, which are the attribute names. As for the metadata, as I just mentioned, s-roll or s-name or s-hostel does not tell me everything. So for every attribute I must also record in my modeling, in plain English along with that diagram, whether the value is an integer or a string. For example, s-hostel in IIT Bombay is a two-digit number, but not all two-digit values are permitted, so the permissible range of values is 1 to 13. Otherwise some idiot will enter 28 as the hostel number. Why is this specification required? Because when you write the program, somebody has to validate that data against the specification, so that no garbage ever enters our system. But there are problems. What about a project staff cum student who stays in Tansa House? We solve this problem temporarily by giving an artificial hostel number to Tansa House. But such mechanisms have to be handled during the modeling itself, not subsequently after you have written the program. A row in a table represents information about one entity of a set, or one object of a class. Entity set has an equivalent nomenclature in the object-oriented modeling paradigm; there we call it a class. An entity set is equivalent to a class, and the entities in an entity set are like the objects of a class. So this is nomenclature; otherwise conceptually it is the same. Please note that no two rows are the same in any table. This is an absolutely essential part of our model. We are guaranteeing that an entity set represented as a table will never have two rows the same. Why? Because each row has to have a primary key. In the worst case the primary key will consist of all attributes taken together, and since the primary key has to be unique, any two rows must differ in the value of at least one part of the primary key. So that no two rows can be identical is guaranteed by this model. The set of all rows represents the entity set depicted by the model, and it permits easy
implementation as sequential or indexed files. You would be familiar with sequential files or indexed files on the machine, so this is very easy to implement. The entity diagram is a rectangle and associated ellipses; the entity set name and the names of attributes are what you depict in that diagram. It must be accompanied by a detailed data dictionary. Now this is something that we often don't do, but in professional modeling this is required. So you think your diagram means you draw a rectangle, ellipses, another rectangle, ellipses, and that's over? No. Each such diagram should be accompanied by one or more pages of English description, which is called a data dictionary. An entry in the data dictionary will record, first, the entity set name and some description of the entity, and then for every attribute: the attribute name, the value representation (string, numeric, whatever), typical values, permitted values, and constraints. Constraints means the value cannot be beyond this or this, etc. The primary key attributes for the entity set, all this must be documented. Then and only then is your entity model considered complete. You agree with that? You will notice that none of us use this simple modeling in our conventional computational programs at all, but I hope you will appreciate why this is useful to validate whatever assumptions you make. What is an ER diagram? This is the crux of the modeling. Now we are getting closer to handling that multi-valued attribute. Each object in an entity set may have an association with one or more objects in another entity set. For the first time we are talking about two different entity sets being related with each other in some way or the other. One student takes several courses; one course is taken by several students. Agreed? If you observe, courses were appearing as a multi-valued attribute for student, and students were appearing as a multi-valued attribute on a course. The fact of life is there is one course, there is one student; there are 300 courses, there are 4500 students. Quite independent of these
two pieces of information, there is information about students taking courses, and we seem to be getting muddled up because this information is broken up, partly into the student entity's attributes and partly into the course entity's. What if we say this nonsense must stop, and I will represent the association of these two entity sets, where some student has taken some course and some course is taken by some student, by a separate relationship or association? Then I can remove these multi-valued attributes from both these entities and create another representation. That representation is called a diamond. A diamond connecting two rectangles is used to show this association. Consider this. This is the student set, this is the course set. There are 300 courses here, there are 4500 students here. What is a registration form? A registration form is nothing but 99057943 saying: I am registered for IT640, I am registered for CS11, I am registered for so on. Hence each registration form, which on an average has 6 courses listed, actually represents 6 such lines. Some student has 2 lines, some students have 7 lines; some courses have 300 lines, some courses have 10 lines. Each line has a roll number at one end and a course code at the other end. So what is the line? It is between one student and one course. The total registration information is nothing but the collection of all such lines, where each line is identified by the two endpoints: the roll number on this side and the course code on this side. Would you agree that if I could represent all the lines as a set of lines, then I don't need those multi-valued attributes? And how many lines will there be on an average? We agreed: 4500 students multiplied by 6, give or take a few. If I somehow represent those many lines, then I have captured all the information. This association is represented by a diamond in our diagram. So I have student, I have course, and I have a diamond, which is called a relationship set or an association
set. The relationship set is what gives the model the name ER model, entity relationship model. This is one entity, this is the other entity; the relationship between these two entities is: student registers for this course. Now here is the beauty. Student is an entity, so it has a primary key and other attributes, all single-valued. Course is an entity; again it has a primary key and attributes which are single-valued. Each entry in the table represents one entity of that set. This diamond represents what? This diamond really represents those association lines. As I said, can I not treat these lines as entities of a different kind? Each line is actually an association, a registration entity. If I treat this diamond as representing an entity set, then what would be the attributes of each individual entity inside that diamond? Each entity is a line, right? If I want to give attributes to this line as an entity, what are the attributes? Here, you observe, roll number is one attribute, course code is another attribute. Now I see that I have this "registered for" as a set. If I interpret this as a set, I don't have to artificially create attributes for it. I know that each entry in this "registered for" set is uniquely defined by the primary key of the corresponding student and the primary key of the corresponding course. I don't require anything else. So each line is uniquely represented by the primary keys of the participating entities. But because I am treating this as an entity set, I have the possibility of having more attributes for it. Can this entity set have any attributes other than those of student or course? Is there any feature which cannot be associated only with student or only with course? Date of registration is one. But there is a far more important commodity; all of you are concerned with it: the grade. Can a grade be an attribute of a student? No, a student gets multiple grades, different in different subjects. Can a grade be an attribute of a course? That would be very
nice, everybody gets the same grade. That is not possible. The grade is an attribute of the association. A particular student doing a course gets some grade; the same student with another course gets another grade; some other student with the same course gets another grade. So effectively the attribute grade is actually an attribute of this line. Each line has this attribute. Notice that in our entity model, with entities alone, there is no way for us to describe this grade as an attribute anywhere. It cannot be put as an attribute of entity student, it cannot be put as an attribute of entity course, because it is an attribute of this line. But this model permits us to take this "registered for", which is actually an association, and treat it as the equivalent of an entity set where each entity is defined uniquely by s-roll and c-code. You know what it means: for every row here, every line here, the primary key is automatically defined as the combination of s-roll and c-code. But I can define an additional attribute for this, which is grade, which could not be accommodated anywhere else. And if I do this, then I can extend the implementation of the model in the same way and say that since this is an entity set, I will represent this also as a table. If I implement this as a table, let me first go to that table: s-roll, c-code, grade. Maybe there is a course whose grades have been declared: he got AB. Maybe there is a course whose grades have not come: null. And how many entries will I have in this table for this relationship? 4500 multiplied by, on an average, 6. It doesn't matter how many entries exactly. Please note two very important problems we have solved in one shot. One, we have gotten rid of multi-valued attributes. In fact, the moment you see a multi-valued attribute, you should see the potential for another entity somewhere, which needs to be modeled separately as an entity set, and then an association has to be established. Sometimes you may have to do it artificially. More importantly, the grade attribute, which can only be associated with those
relationship lines, can be modeled now. And now you will understand that if I have a database, and in the database I design one table which is the student table, another table which is the course table, and a third table which is the registration table, I don't need to store anything else to get academic registration and grade information. Do you agree? This is actually called a schema for the database. A schema is a scheme of tables; each table has a name and some attributes. These three tables actually form a schema for the academic database: student table, course table, reg table. The external interface may be different. If a registration form comes in through the external interface, it is wise to first check whether the student exists in the student table. Why? Because I should not permit any Lalu to register for any course. That fellow must exist; he should have paid fees, whatever. Secondly, he should not be registering for a course which does not exist, so I should cross-check with my course table. Consequently, at the beginning of a semester, the activity that the academic office must do is update the real tables, I mean the student table and the course table. Then, as a student registers, keep inserting data into this registration table and do nothing else. Now if a teacher wants to get all the students in his course, when I go to that interface and click, this data is actually being retrieved dynamically. So even if you registered half a second earlier, I will get that. Similarly, if you ever retrieve your own registration data, it is never stored in a separate form for you. The moment you put in a roll number, only the courses for your roll number will be extracted and put in an interface to show you. So even if you have updated your registration by changing a course, your faculty or others who are looking at that data will get the latest update. And that is because you guarantee consistency of data, no matter how it is seen and by whom, because you are storing only one unique thing; it is not
dependent on multiple copies. This session has some more slides. I will request you to keep looking at the postings on the CS 634 course page. There will be a Moodle page that will be set up, as I said, by Friday, and all the slides will be put up there. Do read the other slides which we have not discussed here, think about them, and do appreciate this. We will of course then have some additional modeling exercises etc. later. Thank you very much.
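As a recap of the three-table schema from this session, here is a minimal sketch, assuming SQLite through Python's standard sqlite3 module (the lecture does not name any database or language). The table names student, course and reg, the column names s_roll, c_code and grade, the helper functions, and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE student (s_roll INTEGER PRIMARY KEY, s_name TEXT NOT NULL);
CREATE TABLE course  (c_code TEXT PRIMARY KEY, c_name TEXT NOT NULL);
-- One row per association line; the key is the pair of participating keys.
CREATE TABLE reg (
    s_roll INTEGER NOT NULL REFERENCES student(s_roll),
    c_code TEXT    NOT NULL REFERENCES course(c_code),
    grade  TEXT,                       -- NULL until the grade is declared
    PRIMARY KEY (s_roll, c_code)
);
""")

with conn:  # invented sample data
    conn.executemany("INSERT INTO student VALUES (?, ?)",
                     [(1, "Asha"), (2, "Ravi"), (3, "Meera")])
    conn.executemany("INSERT INTO course VALUES (?, ?)",
                     [("CS634", "Course A"), ("IT640", "Course B")])
    conn.executemany("INSERT INTO reg VALUES (?, ?, ?)",
                     [(1, "CS634", "AB"), (1, "IT640", None),
                      (2, "CS634", None), (3, "IT640", None)])


def try_register(conn, s_roll, c_code):
    """Insert one registration line; return False if the schema rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO reg (s_roll, c_code) VALUES (?, ?)",
                         (s_roll, c_code))
        return True
    except sqlite3.IntegrityError:
        # Unknown student, unknown course, or a duplicate (s_roll, c_code) pair.
        return False


def course_list(conn, c_code):
    """Roll numbers registered for one course, retrieved on demand."""
    rows = conn.execute(
        "SELECT s_roll FROM reg WHERE c_code = ? ORDER BY s_roll", (c_code,))
    return [r[0] for r in rows]


def courses_of(conn, s_roll):
    """(course code, grade) pairs for one student, retrieved on demand."""
    rows = conn.execute(
        "SELECT c_code, grade FROM reg WHERE s_roll = ? ORDER BY c_code",
        (s_roll,))
    return list(rows)


print(course_list(conn, "CS634"))
print(try_register(conn, 99, "CS634"))  # student 99 is not in the student table
print(try_register(conn, 3, "CS634"))   # a valid new registration
print(course_list(conn, "CS634"))       # the new line shows up immediately
print(courses_of(conn, 1))
```

The course list and the student's own view are both just queries over the same reg rows, so nothing is stored twice and neither view can go stale. In this sketch the existence checks are left to the foreign key constraints rather than written by hand; that is one design choice among several.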