 Dear students, today we are going to discuss lecture number 37 of the database management system course. The course code is CS403. In the lecture number 36, we discussed different file organization. We discussed the sequential file organization, index sequential file organization and the hash file organization. We also studied different advantages and disadvantages of these file organizations. At the end of the previous lecture, we were discussing the indexes. Index is basically created on a database file and purpose of the index file is to facilitate in the data access. When you store the database, you have to access the data for different purposes and different uses. If you do not have an indexes facility, the only option that you have got is to search for your required record sequentially one after another. If you want to access the database based on any condition, you have the option to read all records carefully from the database file, database, table and then you compare your desired value within every record unless you reach your own desired record. Or your desired record is lying at the end of the file or it is not there at all. When will you find out that when all records have been read, you will finally find out that the record is not there? So, indexes basically help you to make your data access efficient, relatively low accesses. We have just touched on the topics of indexes in the previous lecture and today we are going to read its details. So, let's start today's lecture. Any subset of the fields of a relation can be the search key for an index on the relation. This means that all the attributes that you have in a table, one of them is a combination of two or three attributes. Any combination of attributes can be used to create an index. And the thing is that what will you do on it? You can access the value of the attribute on which you can read the data. You will create an index on it. Search key is not the same as key. This means that the search you are searching for is not necessary to be key, it can also be a duplicate. And index contains a collection of data entries and supports efficient retrieval of all records with a given search key value k. For example, if we have a student's record and we want that we have a key, that is student ID. But still we may not always need the access on the basis of key only, on the basis of student ID only. In fact, it is possible that we need to access the data on the basis of name, let's say, or area. Let's say we say that those people who are related to DG Khan, whose domicile is of DG Khan, or whose domicile is of let's say Lahore. So the way we can read the data access, it can be your key value. You have to search for records based on that key value. For example, you say that those people whose CGPA is between 3 and 3.2. Again, you are giving a search condition on the basis of CGPA. So the purpose of all these examples is that in the real life examples, in the real life situations, it is not that you only need to access the data on the basis of key only. No. Apart from the key, in the other attributes, if we say non-key and it is not unique, you can also read the records on their basis. So all the fields that you need to access on their basis, they can be your search keys. Another factor is that in some cases, you need to access the records on the basis of a particular key value very frequently. But in some cases, some cases that you need to access at any time, let's say, let's say in the basis of name, you may have to access 200 of them in one day. So on the basis of address, you may have to access one or two or three times. So the frequency of access, on which value basis, how frequently do you need to access the data? That is also one of the criteria on which you decide whether you should create an index or not. Let's move on. Indexes also contain auxiliary information that direct searches to the desired data entries. This means that what you have an index file contained, I have already told you that an index file does not contain all the data, that is, what you have in relation to your table, an index file does not contain all of that in a particular order. No, it contains the value of key, yes, because on its basis, it has to help you in accessing one thing. But apart from this, there may be some other information in it. What can that information be? It depends upon the implementation approach of the index. We will read in this lecture that there may be different ways to implement indexes. So the way an index file is being implemented, according to that, the key values you have, apart from that, there may be other entries. We will see what they can be. We can have multiple or different indexes per file, for example, file sorted by STID with a hash index on CGPA and a B plus tree index on screw name. As I just told you, the relation of your table is that it is always stored in a particular order. You define that. If you don't define it, then by default, the primary key order will be stored in your record store. First of all, the student table we have got our key as the student ID. So practically, the record store in the record store is stored in the order of student ID. But what you have is different attributes. As I told you, you need to access data on the basis of values of different attributes. On the basis of value, you need to access the data very frequently. On that basis, you create index files. For example, the names given here, a hash, a B plus tree, these are basically different implementation approaches of the index files. The meaning of these sites is that you have records stored in the table on the basis of value of student ID. They are stored in that order. You have an index file which is made on the student name and it can be implemented through the B plus trees. Similarly, we have an index file created on the CGPA. It can be implemented through the hash. This way, you can have multiple different index files on the same table. Or, theoretically, you can have the indexes on all the attributes, not only all the attributes, but on different combinations of the attributes. The etiquette is possible. Or, DBMS, you will not even deny it. But definitely, there are some negative sides of this thing as well. Let us go ahead. Indexes on primary key and on attributes in the unique constraints are automatically created. You have declared the primary key and the attribute while defining the table is declared as unique through the constraint. So, on those attributes, your DBMS generally automatically creates indexes. And still, we can add more. How do we classify different types of indexes? What selections does it support? In this, you have an index type. Which type supports that selection? Remember, as I told you earlier, when you select rows, you select rows always on the basis of a certain condition. When we read the SQL condition, we studied different formats of the conditions. We used the wear clause. There are different types. We have to see that the index classification can also be seen that which type of index supports which type of selection. This could be one approach. Representation of data entries index. What kind of information is the index actually storing? We will have to see that what kind of information is stored in the index and how much space the index itself is taking. The data that you are storing is going in the table or the relation. Whatever physical record you have, it will be its space. But apart from that, the index file itself is going to take some space. We have different index approaches. We have to see which index methodology stores what extra information and how much space it is going to take. Apart from that, the classification we have can be done on these three basis. One of them is that clustered or unclustered or non-clustered indexes can be done. Single key versus composite key indexes. What is your index key? Single key or multiple attributes. It is tree-based. It has inverted files or pointers. These are different. The final one that we have studied is different implementation approaches that we will discuss. The next topic is How do we create index? As I just said, whatever is your primary key and your unique attributes that you have declared as unique through your DDL, Data Definition Language, you declared them as unique and the indexes DBMS automatically created. After that, as I just said, we still need to study some other attributes to create indexes. For that, we have an accurate statement and it is also part of the DDL, Data Definition Language. You will remember that we used the create table statement to create a table. Now we will see how we use create statement to create an index. Now look at the create statement which is a syntax. If you look at it, the create statement is written in capital. It means you have to give it as such. After that, the unique is optional. That is why it is written in square brackets. But if you write it, you will write it as such. Non-clustered. From these, you will give one option. After that, what is required is index. Since index is written in capital, you have to give it as such. After that is index name which you will provide yourself. On again, this is required. And where you have two options in curly braces, you will give table or view. You will give its name here. You have to produce it yourself. After that, what you have to give is column. Column means you have to give column's name here. Whether it is from the table or from the view. We are going to read about view, but what is the table in terms of table? And again, there is an optional thing in which it is said ascending, descending. You have to create an index here. Whether it is in ascending order or descending order. Your values are in increasing order or decreasing order. This means if it is by default then it is ascending. If you are doing it on the name, then the names starting from A will be from B to C and like that. And if you are descending, then it will start from Z and after that it will decrease and it will go to A. And after that it is comma and dot on N. This means that what you are giving column and ascending, descending you can repeat it. Now look, this means the column on which you are indexing your table it can be a column and it can be a multiple column. Now the question is what is the logic or reason for giving column or multiple column index? The reason is that if we know that the value we are going to index that is unique. If it is unique whether it is unique or primary key then in such cases when you have that attribute that your index is creating that is unique. In that case, there is no chance of the duplication of the value. The value which is coming which will create the index which will attribute all its values there is no chance that you are going to have the duplicate value. So it is obvious that every value has to exist at once and it will come in the same order. But in some cases it is quite frequently agreed to arrange the data on a non-unique attribute. Definitely the same value can be duplicated. What should the dbms do if it finds the same values for two different records? For example, if we talk about name we said that on the basis of name you create its index. First of all we have a new name and by chance the institute for which we are creating the database we have 7, 8, 10 10 different names. So the logic of having the multiple attributes is that you are guiding the dbms that if you encounter the same value for this index then what you have to do on that basis is that you have to arrange the same values on the basis of the other attribute. So what do we mean by this? First of all we have a student name and we have 10 different classes in our institute. What is the difference between those 10 records? We are guiding that if our name is the same then what you decide between them that you will decide let's say on the father name. So the 10 different names that will be arranged further will be on the basis of their father name. First of all since the different names were the same now we have said on the father name which means the alphabet is the smallest it will be the first after that the father name it will be the end like that. So the second attribute will be needed when you are not in a position to decide on the basis of the first attribute. If the decision is made generally there will be no confusion there will be no use of the second attribute. Why? Because the decision can be made on the value of the first attribute only. There is no use of the second attribute of the second column in case of Babar and Jamil. But if we are going to have 2 or 3 Babars or let's say we have got 10 Jamils what will we do in that situation in that case second attribute or it will be interesting if you keep the first attribute let's say ascending then the second attribute you are giving you can still change the order of the second attribute. For example, you say you should do the name in ascending but if the name is the same then on the basis of the father name but it is in descending then what will happen? But the name of the father which is the biggest factor in the order if the name of the father is Zubair then the name of Zubair is Jamil and if the name of the father is Valayat Hussain then it will come after that. It means that the order that you are giving for the first column and for the second column they are not required to be the same they can be different and they can be the same and this principle the value of the second attribute will also be the same so if we don't specify if we don't specify then generally the order in which the records are entered will be the same for example, we had Jamil Ahmad we have only given the name so if the same value of 10 records is called Jamil Ahmad then the 10 records of the own recorders will be the same in the relation of table but when you said that the father name will go to the base of the father name but suppose that we have got 3 such Jamil Ahmads who have got the same father name as well so based on the first attribute you were not able to decide on the basis of the second attribute Jamil Ahmad was Fiaz Ahmad so what happened that Jamil Ahmad is the same so what is going to be the order now either again you will give the third attribute in a normal real life situation the order that does not matter in that case or you will say to yourself that you will leave the order in which you had entered the record now these 3 records they will appear in the order in which they were entered into the database table but first of all if you go to many courses you are a perfectionist if you are a perfectionist you will say okay here is the third attribute you will say on the address what happened you will say name, father name, address if you decide on the name no need to consult the father name or the address if you decide on the name then the father name and address does not matter because name was the first attribute then you go to the second attribute which was your father name if you decide on the name then you move to the third attribute so this concept is to give or to create the index on a single attribute or on the multiple attribute let us go ahead here is the example you said create unique index and you said program underscore program name again this is a naming convention there were two letters you took them from the table the table you are creating underscore and then you gave the name of the attribute on which you are creating the index so the project name is basically attribute and it is programmed so you did it on the program and the attribute on which you are creating the index is called program name and the index is the name of the index file so you said pr underscore pr name what will be the benefit of this now you knew the name of the table on which you are creating the index plus the name of the attribute of that table on which you are creating the index so this is your index file name and when you said on the program you gave the name of the table or in the bracket the attribute name you want to create the index another example can also be created on composite attributes or multiple attributes and if you remember what we have studied one of the classification was that either it is created on a single attribute or on the multiple attribute you just saw the index on a single attribute now we are seeing example of the index that is created on multiple attributes create unique index unique again it is optional it is not necessary it is optional ST name ST again we have mentioned ST means student table and name means name attribute here we said student ST name ascending as I gave you example you said create on student table that attribute student name ascending and then you said ST father name descending meaning you will arrange name of student in ascending order in increasing order but if name is same then you will decide on the basis of the father name but in the descending order I hope you have clear example properties of indexes indexes can be defined even when there is no data in the table or existing values are checked on execution of this command the first thing is when you have created a file even then if you already know that I will need the indexes on these attributes then you can give the command and you can create the indexes now your index files are on attributes those index files will be created based on the values of those attributes even when there is no data in the database table because now that structure as you have table structure and there is no data in it similarly you have got index file as well although there is no data there to arrange but still you have got the structure so as soon as you start populating your database table automatically index file also gets populated the values will automatically move and it starts working just fine the second thing is you do not need to be tense that when I have created a table I need to know all the index files that I have to create on this table and if I miss then what will happen nothing will happen you have created a table you have created some indexes some indexes you have created automatically and on your unique attributes on them indexes are created automatically now let us say you have populated your table and in that let us say 1000, 2000, 10,000 records have come and now you realize now there is a need that you need to create index on an attribute why because you need to access that on the value of that attribute very frequently when we create an index the access becomes efficient on this basis you realize that you need an index on a flat attribute first we say on the phone number on student phone number or let us say on the qualification of the employer we need an index now although the table is populated you will create the index the same command we have given we will give that command and your index file will be created and as many of your existing records they will be arranged in this index file automatically and after that your index file it will start helping you to have an efficient access of the data from this database table second thing is as all the constraints are if the table is populated and you apply a constraint then your existing data will first check on that whether this constraint can be imposed or not if you say on the qualification of the employer or you say on the student phone number you created an index and you said now the unique if you created an index before putting the data in the table in that situation this index file or the DBMS won't let you enter the duplicate value in that particular attribute this will be one situation but if the data was already present and now you said if you create a unique index then your unique index will create if your existing data is that follows this constraint because in future it won't let you enter a duplicate value but what about existing data the veridity the applicability of this constraint on your existing data will also be checked so if your existing data is that was unique then your index it would be successfully created otherwise the DBMS will give you error message that this index cannot be created or the unique constraint of this index it can't be implemented and first if you do not give a unique constraint if you create an index and you don't give a unique constraint in that case if there are duplicate values I hope this is clear to you come on indexes support selections of the form field operator constant when we were studying the select statement we said that the where clause has a condition and that is basically the form of the condition what is the name of the category then you can have operator and there is a long list of different operators and each operator has its own behavior constant means that you can give an expression that will compare so your indexes help you to access such select conditions for example I gave you an example where select star from student where CGPA between 3 and 3.2 you have given a select condition and you have specified a range now it is obvious that if you have created the index on the CGPA it has arranged all the records in the CGPA value in the order of CGPA value now it will say that the range specified in the order from 3 to 3.2 especially continuously means that when you got 3 then any record that does not fall in this condition and where 3.2 is after that any record does not fall in that condition so if you start the 3rd entry in your index file from there and the point where 3.2 is end will you access the records between them those are the only records within your database access this is confirmed that there is no other record that satisfies this condition or that fall between this range this way your indexes they help you to access or to perform the search conditions efficiently then it says that support equality selections either tree or hash indexes help there means where you have equality you said that CGPA is equal to 3.5 this is an equal airport term so this will help you tree and hash will help you but when you give range when you said that less than equal to, greater than equal to between less than greater than the things that fall in your range in that situation the tree based both will help you but when you have hash based it does not support and you know the reason because in the case of in the situation of hash you always compute the value for the individual record you apply the hash algorithm on individual and search for the basis so hash will not help you from now on we are going to discuss specifically because till now our discussion was either it was focused on primary key or on unique key or general key from this point on we are discussing only the secondary key dear students so far we have discussed the secondary key so many times that you do not need much explanation but do win by the secondary key obviously that attribute on its basis you need the access but that is not necessarily unique and I have told you again that in the database applications in the database you need to access a lot of values in which the value is not unique because look what is the difference in this the difference is that on the basis of unique key when you search for a value then you will not get the record if you do not get it if you get it then you will get one and that is your required or desired record but when you talk about the secondary key in the case of secondary key the data you will get that is not necessarily unique you can get multiple records that means in that case if you get a good record then it is possible that all your records were required or it is also possible that one of them was required so this means that on the basis of secondary key in that case you may need to further clarify to further make it sure which one is your desired record either all of them that you have got one of them, two or three so this is possible but I have said that this is also possible that you need all of them and the other thing is that keep in mind that in the case of secondary key you are always going to have multiple records no this is quite possible that when you search for secondary key then it is possible that if you give a secondary let us say name it is possible that 40 times you get a single record and interesting thing is it is possible that you give a name for 100 times the state of your database is such that all the 100 times you get a single record but on this basis because we gave 100 times 100 times we got a single record does this mean that name is a unique attribute or it is a primary key no about your database or your data about such decisions such rules never ever try to define them on the basis of the state of the database on the basis of the data in the database because the data you have that is a time-oriented thing time-based thing that at this time something else can happen I told you that 100 times you gave a query on the basis of the name 100 times you got a unique name but it is also possible that 50 times you are getting multiple entries so I have to say that the decision you have to take whether something is unique or not you have to define the business rules the environment the database you are developing by looking at the rules whether the name is unique or not the name can also be unique it is not a big deal that it is not an unexpected thing that it cannot happen that you should not imagine no there can be some organization that says that we do not allow two employees or two items fine the name can still be unique but what I mean is when I say that in the case of secondary key you may have this it does not mean that if you are getting unique values on an attribute or if you are getting single values in the case of secondary key you should always expect multiple outputs no the thing is whether some attribute is unique or not whether something is primary key or not whether something is secondary key or not it can also be decided on the basis of the business rules you will see the rules of the organization where you are developing the database that says that it should be unique you will declare that as unique not this one whether by coincidence the data I am showing you yes like student name program name program etc. these are different examples on the basis of which you may need to access the data but you will not get the unique output you will get multiple records so this is what we call secondary key that you know records are stored on the basis of key attribute as I told you generally the records stored are stored on the basis of search sorted on the basis of name sort for command execution problematic these three the three options we have discussed what is the meaning of this and what is the problem look we have discussed that our records they are stored in the file on the basis of the value of the primary key now we need the data on the basis of a secondary key now the thing is what are the possibilities we have the first possibility is that we perform the sequential search and you have read that sequential file or sequential access is inefficient in that you have all records records on the table relation records you will have to read all of them and in that the field on which you have searched the search condition first you have said that let's say the student name is equal to Jameel Ahmad so you will have to compare the name of every record and then you have to compare all the records in the number of lakhs you have records in that file you will have to search a lot of records and another interesting thing is that you can after reading all records that is there were 1 lakh records after reading 1 lakh records you will know that there was not a single record but desired or if it was then you will have to look for it so that is the sequential search but you have a sequential search especially in the case of big files where there are more records in that case sequential access is inefficient it will be time consuming so the first problem is this the second question is that if you do that your records are stored on the basis of ID if you need a name you can sort this file change it on the basis of name and on the basis of name you can store it that means you have duplicate copies of the same data you have one copy on the basis of ID and the other copy on the basis of name and in one way the solution is fine but along with the drawbacks I hope they will click in your mind the first thing is there is a lot of redundancy if you have 1 lakh records then now you have got 200,000 records 2 lakh records in one way on the basis of name and the other thing is whenever you have to enter a record you will have to place that record both in the original table in the original sense because it is on the basis of ID it has all the records we call it original in that table and that file that you have sorted on the basis of name you will have to place that record same thing happens when you delete a record you will have to delete it from there you will have to delete it from here well, it is the first thing if you do an updation then again movement will be on both sides this is affordable the other thing is we have just discussed we can have index on multiple fields on the basis of name we need it on father name we need it on cgp we need it on address because we need to access the data on the values of these attributes so this means on all these attributes or on the combination of BAS we need indexes so if we start sorting then think what is happening that data that we have should be once what we have stored we have got 10 different copies of that same data instead of 1 lakh records we have stored a million records because we have sorted that same file in 10 different orders and saved so 10 copies are done and all the examples like delete, update, move all those operations will have to be performed on all these copies so sorting on the basis of that particular attribute is also problematic the third option is that the command we are executing on the basis of when we gave the exact statement we said select star from student where name is equal to Jimmy Ahmed you must be very surprised because we are naming him so now you have said that Jimmy Ahmed show me the record so third option is write the moment when you need to execute this statement you sort your file at that time again I hope that you are fine as theoretically it is fine but at the same time you must be able to identify the problem with this especially in the case of huge files sorting is going to take time obviously if you swap 1 lakh record then it will take time then the second thing is if you want multiples on the field then you will have to swap time consuming and another thing is if you need it repeatedly if you need it on your name then you need different access based on ID or attributes and after 10 accesses you need it again on the basis of name then you have to perform the same activity again time consuming so I hope that the 3 alternatives that we have discussed in front of you the problems in these 3 you must have realized so obviously the solution is that you create index file let's see how index on secondary key is the solution instead of saying that the 3 we have discussed instead of that you create on secondary key that remains in original order however secondary key index maintains ordered sequence of records and provides direct access any attribute you have created on the basis of attribute then you will keep that attribute's value in an order and that record in which this value is present will guide you to that so this means that if you have indexed a record on your name then you have an index on your name all your names are present on your record so you will search for the record on your name in the index because it is in order and after that along with that there will be information on which you can directly approach to the record having this value as the name of the attribute the value of the attribute name so you will get to that record this is simply the approach by which the index files makes it easier makes it convenient, makes it efficient to access the records on the basis of the values of certain attributes now secondary key index implementation approaches are three one is inverted files or inversions the other is linked lists or 3C has B plus trees the purpose of studying these implementation approaches is that you have an idea in your mind if we are using a particular facility if we are using a particular feature of the DBMS of the database how is it being implemented what is the cost involved what is the benefit of this feature see most of the time implementation things they are managed by the DBMS itself most of the time there are certain situations, certain states certain points where you can guide the DBMS but the thing is most of the time implementation thing is they are handled by the DBMS itself and that is one of the beauties of the three level scheme architecture that you do not have physical details they are handled by the DBMS itself but the thing is as a computer professional as a database person as a database man as a database expert you should know that these are different features fine but how they are implemented what is the cost of using these features is that you should not do blind decision you should see an optimal situation see a small thing we say index they make the access quicker fine it is a simple rule, simple statement that is valid also it is not wrong but the thing is there is a cost if you are simply a DBMS user then you create index on all the attributes on all the combination of attributes go for it and you say why I have made the access efficient I will make it efficient by creating the indexes but as a computer professional as a database expert you should know that the indexes when they are implemented there is no extra cost there is no extra storage there is no extra execution involved so that you do not do that decision you should take that decision where you see that it is fine here I know that it has over head but its benefit it outweighs its over head so what should you decide especially in the lectures you will feel that we are not using it directly but mind it for a database expert it is it is not a must I would say but it is very useful it is very beneficial to know these things so we read the implementation so that you can use the features of the database thoughtfully efficiently the approaches here to implement the index files especially the secondary index basically they involve different data structures and the course of data structures you have read and all the things that you have discussed in detail we are not going to discuss about these data structures in detail we will not talk about those data structures or if we will do it just for the sake of introduction or revision here our emphasis will be on how we can use them in indexes and it is also possible that when you read your data structures course at that time you used these things to implement the index you may have read it so if it was like that then it will be revision in your class how you can use it to implement index files and as you know the link list and the trees they are very useful very popular data structures and especially when you do the data structure course you do the different assignments and the implementation here we do not do the implementation here because normally you do it in the same course in C++ we do the assignments here just we will see their implementation how they are implemented and we will discuss this in our next lecture so today's lecture we will wind up here as you have seen what we have discussed is the topic of index and in index we have seen that the purpose of creating index and we are trying to link it that your database or your table has a relation the records the physical storage in that the reference in that order or in an index what is the difference so we will continue this discussion in our next lecture now I would like to say goodbye