Our next chapter is on indexing and hashing. The hashing part I have not actually covered here; it is a somewhat historical pairing, so we can just think of this chapter as being about indexing and forget about the hashing part. What we are going to cover is the basic concepts of indexing, with our focus on B+-trees, because these are the most widely used index type; the parts on hashing I have dropped from this abbreviated version (they are in the book slides), and then we will talk a little about multi-key access and a few other topics. So, let us lay the groundwork with some terminology. There is the notion of a search key, which is an attribute used to look up records in a particular file or relation. Again, terminology: a file refers to the underlying storage structure where the records are stored, while a relation means the logical relation. The file is the physical manifestation; it could be a collection of files, but we will just use the word file. An index file, often just referred to as an index, is basically a collection of records called index entries, which map a search key to a pointer. The search key could be, say, name: perhaps the student relation has ID as the primary key, but you can have an index on ID as well as another index on name. If you put an index on ID, the search key is the ID and the pointer is a pointer to the record. Now, what do we mean by pointer? Typically, if records are in a heap organization, meaning they land up somewhere in the file and never move after that, the pointer consists of two parts. One is the identifier of the page where the record is, which could consist of a file ID, the offset of the page within the file, and so forth. The second is a record identifier within that page; if you use the slotted page structure, this is the entry in the page header which points to that record.
All of that together constitutes the record pointer, or record ID. Now, what is the use of a file like this? A record might be quite large, but an index entry is much smaller. So indices are smaller, which lets you search through them efficiently; in fact, you are not going to search sequentially through an index, there are ways to jump to the required part of an index, which I think most people would be familiar with. There are ordered and hashed indices; as I said, we will focus on ordered ones. Now, the goal of an index is to speed up access to data, so first of all you have to see what kinds of accesses it supports. The simplest kind of index finds all records with an exact given value for one or more fields. The next, more complex, kind finds records whose attribute values fall in a given range, say between 10 and 30: a simple linear range. Those are the kinds of accesses we are going to focus on. Now, many people have been asking about geospatial data. There you will often ask queries such as: things which are within 1 kilometer of this spot, or things which are less than 4 kilometers downwind of this spot in a month when the wind direction is so-and-so. You have more complex queries there. We are not going to look at those today, but there are indices which support such queries also. So access type and access time are the key considerations. Another important attribute is insertion time, because if insertion into the index is very slow, then insertion into the relation gets slowed down. Similarly for deletion time; updates can be treated as a delete plus an insert. Lastly, the space overhead of the indices can also be significant; usually it is small enough not to be an issue, but occasionally it can be. Now, our focus is going to be on ordered indices, where index entries are stored sorted on the search key: in our context, entries sorted on ID, or entries sorted on name.
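As a minimal sketch of these concepts, here is what an index entry and a record pointer might look like in Python. The field names (`page_id`, `slot`) are my own illustration of the page-identifier-plus-slot scheme just described, not any specific system's layout:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RecordPointer:
    page_id: int   # which page (block) of the file holds the record
    slot: int      # entry in the page's slot directory (slotted-page layout)

@dataclass(frozen=True)
class IndexEntry:
    search_key: str        # e.g. an ID or a name
    pointer: RecordPointer # where the record lives

# An index on ID maps each ID value to the location of its record.
entry = IndexEntry("22222", RecordPointer(page_id=7, slot=3))
print(entry.search_key, entry.pointer.page_id, entry.pointer.slot)  # 22222 7 3
```

An index file is then, conceptually, just a collection of such entries kept in a form that is fast to search.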
Some more terminology: we have what we call a primary index, which many people also call a clustering index, and I will often use this term to mean an index on a sequentially ordered file. That is, the file is sorted on a particular key and the index is on that same key. If the index key order matches the order in which the records are stored, the index has some nice properties: the search time comes down, so it is more efficient for accessing. In contrast, a secondary index is one where the search key order is different from the sequential order of the file. The file may be unordered, or it may be ordered on some other attribute; the key point is that the orders do not match. Such indices are called secondary indices or non-clustering indices, and they have different performance characteristics. Now, many people think that primary index means the same thing as an index on the primary key. That is not how we define primary index, but there are people out there who call an index on a primary key the primary index; to avoid confusion we tend to say clustering index and secondary index. An index-sequential file is basically an ordered file which has an associated clustering (primary) index. Now let us look at another concept: dense versus sparse indices. A dense index is one where an index record appears for every search key value. Note that it says for every search key value, not for every record. Here the search key is the ID, so there is one entry per ID; but if you index something else, say name, it is possible that two people have the same name, and then the dense index may have only one entry for that name rather than separate entries for the two records. Here is another dense index, on the attribute department name, where there is only one entry per department name, with all the records with the same department name stored together.
So the file is clustered on department name, and we have a dense index with one entry per key value. In contrast, a sparse index does not have an entry for every key value; it may have entries only for certain key values. Here is the same data, sorted on ID now, and instead of having an entry per ID we have only three entries. The first one points to the very first record, the second points somewhere into the middle of the file, and the third points later into the file. As long as the file is stored sorted on this key, you can create a sparse index like this. How do you use a sparse index? Suppose I am searching for 15151. Is it greater than or equal to the first entry? Yes. Compare it to the next one: is it greater than or equal to 32343? No. Therefore I follow the first pointer and then search sequentially till I find it. If I instead search for, say, 83821, I compare it with the first entry (greater), with the second (greater), and this is the last entry, so I follow that pointer and search sequentially to find the record. So what have I achieved with this sparse index? I have avoided looking at every possible search key; I have looked at only a few search keys and found the record I want fairly efficiently. This is the key idea of indexing. A sparse index has less space and less maintenance overhead, but it requires the data to be sorted, as I said. Now, how many entries do you create in the sparse index? On the previous slide I arbitrarily indexed roughly one in four or five records. The heuristic typically used is to take the blocks in which the data is stored and give the sparse index one entry per block. Again, I am assuming the data is sorted, so there is one entry for the first block, one entry for the second block, and so on.
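The sparse-index lookup just walked through can be sketched in a few lines of Python. The IDs are from the lecture's example; the particular split of the sorted data into blocks is my own illustration:

```python
import bisect

# Data sorted on the search key, stored in fixed-size blocks.
blocks = [[10101, 12121, 15151], [32343, 33456, 45565], [58583, 76543, 83821]]

# Sparse index: one entry per block, holding the block's first key.
sparse_index = [blk[0] for blk in blocks]   # [10101, 32343, 58583]

def lookup(key):
    # Follow the last index entry whose key is <= the search key...
    i = bisect.bisect_right(sparse_index, key) - 1
    if i < 0:
        return False        # smaller than every indexed key
    # ...then search sequentially within that one block.
    return key in blocks[i]

print(lookup(15151), lookup(83821), lookup(99999))  # True True False
```

Only the small `sparse_index` list is examined before a single block is scanned, which is exactly the saving described above.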
A block is the unit of fetching or storing data on disk: when I read data from disk, a whole block is read. Given that a whole block is read, there is no motivation to store multiple pointers into the same block; the whole block is read anyway, and then I can search it sequentially in memory, which is much faster than disk access. So I do not need multiple entries per block. On the other hand, if I do not have an entry for some block, say my data is in the second block with no pointer to it, I have to go to the first block, then fetch the second block, then the third, and so forth. So a good tradeoff is one entry per block. Now, an index like this with just two levels is usually not sufficient, because the index itself can become very big. What we end up with is a multi-level index; let me show it pictorially. This is the data, already sorted on the search key. Then you have a sparse index on these blocks, and the sparse index itself is broken into blocks. Think of it as a single logical array with a pointer to each block of the sorted file. This is the first level, the inner index. The next level of the index, the outer index, has one entry per index block of the inner index. So if I had a thousand data blocks, I would have a thousand entries at the inner level, and those thousand entries may fit in, say, 10 blocks, because each block can contain, say, 100 entries. For those 10 blocks, the next level of the index needs only one block, since I can store 100 entries per block. So the upper levels of the index shrink very fast: by a factor of, say, 100 at each step, if there are 100 entries per block. Again, we talked of a primary index; we said the data is sorted, and then we can build a multi-level sparse index. But what if the data is not sorted? Then the trick is the following: first, the index records point to something here, which we will call a bucket.
I am not telling you yet how it is implemented. Usually this is at the leaf level of a tree, where you have pointers to the actual records, and these entries are sorted on the search key. It is a secondary index, meaning the actual data is not sorted on this key. This one is on salary: the index is sorted on salary, the entries at the leaf level of the index are also sorted on salary, and there is one pointer per record. Every record is pointed to from here. The first entry is 40,000, and it points to Mozart, whose salary is 40,000. The second is salary 60,000, which points to El Said, the History instructor, whose salary is 60,000, and so forth. Now, any secondary index like this has to be dense, meaning it has one entry per search key value, and further, that entry must have a pointer to every record with that search key value. That is required. So those are the general concepts, and they were applied in real systems for many, many years. For example, this multi-level index used to be built in a way that matched the physical layout. In the old days you would have blocks within tracks, and so forth. A track might hold some number of blocks (maybe fewer in that era), and if you take all the blocks in a particular cylinder together, that would be enough blocks that it made sense to create one index block for that cylinder. Across different cylinders there would be different index blocks, and so there would be one more level of index across the cylinders. So this was tied to the physical implementation initially, but later people realized there is no need to link it to the physical organization, and that led to the concepts we use now. That was one idea. The second issue is that as data gets updated, it gets out of order. How do you maintain this sorted order of the search keys?
That is an important question. What does an index-sequential file do to keep the sort order? You may have to re-sort the file periodically and rebuild the index. This used to be a headache: once every few weeks there had to be some scheduled downtime during which the file was re-sorted and re-indexed. A B+-tree, on the other hand, does not require any such thing, because it automatically reorganizes itself with small local changes. Each reorganization has some cost; it is not free. But you do not have to have downtime when the system is inaccessible, because all these changes happen in small increments as normal processing goes on. There are minor disadvantages: extra insertion and deletion overhead (the time for insertion and deletion goes up), and some space overhead, since there is wasted space to some extent. But people found out soon enough that the advantages well outweigh the disadvantages, so B+-trees are ubiquitous; everybody uses them today. Here is a small example of a B+-tree. It is a toy example where the nodes are very small: each node has up to four pointers and three key values. What is going on here? Let us start at the leaf level. This is an index on name. Is the data sorted on name? If you look at the data, you will observe that it is sorted on ID; the names are not sorted. So what kind of index is this? It is a secondary index, and we have to have a pointer per key value here. This layer of key values and pointers is called the leaf level of the tree. You see here: Brandt is here, and to the left of the key value Brandt is a pointer to the record with the value Brandt. We are assuming for simplicity that no two records have the same key value. If there were two people called Brandt, I would have to store two pointers, which is not a big deal.
If there are a hundred people called Brandt, then I might not have space in that node, and with a thousand people called Brandt I am in real trouble. So handling duplicate key values is an issue, and we will come back briefly to how to deal with it. For the moment, let us assume there are no duplicates: just one Brandt, so one pointer to the record Brandt. The next value is Califieri; you see that pointer, one pointer to the record with Califieri, which is here. (And I should fix a bug in this slide: this pointer should start from here; I forgot to update this version of the slide.) So the pointer for Crick is to the left of the value Crick, and it points to Crick here. Now, by the way, all of this is pictorial. In the actual implementation, a node is simply an array of bytes. If a pointer is 8 bytes, which is typical, then there are 8 bytes of pointer, followed by a string which is the key; if keys are fixed length, I know the length, and if variable length, I have to store the length along with the key value. At the end of the key value there is another 8 bytes of pointer, and so forth. That is physically what happens inside the leaf node, but logically we show it like this. The last thing you will notice here is a pointer to the next leaf node. It is not strictly essential, because you can always go to the parent and find the pointer to the next node: this node was pointed to from here, and the next node's pointer is the next pointer in the parent page, and so on. Now let us look at the next layer up. There is a value, Einstein. What does that value tell us? It tells us that if you follow the pointer to the left of Einstein, the values there are less than Einstein. If you follow the pointer to the right of Einstein, we can be sure the values are greater than or equal to Einstein, but less than Gold.
What about to the right of Gold? Those values are greater than Gold for sure, but what are they less than? If you go further up, you can figure that out: at this node of the tree, things less than Mozart are in the subtree pointed to here, and since Mozart is the last key, things greater than or equal to Mozart are in this other subtree, which has further children. What this tells us is that if the value were greater than or equal to Mozart, we would not have followed this pointer; therefore this subtree contains things greater than or equal to Gold but less than Mozart. Similarly, if the value is greater than or equal to Mozart but less than Srinivasan, we are here, and if it is greater than Srinivasan, we are here. (It has been established that Srinivasan rules BCCI, so there is nobody greater than Srinivasan there; but in this tree we have Wu, who is alphabetically greater than Srinivasan.) The point is that if you have a value larger than any of these keys, that is fine: it will end up at the right end of the tree, and there is a place for it there. So, having looked at a B+-tree by example, let us define it formally. A B+-tree is a rooted tree satisfying the following properties. First, all paths from the root to a leaf are of exactly the same length, and any update to the tree must preserve this property. There are many trees which are not fully balanced: the path from the root to one leaf may be of length 4, the path to another leaf of length 5. For binary trees, in particular, you cannot avoid this; there will in general be different path lengths to different leaves. But this is not a binary tree: its nodes can have many children, and so you can make equal path lengths happen. The way you make it happen is to allow some flexibility: each node that is not a root or a leaf has between ceiling(n/2) and n children. What does this mean? We have a choice.
Depending on what data was inserted as the tree was built, a particular node may have as few as ceiling(n/2) children or as many as n children. So where did n come from? What is its value? If I have fixed-length keys, I can determine n directly, so I am going to assume fixed length. How do I determine n? I know the size of a block and I know the size of a pointer. Coming back to the earlier example, how did I choose that each node has 4 pointers and 3 key values? That depended on the size of the block. This was a very tiny block. Assume the names are fixed length: imagine that Gold is padded with spaces, so all names have the length of the longest name. Who is the longest? I guess Srinivasan, of length 10 or so. So you would have 8 bytes of pointer, 10 bytes of name, 8 bytes of pointer, 10 bytes of name, and so on. Overall that is 30 bytes of names and 32 bytes of pointers, so if the node was, say, 64 bytes, I could pack in 62 bytes of information and no more. That is how I came up with the number n, the number of pointers a node can have; in this case n is 4. So the B+-tree will have anywhere between ceiling(n/2) and n children per internal node. Similarly, a leaf does not have children, but it has values. If you come back here, one leaf in this example had 3 values, and another had 2. Can a leaf have just one value? No. It must have between ceiling((n-1)/2) and n-1 values. Why n-1? A node has n-1 key values, and in a leaf I must have a pointer per value, so I can have only n-1 pointers to records, because there are only n-1 values. The last pointer is used to point to the next leaf in the sequence. Now, there are two special cases.
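The block-size arithmetic just done can be written out in a few lines. The 64-byte node, 8-byte pointers and 10-byte padded names are the lecture's toy numbers, and `max_fanout` is my own helper name; the formula assumes n pointers interleaved with n-1 fixed-length keys:

```python
def max_fanout(block_size, key_size, ptr_size=8):
    """Largest n such that n pointers and (n - 1) keys fit in one block:
    n*ptr_size + (n - 1)*key_size <= block_size."""
    return (block_size + key_size) // (ptr_size + key_size)

# Lecture's toy node: 64-byte block, 10-byte padded names, 8-byte pointers.
print(max_fanout(64, 10))    # 4 -> 4 pointers, 3 keys (4*8 + 3*10 = 62 bytes)
# A more realistic node: 4 KB block with 4-byte integer keys.
print(max_fanout(4096, 4))   # a fanout in the hundreds
```

With realistic block sizes the fanout lands in the hundreds, which is why real B+-trees are so shallow.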
The first special case: if the root is a leaf, meaning the tree has so few records that there is just one node which is both the root and the leaf, then it may hold however many records have been inserted, anything between 0 and n-1 values; if the relation is empty, the index may be empty. The second: as long as the root is not a leaf, it must have at least 2 children. These cover the situations where the tree is so small that the usual constraints cannot be met. Now, the structure of a node I have already shown you pictorially; this shows the same thing more abstractly. The P_i are pointers and the K_i are search key values. What do the pointers point to? In an internal node, they point to child nodes; in a leaf node, they are record pointers. I am assuming the structure is the same for both. The next thing to note is that the search keys within a node are in sorted order; I assumed that earlier in the example. If you see here: Brandt, Califieri and Crick (the font is very small, but if you could read it, these are sorted). Similarly these are sorted. Furthermore, if you look across nodes, you will see that everything in the leaf sequence order is sorted. Similarly, if you go one level up in the tree, everything across those nodes is sorted: we have Einstein, Gold and Srinivasan, and the top node in this case has just one key, but if there were multiple key values, they would be sorted. The leaf nodes, we have already seen, store record pointers. As I said, if L_i and L_j are leaf nodes and i < j, that is, L_i is to the left of L_j in the sequence, then the search key values in L_i are less than or equal to the search key values in L_j; we saw that pictorially. As I said, the B+-tree is a generalization of the multi-level sparse index.
The non-leaf nodes form a multi-level sparse index on the leaf nodes; that is the basic idea of a B+-tree. The difference from a plain multi-level index is in how you maintain it when there are inserts and deletes. And this is something we saw: all the search keys in the subtree to which P_1 points are less than K_1. At any node, whether the root or an internal node, if I look at the value K_1, the values in the subtree below P_1 are definitely less than K_1; the values in the subtree pointed to by P_2 are greater than or equal to K_1 but less than K_2; and so forth, until the last one, where the values are definitely greater than or equal to K_(n-1). What are they less than? That depends on the parent. Here is another example of a B+-tree. Earlier I had up to 4 children with a minimum of 2. Now I have up to 6 children with a minimum of 3 at an internal node, and a leaf node would have between 3 and 5 values, because ceiling((n-1)/2) = 3 and n-1 = 5, with n = 6 since there are 6 pointers here. Any non-leaf node other than the root must have between 3 and 6 children, and the root must have at least 2. The point is that you can have B+-trees with wider nodes, and as you just saw, the tree becomes shorter. Our trees here are really toy trees with 15 records or so, but in real life you have trees on relations with millions of records, so there are multiple levels in the tree. Also, n, the number of children of an internal node, is usually much larger than the 4 or 6 we have used in the examples: it is usually of the order of 100 or 200 or some such number. What this means is that each leaf has pointers to, say, 100 records on average, or at minimum, depending on what has been inserted.
Now, the next level up has the same kind of structure, but its size is a hundredth the size of the leaf level. Why? Come back here: at the next level up there is only one pointer per node below, so if you had 100 leaf nodes, there are 100 pointers from the next level up, and if n, the maximum number of pointers per node, is 100, then one node is enough at this level to point to all 100 nodes below. If instead of 100 we had 300 nodes at the lower level, there would have to be at least 3 nodes at this level; sometimes more, because of the minimum. Say the maximum is 100 pointers and the minimum is 50: then with 300 nodes at the lower level you may have up to 6 nodes at this level, and at least 3. At the level above that you would have exactly one node. So the number of nodes decreases sharply as you go up each level. This slide quantifies it in terms of n: the root has at least 2 children; the level below that has at least 2 nodes, and the level below that at least 2 times ceiling(n/2) nodes, then at least 2 times ceiling(n/2) times ceiling(n/2), and so forth. Each level grows by a factor of at least ceiling(n/2) going down. So the number of levels is at most log base ceiling(n/2) of K, where K is the number of search key values in the file. As I said, n/2 is pretty big: if n is 200, n/2 is 100, and log base 100 of a million is just 3. So a million records gives a B+-tree with only 3 levels; 100 million records would give 4 levels, and so forth. Even with a lot of values, B+-trees are really short structures, and that is important to minimize the I/O. So now let us look at the operations on a B+-tree: queries, insert, delete, update and so forth. Let us start with queries, the very first one.
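(A quick numerical check of that height bound before we get to the operations. The integer loop below avoids floating-point log edge cases, and `max_height` is my own helper name:)

```python
def max_height(num_keys, fanout):
    """Upper bound on B+-tree levels: smallest h with ceil(fanout/2)**h >= num_keys,
    i.e. h = ceil(log base ceil(fanout/2) of num_keys)."""
    half = -(-fanout // 2)          # ceiling(fanout / 2), in integer arithmetic
    levels, reach = 1, half         # keys reachable with `levels` levels
    while reach < num_keys:
        reach *= half
        levels += 1
    return levels

print(max_height(1_000_000, 200))    # 3: a million keys, 3 levels
print(max_height(100_000_000, 200))  # 4: 100 million keys, 4 levels
```

This matches the lecture's figures: with a fanout of 200, even 100 million keys need only 4 levels.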
How do you do a query? Start at the root. While the current node C (initially the root) is not a leaf: let K_i be the least key value in C which is greater than or equal to the search value V. If no such value exists, set C to the child reached by the last pointer; if V equals K_i, set C to P_(i+1); otherwise V is less than K_i and we set C to P_i. For example, suppose we are searching for Crick, which is here. If you look in this node (again, if you cannot read the text: the keys are Califieri, Einstein, Gold), we want the least K_i such that Crick is less than or equal to K_i. Here it is Einstein, with i equal to 2: Einstein is the very first key which Crick is less than. Since V is strictly less than K_i (the case V equals K_i would send us to P_(i+1)), we set C to P_2, which is this pointer, and we come down here, and we keep going down till we hit a leaf. Once we have hit a leaf, we check whether the key value is present in it: if there is an entry in the leaf equal to V, we have found the record we want; otherwise we return failure, the record does not exist. As I said, the typical fanout is 100 or 200 or some such, so the tree height is 3 or 4. If you had a binary tree, on the other hand, with a million keys you would have a height of 20. Now, why does this difference between 4 and 20 matter? It is a factor of 5, and if it were all in memory, who cares; but it is not. These nodes are on disk, and every time you want to retrieve a node, if it is not already in the buffer, you pay a big price: something like 10 milliseconds. So with 20 accesses, each taking 10 milliseconds, I have just spent 200 milliseconds, one fifth of a second, and my disk can support only 5 such lookups per second, which is quite bad. If instead I use a B+-tree with a height of 4, each access still takes 10 milliseconds.
So in 40 milliseconds I have accessed what I want, which means in one second I can do 25 lookups. The difference between 5 and 25 is quite significant: my throughput can be 5 times what it would be with a binary tree. That was queries; now let us focus on insertion, which is where the key idea behind the B+-tree comes in. So far the B+-tree looks like the multi-level index we saw earlier, except that the number of children per node can vary, and here is how we exploit that during insertion. Let us do it by example; no need to read all the text here. The first step is to find the leaf node which would contain the value: search for the key value using the search algorithm we just saw, and we land at a leaf. If the value is already there, it is a duplicate, and we will have to deal with that; otherwise we have found the leaf into which the new value must be inserted. Now, if there is space in this leaf, the case is easy. For example, suppose we insert a new value, say Crack, which sorts just below Crick: we search for Crack, we land at this leaf, and there is empty space at the end of the leaf. So I move Crick to the right and insert Crack, because it is less than Crick but greater than Califieri. Crack comes in here, I store a pointer to the record with Crack, and I am done. That was the easy case. The problematic case is when the node is already full. Suppose I insert something between Gold and Kim, say Gordon. Where would it land up?
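Pausing the example for a moment: the search procedure from the previous slide can be written out as a runnable sketch. The dict-based node layout here is my own toy representation, not the byte layout discussed earlier; the names follow the lecture's example:

```python
import bisect

def bplus_search(node, v):
    """Search a toy B+-tree for key v.
    Internal node: {"keys": [...], "children": [...]}, len(children) = len(keys)+1.
    Leaf node:     {"leaf": True, "keys": [...], "recs": [...]}.
    Rule from the slide: find the least K_i >= v; follow P_i (or P_{i+1} if
    v == K_i); if no such key exists, follow the last pointer."""
    while not node.get("leaf"):
        i = bisect.bisect_left(node["keys"], v)       # least i with keys[i] >= v
        if i < len(node["keys"]) and node["keys"][i] == v:
            node = node["children"][i + 1]            # v == K_i: take P_{i+1}
        else:
            node = node["children"][i]                # v < K_i, or last pointer
    j = bisect.bisect_left(node["keys"], v)
    if j < len(node["keys"]) and node["keys"][j] == v:
        return node["recs"][j]                        # found the record pointer
    return None                                       # record does not exist

leaf1 = {"leaf": True, "keys": ["Brandt", "Califieri", "Crick"],
         "recs": ["r1", "r2", "r3"]}
leaf2 = {"leaf": True, "keys": ["Einstein", "Gold"], "recs": ["r4", "r5"]}
root = {"keys": ["Einstein"], "children": [leaf1, leaf2]}
print(bplus_search(root, "Crick"), bplus_search(root, "Gordon"))  # r3 None
```

With that in hand, back to inserting Gordon.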
The search for Gordon lands at this leaf, and there is no space. Now what? The key idea of the B+-tree is that we split this node and create two nodes: half of the entries go to one node and half to the other. That gives us enough space for the new value being inserted. Now, having created a new node, I of course have to do an update at the next level up: I need a pointer to the new node there, and a new key value to differentiate between the two nodes I have just created. If there is enough space at that level, I can just adjust things there; but if I need to add a new entry to a parent node that has no space, I have to do the same thing there: split that node too, and the split can propagate upward. A split of a leaf can cause a split of its parent, which in some rarer cases causes a split of its parent, and so on all the way to the root node. It is unlikely that a split will reach the root, but every once in a while it does happen. That is the key idea. So, in our example: here is the initial state (again, it is okay if you cannot read the text). We had Brandt, Califieri and Crick, and we added Adams. So what happened? The node split: Adams and Brandt landed in the first node, and Califieri and Crick landed in the second node. Now I need to add an entry to the parent to differentiate between the two: if I am searching for a record, should I go here or here? The value that differentiates these two nodes is the minimum value in the second node, Califieri.
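The leaf-level split just described can be sketched as follows, using the lecture's n = 4 example (at most 3 keys per leaf); `split_leaf` is my own helper name, and record pointers are omitted to keep the sketch short:

```python
def split_leaf(leaf_keys, new_key, n=4):
    """Insert new_key into a full leaf (n - 1 keys) of a B+-tree with
    fanout n, then split. Returns (left, right, separator); the separator,
    which goes into the parent, is the least key of the right node."""
    keys = sorted(leaf_keys + [new_key])   # now n keys: one too many
    half = (n + 1) // 2                    # ceiling(n/2) keys stay on the left
    left, right = keys[:half], keys[half:]
    return left, right, right[0]

# Lecture's example: leaf (Brandt, Califieri, Crick), insert Adams.
left, right, sep = split_leaf(["Brandt", "Califieri", "Crick"], "Adams")
print(left, right, sep)
# ['Adams', 'Brandt'] ['Califieri', 'Crick'] Califieri
```

The returned separator, Califieri, is exactly the value that now has to be placed in the parent.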
So what happened in the parent? The single pointer to the old node is replaced by two pointers, one to each of the new nodes, and the value in between them is the least value of the second node, Califieri. The other keys and pointers are shifted to the right: you will see that Gold has moved to the rightmost position, and the pointer to the right of Gold has moved with it; similarly the pointer to the right of Einstein has moved, Einstein itself has moved, everything got shifted. This is the key idea: if there is space, you insert there; if the parent is also out of space, then again you have to split, and we will see an example. So here is that example: this is the previous tree, and into it we insert a new record, Lamport. Again, if you cannot see the font, it is fine; the key step is to find out where Lamport goes. Is Lamport less than Mozart? Yes, so come down here. Looking at this node: Califieri, Einstein, Gold. Is Lamport less than Gold? No, it is greater than Gold, so I follow this pointer, and this leaf has the values Gold, Katz and Kim, and I am inserting Lamport into it. What is the result? I have to split the leaf into two nodes, with the lower two values in one node and the upper two values in the other: Gold and Katz land up here, and Kim and Lamport land up in the new node. So two nodes have been created, and I need a new pointer to the new node in the parent. But is there space there? There is not: this parent already has four pointers down, and I am trying to add a fifth. So what to do? I am going to split this node too, and the key question is how. First, think of it this way: I conceptually expand the node slightly, giving it extra space. Having split the leaf into Gold, Katz and Kim, Lamport, what do I store in the expanded parent? I add the key value Kim and a new pointer to the new leaf.
So, these two pointers are both here; I mean, one is already here, this pointer, and the new pointer is added with the value Kim, the least value in the new leaf, in between them. That is step one, but this node is now too big; it would not fit in an actual node. So, now I am going to split it, with half the entries going to one part and half to the other. In this case there are five children: there were four, and a fifth child has been added. So, I have to split it two-three or three-two; here I have done three-two. The first three pointers come here and the last two pointers go there. Now, if you look at this node, it had five pointers and four values. When I split it, I still have five pointers in total, but how many values do I have? Only three. What happened to the fourth value? When I split, these three pointers went to the left and the remaining two, including the new pointer, went to the right. The key that sat in between these two groups of pointers, Gold, is added to the parent along with the pointer to the newly created node. So, I split this internal node into this and this, and in the parent I added an extra pointer to the new node, that is this pointer here, and the value between the old pointer and this new pointer is Gold. So, the split has now reached the root node. Now, it is possible for the root node to be full, in which case what do we do? We break it into two and add a new root which has pointers to the two nodes created by splitting the old root. So, that is the intuition, and the pseudo code gives details of how to do the splitting. In the interest of time I will not go over the pseudo code in detail, but do read it from the slide. At a high level, we try to insert; if it does not fit, we have to split, and splitting a non-leaf node is described here.
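The non-leaf split walked through above can be sketched similarly (Python, with hypothetical names; the boundary arithmetic follows the ceiling rule the slide's pseudo code uses):

```python
import math

def split_internal(pointers, keys):
    """Split an overfull B+-tree internal node (a sketch with
    hypothetical names). For fanout n the node temporarily holds
    n+1 pointers and n keys; pointers P1..P_ceil((n+1)/2) stay in
    the old node, the rest go to a new node, and the key between
    the two halves moves up into the parent."""
    m = math.ceil(len(pointers) / 2)      # len(pointers) == n + 1
    left = (pointers[:m], keys[:m - 1])
    up_key = keys[m - 1]                  # separator pushed to the parent
    right = (pointers[m:], keys[m:])
    return left, up_key, right

# Lecture example, n = 4: the node (Califieri, Einstein, Gold) gains
# the separator Kim and a fifth child, then splits three-two with
# Gold moving up to the parent.
left, up, right = split_internal(
    ["p1", "p2", "p3", "p4", "p5"],
    ["Califieri", "Einstein", "Gold", "Kim"])
```

Unlike the leaf case, the key that moves up disappears from both halves of the internal node; in a leaf split it is copied up and also stays in the right-hand leaf.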
So, the basic idea is that the node is copied to an in-memory area which has space for the extra pointer and key; these are inserted into that, and then it is broken into two parts: pointers P1 up to P⌈(n+1)/2⌉ go to one side, P⌈(n+1)/2⌉+1 up to Pn+1 go to the other side, and the key which separates P⌈(n+1)/2⌉ and P⌈(n+1)/2⌉+1, namely K⌈(n+1)/2⌉, moves up. So, the entry (K⌈(n+1)/2⌉, N′), where N′ is a pointer to the newly created node, is inserted into the parent. Again, the details are in the book. So, that was insert; the last major operation is deletion. Here, what do we do? The first step is to find the record to be deleted; we remove the record from the main file, and then we delete it from the index. Similarly, when inserting, I first insert into the file and then into the index. So, for a deletion I need to remove the search key value and the pointer from the leaf node. There is some material here about buckets; we will ignore it for the moment. It is easy enough to remove a value and a pointer from a leaf node. Let us take an example: suppose I wanted to delete Brandt. I delete the record from the file, and I delete Brandt and its pointer from this leaf node; now the leaf node has 2 pointers and 2 values. So, that is simple, and the node is still not underfull. But what can happen is that after one more deletion the leaf has too few pointers. This leaf should have a minimum of 2 and a maximum of 3 pointers to actual records. After deleting Brandt it is ok: Califieri and Crick still satisfy the minimum of 2. If I then delete Califieri also, Crick is the only thing left; there is just one pointer, and that node is underfull. We have to do something about it.
So, what do we do if it is underfull, and what do we mean by underfull? One of the properties of a B plus tree is that every node must be at least half full, whether it is a leaf or an internal node. If it is less than half full we have a problem; we have to fix it somehow to restore the half-full property. The key idea: earlier, when we overflowed, we split, and the good thing is that if we were already full and then added one more entry and split, we can guarantee that each half is at least half full; that is obvious. Now, if a node is less than half full, what do we do? The idea is that we will look at a sibling and try to borrow some entries from it. But what happens if the sibling is exactly half full? If it lends any entries it will itself become underfull, and then we have a problem. The good thing is, if your sibling is at the minimum occupancy and you are underfull, the two of you together have few enough entries that they fit in a single node. So, you can actually merge all the data in you and your sibling into a single node. So, there are two cases: redistribution and merging. If you can redistribute, redistribute; if you cannot, then you can merge, and you merge. That is the intuition. The previous slide talked of merging: the point is, if the node has too few entries and the entries in the node and a sibling fit into a single node, then merge them. How to merge them I will skip; the details are straightforward, but it is basically the converse of splitting: you have to take a key value which is in the parent and move it down into the merged child. The other case is when you cannot merge: you take some entries from a sibling and transfer them into this node. Again, the leaf and internal node cases are slightly different, but the basic idea is the same in either case. Now, a deletion can also cascade upwards. How does this happen?
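The merge-versus-redistribute decision can be sketched for the leaf case (a simplified Python sketch with hypothetical names; internal nodes additionally pull a key down from the parent, which is skipped here just as in the lecture):

```python
def fix_underfull(node, sibling, max_keys):
    """Restore the half-full invariant for an underfull leaf
    (hypothetical helper). If this node and its sibling together
    fit in one node, merge them; otherwise borrow entries from the
    sibling so that both halves end up at least half full."""
    combined = sorted(node + sibling)
    if len(combined) <= max_keys:
        return ("merge", combined)        # sibling was at minimum too
    half = len(combined) // 2             # even the entries out
    return ("redistribute", combined[:half], combined[half:])
```

With the lecture's values this reproduces both cases: a leaf reduced to Wu merges with the Mozart, Singh leaf, while an underfull Mozart leaf next to Gold, Katz, Kim redistributes so that Kim moves over.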
I delete something, the node is underfull, and I merge it with another node. Now, I have to delete an entry in the parent. The parent becomes underfull; I try to redistribute, but I cannot because its siblings are also at the minimum. So, I have to merge it with a sibling, and now the deletion propagates to the parent of the internal node, and this can go all the way up to the root. At the root what happens is a special case: I had a root with two children, and after a deletion one of those children became underfull while the other is exactly half full, so I cannot redistribute. At this point I merge the two children of the root into a single child. But then the root has only one child, and that is not allowed; a root must have at least two. So, what I do is delete the old root and make its single child the new root, and the net result is that the height of the tree decreases by one. So, that is how the tree grows and shrinks: splits cause it to grow in height, while merges can cause it to shrink in height. Here is an example of deletion; this is before deleting Srinivasan. Let us pretend that in the BCCI the anti-Srinivasan camp, whoever that is, prevailed and deleted Srinivasan from the BCCI. So, we search down for Srinivasan, reach this node and delete the entry; now this node is underfull, because Wu is the only one left behind there. So, we can merge it with the Mozart, Singh node, and what we land up with is Mozart, Singh, Wu. Now, what happened to the parent? The parent had two children and those two children merged. So, one entry is gone from the parent; there is only one entry left, and the parent is underfull. What do we do? We have to borrow, and in this case the sibling has enough pointers that it can lend one. What is the pointer it lent? Gold. So, the pointer to the Gold node came over here, the last pointer there is gone, and the value Srinivasan is gone.
So, I have two pointers: one to this merged node, that is the Mozart node, and one which I borrowed from here, which points to the Gold node. What is the value that separates these two? I could search down and find it, which is Mozart, or more simply, I just look up here, and Mozart is the value that separates these two nodes in the parent. So, I copy Mozart down from the parent, and I have two pointers with Mozart in between. But something else has happened: Gold was here, separating this pointer and this pointer. Now this pointer is gone from this internal node. What happens to the value Gold? Well, Gold is an important value because it distinguishes between this subtree and this one. So, we have to actually move the value Gold up into the parent, replacing Mozart. If you look down here, this Mozart has been replaced by Gold, which moved up from here to here, while Mozart moved down from there to there. So, think of it as a slight rotation: the value at the top came down to one side and the value here went up. That is how we do redistribution. This is another example, where from the previous tree we deleted Singh and Wu. If you go back to the previous one, Singh and Wu were the last two things here. If we delete both, what happens? We try to borrow from the neighbouring node. So, we had Gold, Katz, Kim in one node and Mozart alone; Gold and Katz stayed in one node, Kim was borrowed into here, and it became Kim, Mozart. When we do that, the key value above has to change, and in this case it became Kim. Then there are some more examples of deletion: in this particular case the root had only two children, and after some deletions the children became underfull and got merged, and then the root had only one child. So, the child became the new root. The new root is Califieri, Einstein, Gold, and the tree has become shorter: from three levels it became two levels. I will skip the details, and I will take a brief pause here to take questions.
So, let us go over to the centre and take live questions. First, Nandith from MGM College, you are on. Suppose we are having a keyword search application; then can we use a session management technique for the purpose of caching the results of the search query? So, you want session management to cache results of a search query. Certainly, any system which supports keyword search, like a web search engine, does cache results, as long as the results are not personalized. If they apply a personalization algorithm, where my results may be different from yours, caching may not be very useful; but if the results are not personalized, then it is useful to cache them. So, if somebody asks the same query again there is no need to execute it; you just have the result and you return it, and this is very useful because many, many people ask the same questions over and over again. Caching is extremely useful for any search engine, but the connection to sessions I am not sure about. Keyword search at a search engine is not done in the context of a session; there are many different users connecting, and the caching is across users. There is a different concept, which is that when you connect to the database in a session you may cache values locally. So, if you are using JDBC you may cache certain things, so that if a query comes again you do not go to the database; you serve it from the locally cached value. Hibernate, for example, does caching of results from the database. There is a problem that the result may become out of date if the database is updated; that is a risk. So, generally you do caching for stuff which does not change much. If you say that a value is unlikely to change in 5 minutes, I can cache it for 2 minutes, and I may get slightly out of date data, but maybe that is acceptable. For example, in a railway reservation system, if I give a slightly out of date view of seat availability it is ok.
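The idea of caching a result for a couple of minutes and tolerating slightly stale data can be sketched like this (a hypothetical Python class, not any real caching library; the clock is injectable so the expiry logic is easy to test):

```python
import time

class TTLCache:
    """Tiny query-result cache with a time-to-live: a result stays
    usable for ttl_seconds after it is stored, then counts as a
    miss. (Hypothetical sketch, for illustration only.)"""
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock            # injectable clock, for testing
        self.entries = {}             # query -> (result, stored_at)

    def get(self, query):
        hit = self.entries.get(query)
        if hit is None:
            return None
        result, stored_at = hit
        if self.clock() - stored_at > self.ttl:
            del self.entries[query]   # expired: treat as a miss
            return None
        return result

    def put(self, query, result):
        self.entries[query] = (result, self.clock())
```

A seat-availability query served from such a cache can be up to ttl_seconds stale, which is exactly the trade-off described above.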
I might tell you that there are 2 seats, but when you come to book there are no seats; it is ok, you are willing to accept that. Most of the time this would not happen, because initially there would be hundreds of seats, and by the time you came to book maybe 100 came down to 90, but it did not matter to you. At the end, when only the last few seats are left, it is noticeable, but that is life. So, caching is used widely in such applications. Does that answer your question? I have a question related to RAID implementations. I want to know, if a failure occurs, does the RAID system automatically take over or does it need human intervention? That is a good question. Eventually human intervention is needed, but immediately what the RAID system does is this: one disk is gone, it detects that, and if a request comes for data it has to serve it from the other disks. In RAID 1 it is very easy: it simply reads the data from the mirror disk. In RAID 5 it has to read data from several other disks and do an XOR in order to recover the data which was requested, and then return that data. It keeps doing that until the RAID array can be rebuilt. So, how is the array rebuilt? In the simplest and most common RAID implementations, the system will wait until you plug in a new disk, at which point it will ask you, shall I use this new disk and rebuild the array, and if you say yes it will go ahead and recover the data onto that new disk. Now, a well designed system will let you do all this without shutting down: you can remove a disk and plug in a new one while the system keeps running. Cheaper ones force you to shut down the system, change the disk, and then boot it back up, at which point it will ask about repairing the array. Higher end systems will actually keep a spare disk: there are many disks in the system, and one is kept as a spare.
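The XOR reconstruction mentioned for RAID 5 is easy to see in code (a Python sketch with made-up stripe contents; a real controller does this per stripe, in hardware or in the driver):

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# In RAID 5 the parity block of a stripe is the XOR of its data
# blocks; if one disk dies, its block is recovered as the XOR of
# all the surviving blocks, data and parity alike.
d0, d1, d2 = b"abcd", b"efgh", b"ijkl"   # made-up stripe contents
parity = xor_blocks([d0, d1, d2])
recovered_d1 = xor_blocks([d0, d2, parity])
```

This works because XOR is its own inverse: XORing d1 into the parity and then XORing the parity with the surviving blocks cancels everything except d1.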
As soon as there is a disk failure, the system starts rebuilding the failed disk on the spare one, which was initially lying idle, and recovers completely using the spare disk with no human even coming into the picture. A human must be told, though, because one disk is dead, and if one more disk dies there may be no spare disk left. So, the human is warned, but the human can take some time: if it is the middle of the night it does not matter, they can come the next day. In some cases, with multiple spare disks, they can come in once a week, remove all the dead disks and plug in new ones. Many data centers follow this approach: periodically, say once a week, they go through all the machines, find all the dead disks, remove them and replace them. We have Techno India, Rajasthan; please go ahead. Sir, I have a question from a previous section: how to check for null values in the result set? First I execute the statement and then I get the values from the result set; now I have to check which values in the result set are null. How to do that? So, the JDBC interface lets you call rs.wasNull(). Suppose you fetched a value with getInt(); immediately after that you can check rs.wasNull(), which tells you whether the previous value you fetched was a null value. Now, rs.getInt() must return an int, and there is no null value in the integer domain. So, the integer which you get back in Java is some value which the JDBC system gives you; maybe it is 0, maybe it is minus max int or something, but in reality in the database it was null. So, at that point, if you call rs.wasNull() you can figure out that the value was actually null; that is how JDBC lets you detect it. Immediately after fetching, you have to check. Sir, I have one more question. Suppose I am having some data in table 1 of database 1.
And I want to fire a query, and the result set should get stored in table 2 of database 2. How to do that? So, if you want to do it from JDBC, you can write a small program that runs a query on database 1 and gets a result set, and it also opens a connection to the second database. For each row retrieved from the first database, it takes those values and inserts them into the table in the second database. It is easy enough to write a JDBC program that opens connections to two databases at the same time. However, if the data volume is very large, this is not necessarily the best way of doing it. It turns out that if you insert one record at a time naively, the number of records you can insert per second is relatively small; a typical implementation might top out at around 100 records per second. If you insert one record per insert statement with auto-commit on, you are going to get maybe 100 inserts per second. If you have a million record table this is really slow: at 100 per second, how many seconds is it going to take? It is going to take 10000 seconds; that is a lot of time, close to 3 hours, horribly slow for a relation which is not all that big, only a million records. Now, there are some things you can do to tune this. First of all, you can turn off auto-commit and do a commit every, let us say, 1000 or 10000 records; that speeds things up significantly, and from 100 per second you might go to 1000 per second. That might still take 1000 seconds, a lot better than 10000 but still pretty slow. An even better way: most databases have a way to dump one or more relations into a file in some flat ASCII form, so you do a dump, and then on the second database you do a restore, loading from that dump into another relation.
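The batching idea can be sketched in Python using the standard sqlite3 module as a stand-in for the two databases (the lecture's context is JDBC; the table and column names here are made up):

```python
import sqlite3

def copy_rows(src_conn, dst_conn, batch_size=1000):
    """Copy table t1 from one database into table t2 of another,
    committing once per batch rather than once per row. (Sketch:
    table names, columns and batch size are illustrative.)"""
    src = src_conn.execute("SELECT id, name FROM t1")
    batch = []
    for row in src:
        batch.append(row)
        if len(batch) >= batch_size:
            dst_conn.executemany("INSERT INTO t2 VALUES (?, ?)", batch)
            dst_conn.commit()        # one commit per batch_size rows
            batch = []
    if batch:                        # flush the final partial batch
        dst_conn.executemany("INSERT INTO t2 VALUES (?, ?)", batch)
        dst_conn.commit()
```

The point is the shape of the loop, not the library: one commit per thousand rows instead of one per row is what moves you from roughly 100 inserts per second towards 1000.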
How you do this is database specific, but most databases have utilities for dumping a relation into some ASCII format, comma separated values or another format, and on the other database there is a corresponding way of importing data from that format into the database. So, use the appropriate tools on the two databases; it can be much faster. The same million records, which took 1000 to 10000 seconds with JDBC, might run in say 10 to 20 seconds in this mode. Why is that so? Because there are a lot of overheads to JDBC: shipping queries across, getting results back, then shipping them over again; it is all very slow. But dumping a database relation is very fast, copying a file from one machine to another is very fast, and loading from a file into the database directly is also very fast. Okay sir, usually in SQL we fire a query like select star into table 2 in database 2.mdb, where we give some extension of the database. So, in PostgreSQL, what is the extension of the database file? Okay, I see what your question was; your question is how to do it directly from the database. So, many databases support the ability to create a view of a relation on another database as a local relation. It is a view, and that view will allow you to query as well as to insert into it. So, it is just a logical copy of the relation, not a physical copy. If you have a database which supports this, you can first create such a view on the second database and then do insert into the view, select star from table 1, as you said, and the database system will automatically copy the data over; yes, that is a valid alternative. Now, does PostgreSQL support the ability to create views on remote databases? I am not aware of this; I have not used it, and as far as I know it is not there, but I am not 100 percent sure. So, you can use one of the methods I told you.
Because here many of my colleagues were searching for the file extension of the PostgreSQL database, and we were unable to find it; so I was curious to know what the file extension of the PostgreSQL database is. Now, there is no file extension per se; the database is stored in a data directory, and if you went through the PostgreSQL installation the directory paths are given there. If you go in there, there are directories and subdirectories; there is a data directory, and within it many subdirectories. There is no particular naming scheme, no dot extension, but those directories have files for different relations, and within them there are multiple files which are parts of a relation. PostgreSQL does not tell you which is where; it may be possible to find out indirectly, but you cannot simply look at the names and learn anything. Thank you, sir. We have Sarvajanik College, Surat; please go ahead. Sir, I am having two questions. My first question is, what is a hot spare, and what is its use in the context of the different RAID levels? And my second question is, can you throw some light on the R tree with respect to geospatial data, what it is and what it is used for, sir? So, the first question, hot spare: I think you are mixing up two terms. In RAID systems, a spare disk is one which is not being used and can be used whenever some disk fails; it will be used to rebuild that failed disk. Physically it may be sitting somewhere in the rack, but the RAID system will logically remap things so that it rebuilds the failed disk on the spare disk; that is the notion of a spare disk. There is a separate notion of a hot spare, which is at the database level. A hot spare is a copy of the database, or it could even be a machine, but let us stick to the database level for now.
It is a copy of the database sitting somewhere; a hot spare is one where every update which is done on the primary copy of the database is also being made on the hot spare copy. So, the hot spare is up to date: all the updates done here are also done there. Now, if the first copy of the database fails, the hot spare can take over. What do I mean by take over? The people who are submitting queries should know that they should go to the hot spare. This is done at the application server level, so it is hidden from the end users: the application server or some intermediary will re-route requests to the hot spare database, and the end users do not even know that something happened. They may see a slight delay when the switchover happens, but after that the hot spare is processing queries, and all transactions that committed on the primary would have been copied to the hot spare also. There are some issues with what happened in the very last second before the failure; that is discussed in the recovery chapter later on, but that is what a hot spare is. If you want to learn more, read the chapter on recovery. Now, coming to your second question, on R trees. So far we have talked of B plus trees, and B plus trees assume a single linear sort ordering: we built one on names, we built one on integers, all of which have a sort order. However, suppose you want to build an index on spatial location, say a two dimensional x, y coordinate. If I want to find what is exactly at this point, I can still use a B plus tree: if I index on (x, y) and the lookup is also on exactly (x, y), I can build a B plus tree on multiple attributes and do an equality query on x and y, no problem. However, the problem arises with range queries. In a B plus tree I can look for all names between, let us say, Soumen and Sudarshan in the alphabetical order; that is a range query.
Now, in the context of spatial data, I may want a range query which says, give me all things which are located within this rectangular region. A two dimensional range query of that form cannot be supported using a B plus tree, so what do you do? This is a common kind of query in spatial databases. Instead of rectangles I may have circles, and another kind of query is, find me the nearest hospital to this spot: if there is an accident, I want to take the victim to the nearest hospital. Hospital locations are there in a spatial database; you could look at the as-the-crow-flies distance, or if you have a road network that makes life more complex and R trees do not directly apply there, but if you just look at the physical distance, which is the closest? Those are nearest neighbour queries. These kinds of queries can be answered using alternative index structures designed for spatial data: there are k-d trees, there are quadtrees, there are k-d-B trees, then there are R trees, R star trees, R plus trees; a huge number of different tree structures have been created for this purpose. One of the very first, and probably the most successful, is the R tree. The R tree is practically a very useful structure; in terms of guarantees about complexity it is not very good, and its worst case can be bad, but in practice people find that it works very well for spatial data, so it is very widely used. If you want to learn more about R trees, they are not there in chapter 11, which we are covering right now, but there are details in a later chapter on spatial data, I think chapter 24 or so, where there is a description of R trees. So, you can learn more there; I hope that answers your question. Yes, thanks sir.
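To make the two query types concrete, here is the naive linear-scan baseline in Python (made-up coordinates; an R tree answers the same rectangle and nearest-neighbour queries without examining every point):

```python
import math

def in_rectangle(points, xlo, xhi, ylo, yhi):
    """All points inside an axis-aligned rectangle (naive scan)."""
    return [p for p in points
            if xlo <= p[0] <= xhi and ylo <= p[1] <= yhi]

def nearest(points, qx, qy):
    """Nearest neighbour by as-the-crow-flies distance (naive scan)."""
    return min(points, key=lambda p: math.hypot(p[0] - qx, p[1] - qy))

# Made-up hospital coordinates, purely for illustration.
hospitals = [(0.0, 0.0), (3.0, 4.0), (10.0, 1.0)]
```

A spatial index such as an R tree groups nearby points into bounding rectangles so that whole subtrees whose rectangles miss the query region (or are provably too far away) are skipped, instead of scanning every point as above.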