Today, we are going to go on to the next topic, which is indexing. Now, most of you probably know what an index is. Every textbook has an index at the back, where if you want to look for coverage of a particular topic, you search through that index and then find which page of the book has material related to that topic. So, if you look for MapReduce in this book, for example, I am not sure it is there in the index, but let us say it is there. So, if you go into the index and look for MapReduce, it will tell you that such-and-such a page of the book has coverage of MapReduce. Now, there are a few points about this index which are true for all indices. First of all, the index is usually much smaller than the actual data. So, the index at the back of a book may be 10 or 15 pages for a book which is easily 500 to 1000 pages. So, the index is much smaller than the data. The second thing is that the index is organized in a particular way, sorted in this case, such that given a word or a phrase such as MapReduce, I can quickly find which page of the index has that. Searching through the book sequentially for the phrase MapReduce would take a long time, but because the index is sorted, I can quickly find the page which has it, within the page I can find the entry, and thereby I can find out which page covers MapReduce. So, that is the basic property of an index in a database too. The idea is you have a relation and you have a query; in this case, it is usually not words but attributes. So, I may want to find where the record for employee 1111 is. That is based on employee ID. So, if I have an index on employee ID, I do not have to search through every record of that relation to look for 1111; I can use the index to find where that record is. Now, what do I mean by where that record is? As we saw yesterday, records are stored in blocks of a file. So, the index has to tell us which block of which file has that record.
Once we know that, we can seek on the disk, retrieve that block of that file, and then access the record. So, that is basically the job of an index. So, today we will start with basic concepts, which I have already started discussing, and then move on to a class of indices called ordered indices. What I described is an example of an ordered index, and we will spend a fair amount of time on B plus tree indices. The book also has coverage of an alternative indexing method based on hashing. For lack of time, I am not going to cover it here. It is not only lack of time; the fact is that pretty much all databases support B plus tree indices. Several databases also support hash-based indexing, but the general experience has been that B plus tree indices work just as well as hash indices and give some extra benefits. So, most people use B plus tree indices. So, first of all, some terminology. When I want to create an index, the index should have a key, called a search key. I am going to store records in the relation, and each record has an associated key value. So, I am going to insert an entry in the index for that record. What is the index entry that the index file is going to have? It will have a search key value and a pointer to the record. What do we mean by a pointer? In C, C++, or Java, you are used to pointers which are in-memory references, that is, the address of a memory location. Here a pointer is not a memory location, because the index entry should exist even if the database system is not currently running, and the data is actually persistent on disk. So, by a pointer here we mean a pointer to a disk location. What is a pointer to a disk location? At one level, you can think of a pointer as a block address on the disk to which you can directly go.
But more realistically, it is usually a file identifier and a block identifier within that file, and then you actually have to do a little bit more work to traverse the file system data structures to find exactly which block of the disk has that particular record. So, an index entry is a search key and a record ID, or record pointer. As I already said, there are ordered indices and there are hash indices. Ordered indices keep the index entries sorted. So, now there are a few types of accesses, which I mentioned. If your goal is to find records which exactly match a value, then both types work. If your goal is to find all records in a particular range, suppose I want to find all employees with employee ID between 300 and 1000, this kind of operation is supported very well by an ordered index such as a B plus tree. Indices such as hash indices, in contrast, do not support a query which asks for everything within a range. If you say give me the employee with ID 432, a hash index will find it. But if you ask for everything between 300 and 1000, it can still find it, but it is going to be very inefficient: you would have to look up 300, 301, 302 and so forth. That works with integers, but with strings you cannot even enumerate the values. So, those are the access types. Now, given a particular access type, there is the question of how much time the index takes to return records satisfying the specified conditions. Now, what do we mean by time? You could measure the time in terms of, you know, microseconds or milliseconds, but that depends on the specific implementation. So, at our abstract level we are going to look at the complexity. We are going to say it takes logarithmic time: logarithmic time to the base 2, logarithmic time to the base 100 or some other such large value, and so forth. That is the kind of analysis we will use for how much time an index takes, or we will say it takes linear time in the number of records in that particular relation which is indexed.
So, suppose you have a million records. Log to the base 2 of 1000 is about 10, so log to the base 2 of a million is about 20. So, it may take 20 accesses, and what do I mean by access? Most of the time an index is on disk, so an access would mean a random access to some block of the disk. If the index is on a hard disk and I make 20 accesses, each one taking 10 milliseconds on average, I can say that to access a record through the index I will take 10 times 20, or 200 milliseconds, which by database standards is quite slow. So, 20 accesses to get to a record is actually very poor for a database index. In fact, you can do much better. A good index, even for a file with hundreds of millions of records, can let you get to a record in about 4 or 5 accesses instead of 20, and 4 or 5 times 10 milliseconds is 40 or 50 milliseconds, which is much better than 200. So, that was access time. Another metric is the insertion time: given a new record, how long does it take to insert it into that index? Similarly, deletion time, and finally the space overhead of the index: how many extra bytes does it take to index a particular relation? Now, I am going to focus on B plus tree indices, but I should mention that there are a few other types of accesses which cannot be effectively handled even by a B plus tree index. For example, suppose the records are locations of shops or locations of people, locations in terms of, let us say, latitude and longitude. These are often used these days because your GPS system can tell you your position in terms of latitude and longitude. So, suppose you have stored the locations of various things in a database and my query is of the following form: here is a point, find me all people within a radius of 100 meters of this point. This is a query which looks up a two-dimensional space. Here is another kind of query.
I will say here is a two-dimensional latitude, longitude space. Now, find me everything which is inside this box. The box is defined by two latitudes and two longitudes, which define the four sides of a box, and I say return everything inside that box. These kinds of queries, which work on two-dimensional data, cannot actually be handled that efficiently by a B plus tree index. So, there is another class of indices called spatial indices, which I am not covering here, which handle such accesses. A very popular tree index, the R-tree, can actually handle spatial data. So, if you are interested in that, you can go read those up. We have a little bit of coverage of spatial indexing in one of the later chapters of the book, although I am not going to cover it here. So, our focus is on B plus tree indices, and it is on two types of accesses. One is a point access: here is a key value, find the corresponding record or records. Why records? Because there may be duplicates. If I index, let us say, an instructor relation on department name, then there are certainly duplicates; there are many instructors per department. So, if I say find instructors with department equal to Computer Science, I will get many results. That is still a point query; a point query may return multiple results. A range query, as I said earlier, gives a range of values. Using this instructor example, I may say give me all instructors whose salary falls in the range 50,000 to 80,000. If I had an index on salary, then that is a range query on it. Now, some more terminology; we are focusing on ordered indices. There is a notion of a primary index, where the terminology is unfortunately a little confusing. A lot of people think of a primary index as an index on a primary key. In fact, it need not be. A primary index is one where the search key of the index is ordered in exactly the same order as the actual data of the relation.
So, if I store the instructor relation sorted by instructor ID and the index is on instructor ID, sorted again on instructor ID, then this index on instructor ID is a primary index. It is also known as a clustered index. Why clustered? Because if I give a particular instructor ID or a range of instructor IDs, these records are going to be consecutive in the file. So, the set of answers to a query here will be consecutive, or clustered, in the file. That is why it is called a clustered index. In contrast, a secondary index is one where the sort order of the actual data is different from the sort order of the index's search key. So, let us say I have the same index on instructor ID, but the instructor relation is not stored sorted: records are stored in the relation in whatever order I insert them, and it is not sorted on instructor ID. Then, if I look for consecutive instructor IDs, the records may be anywhere in the instructor file; they need not be in consecutive locations. Similarly, suppose I build an index on department name on an employee relation, where the employee relation is not sorted on department but on employee ID. Now, given a particular department, the employees of that department are going to be scattered all over the relation; they are not together. So, in this employee relation which is sorted on employee ID, an index on department would be a secondary index. Secondary indices are also called non-clustering indices, because the records which satisfy a particular query, say department equal to Computer Science, are going to be scattered across the file. An index-sequential file is an ordered sequential file with a primary index. A little more terminology: there is a notion of a dense index, which basically has an index record for every search key value in the file. So, here is an example of a dense index.
I have this relation, which in this case is actually sorted on ID, the instructor relation, and the index is simply a file containing pairs of instructor ID and a pointer to the record. Now, this index file is smaller than the instructor file, so it may be faster to search through the index than through the relation, but as we will see, there are more interesting cases coming up. This one is very straightforward. Now, here is another situation, where a dense index does not actually have a pointer to every record. A dense index is one which has an entry for each search key value, not necessarily for each record. The difference is illustrated here. Here the relation is sorted on department name and I have a dense index on department name. Now, for the Biology department I have a pointer to the first record. I do not need to store pointers to consecutive records of Biology because they will all be in the same place. Well, here there is only one Biology record, so let us look at Computer Science. The entry for Computer Science stores a pointer to the very first record for Computer Science. The next two records are also Computer Science, but there is no index entry for those, because we know the file is sorted on department, so those records will be consecutive. So, there is no need to store one entry per record here; a dense index still has one entry per search key value. In contrast, a sparse index contains index records for only a few of the search key values. Now, such an index may appear to be useless, but it is actually very useful, as long as the index is in the same order as the underlying data. So, coming back to our same example where we had instructor, this time the index is on instructor ID, but the key trick is that instead of storing an entry for every instructor ID, we are only storing entries for three out of the many instructor IDs. So, what is the use of this particular index? The use is, supposing I am searching for 12121, I go through this file.
I see that 12121 is greater than the first entry but less than the second one. Since it is greater than or equal to the first but less than the second, I will follow the first pointer. Now, this is not the record for 12121, but I can search down the file sequentially till I find that record, or till I hit a record which is greater than it. In this case 12121 exists and I find the record. If I search for 12122, I will find 12121, which is smaller, and then 15151, which is bigger than 12122. So, at that point I say that is it: the record is not there in the file, and I will return failure. So, what is the point of doing this? The point is that the number of records I have to search is now one third of the total number of records, when I have three entries here. Obviously, three entries are insufficient for a very large file, but if I have a thousand entries here, I will need to search only one thousandth of the relation instead of the whole relation. So, clearly the benefit is enormous. To continue this, if I search for, let us say, 33333, it is larger than this entry and less than that one, so I will come in here and then search down till I find the record or I fail. So, that was a very simple way to search. Now, the idea of a sparse index is very important: first of all, the amount of space it takes is less than a dense index, and this becomes very, very important when the indices are large. I am going to show you next how to build multi-level indices using sparse indices, where this is critical. But first, here is an example of how a sparse index is usually built, even with one level. I have data, and the data is sorted on some attribute; I can build a sparse index on that attribute only because it is sorted. So, the data is sorted and the data is divided into blocks, as we saw yesterday.
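The sparse-index lookup walked through above can be sketched in a few lines. This is an illustrative, in-memory sketch, not a real system's on-disk implementation; the record names and the grouping into "blocks" of two records are made up for the example, with IDs taken from the lecture's walkthrough.

```python
import bisect

def sparse_lookup(index, records, key):
    """Look up `key` via a sparse index.

    index:   sorted list of (key, position) pairs, one per "block".
    records: the full sorted list of (key, record) pairs.
    """
    keys = [k for k, _ in index]
    # Find the last index entry whose key is <= the search key.
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None                          # key precedes everything indexed
    pos = index[i][1]
    # Scan forward sequentially until we find the key or pass it.
    while pos < len(records) and records[pos][0] <= key:
        if records[pos][0] == key:
            return records[pos][1]           # found the record
        pos += 1
    return None                              # hit a larger key: not present

records = [(10101, "Srinivasan"), (12121, "Wu"), (15151, "Mozart"),
           (22222, "Einstein"), (32343, "El Said"), (33456, "Gold")]
index = [(10101, 0), (15151, 2), (32343, 4)]   # one entry per block of two

print(sparse_lookup(index, records, 12121))    # Wu: scan forward from 10101
print(sparse_lookup(index, records, 12122))    # None: scan stops at 15151
```

The sequential scan examines only the records between two index entries, which is the one-third (or one-thousandth) saving described above.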
So, I am going to have an entry for the very first record of the first block, then an entry for the very first record of the second block, and so forth. What does this give us? The number of entries here is equal to the number of blocks, and each block typically holds many records. As a rule of thumb, a medium-sized record in a relation may be a few hundred bytes, say 400 bytes; there are a few much larger records, but this is typical. A page is typically maybe 16 kilobytes. So, in 16 kilobytes, if you divide by 400, how many records are there in a page? About 40 records. Therefore, the number of index entries is one fortieth of the number of records, and moreover, an index entry is much smaller than an actual record. So, the index size would typically be less than 1 percent of the relation size. So, that was a sparse index with one level. But if you have a really large relation, even a one-level sparse index may become very big. Suppose I have a billion records; then the number of entries is a billion divided by 40, which is 25 million entries in the index. That is very large, and searching through these 25 million entries will itself take a long time. So, the trick is to build multiple levels of indices. It is very easy to understand with a picture: here is the actual data. The first-level index has one entry per block of the data. This first-level index is itself very big, so its entries are divided into blocks. Now, remember that the entries here are sorted too: the data is sorted and the entries are sorted, all on the search key. Now, I build one more outer index, which has one entry per block of the inner index. So, if this index has 25 million entries, the next-level index has one fortieth of that.
So, that comes to about 625,000 entries instead of 25 million, and 625,000 may still be large. So, then I can have one more level, which, dividing by 40 again, brings it down to about 16,000, and so on. You may build a few levels of indices. So, a multi-level index will have 4 or 5 levels, and within 4 or 5 levels you can index a huge amount of data. That is the key insight for indexing in a database: you have multiple levels, each level consists of a number of entries which are divided into blocks, and the next level up has one entry per block of the level below. Such indices have been in use for a long time and they work very well; they have been around for easily 45 or 50 years. Now, roughly 40 years ago, people realized that there is a drawback to a basic multi-level index like this, and the drawback is how to keep the index up to date as you insert and delete data. Suppose I insert a number of records at this level, and the thing has to be kept sorted, so I have to insert into this block here. Now, this block may become too big and I have to split it into two blocks. Then, I have to add an entry one level up. But adding an entry there may require moving all the data below it, which is rather tedious. So, then people said, let us take this idea further: if this block becomes too big, I will split it into two and add an entry one level up; if that level in turn becomes too big, I will split it as well and add an entry another level up. This key insight, that I can maintain the index by splitting nodes and then propagating the split upwards, became the core of the data structure called the B plus tree. So, we are going to see what the B plus tree is shortly. But before that, let me also show you an example of a secondary index. Here is an index on salary. It is a secondary index because the relation is sorted on instructor ID, not on salary. The secondary index has one entry per salary value, and here I have two people with the same salary of 80,000. Who are they?
Singh and Kim both have a salary of 80,000. So, I am going to have two pointers, one to Singh and one to Kim. So, there is a set of values, and for each value, I have a set of pointers to the records. That is a secondary index. So, coming back: if I have to keep the file sorted and the index up to date while inserting or deleting data, the standard solution 45 years back would be to rebuild the index periodically. The index goes slightly out of order, you have some extra entries, and then you rebuild it. Now, rebuilding turned out to be very expensive as data grew larger. So, B plus tree indices were introduced as a way to avoid rebuilding a whole index, and instead to incrementally update the index as data is inserted or deleted. What a B plus tree does is reorganize itself with small local changes; the changes affect a small part of the index. No insertion or deletion requires you to go and change the whole index. So, each update can be done fairly fast, though over time, of course, the whole index will change if you have a very large number of inserts and deletes. As a result of this reorganization, performance does not degrade. Index-sequential files, which were the predecessor, IBM had in its mainframes. They used a different structure where you created overflow blocks, and there were a variety of tricks that they used, but with all of those, over a period of time, as you inserted more and more data, the index became progressively inefficient and had to be rebuilt, whereas B plus trees rebuild themselves incrementally, instead of one big-bang rebuild once in a while. So, there is a small overhead to B plus trees: first, insertion and deletion take somewhat more time, and more importantly, there is a space overhead for a B plus tree index. But that is viewed as perfectly acceptable given the benefits it gives.
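The level arithmetic for a multi-level index can be sketched as a quick back-of-the-envelope calculation. This is illustrative only; the fan-out of 40 is the lecture's rule-of-thumb number, not a property of any particular system. Each additional level divides the number of entries by the fan-out, until the topmost level fits in a single block.

```python
import math

def index_levels(num_entries, fanout):
    """How many levels a multi-level sparse index needs if the bottom
    index level has `num_entries` entries and each block holds `fanout`."""
    levels = 1                       # the bottom index level itself
    entries = num_entries
    while entries > fanout:          # keep adding levels until the top fits in one block
        entries = math.ceil(entries / fanout)
        levels += 1
    return levels

print(index_levels(25_000_000, 40))    # the lecture's 25 million first-level entries
print(index_levels(25_000_000, 100))   # with a larger fan-out, fewer levels
```

With 25 million bottom-level entries and 40 entries per block this gives 5 levels, and 4 levels with a fan-out of 100, matching the "4 or 5 levels" claim above.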
So, that is why they have become universally used. So, here is an example of a B plus tree index. Now, so far when I showed you a sparse index, it was kind of turned around: the data is here and then level 1, level 2, level 3 go left to right or right to left, depending on how you view it. In contrast, when I show a B plus tree, it is essentially the same thing, but I am going to turn it around. So, I have data at one level, the first level of the index above it, the next level above that, and so on until I hit the root of the index. If you look at it this way, an index is clearly a tree. If you looked at it the other way, it is still the same thing, but it is not obvious that it is a tree data structure; when you turn it around, it is obvious. So, that is why it is a B plus tree. So, what is the index itself? These are the entries of the index, the levels of the index; down here is the actual data. Now, this font may be too small for you to read from the display, but you can see it in the book later on. What I will note is that this particular file is sorted on instructor ID, whereas the index is on the name of the instructor. So, I need to have one entry, one pointer, for each record in this index. And you will notice that for each record, there is an incoming pointer from the leaf level of the index. So, if I search for a name, I will search in here till I find that name. There seems to be a slight bug in this slide: this pointer here should actually have come from here; it should have been shifted one left. So, let me repeat what that error was. There is a pointer here going down and pointing to Crick. That pointer should have come from the left of the word Crick, not from the right-hand side; it was shifted right by mistake. Now, what is this index leaf doing? It has one entry per name.
If there were multiple people with the same name, we will see how to deal with that later on. For the moment, let us assume that there are no duplicates at this level. So, what we will have is, for each entry here, for Brandt, to its left is a pointer which points to the record for Brandt. Similarly, for Califieri, to its left is a pointer which points to the record for Califieri. And similarly for Crick: to its left (this is the pointer that should have been shifted left) there is a pointer to the record for Crick, and so on for each name at the leaf level. So, for every name, there is an entry in the leaf level and a pointer to the record; the pointer in this case is to the left of the name. Now, the rightmost entry of the leaf has a pointer to the next leaf, which lets us navigate by starting from some leaf and then going right, right, right till I hit the last leaf I am interested in. So, that allows navigation at the leaf level. Note also that every name is here and the names are sorted. Although the relation is sorted on instructor ID, not on name, the leaf level of this index is sorted on name. So, that makes clear what the leaves should contain. Now, one level up, you will notice the structure looks a little different. At this level, there are two names in this particular case, but three pointers. Here the meaning is as follows. Here is a name, Einstein. The pointer to its left says that every name which is less than Einstein is in the subtree pointed to on this side. Here, the subtree is just a single node, but going up to the root of the tree, it is a little clearer. So, here is Mozart, and the pointer on the left side of Mozart says that every name in this index which is less than Mozart will be in this subtree, and you can easily verify that. You can see that Brandt up to Kim is in this subtree, and Mozart onwards are on the other side. So, everything less than Mozart is in this subtree.
At this level, this is the very last pointer. So, this level also says one more thing: everything which is greater than or equal to Mozart is on the right side, pointed to by this pointer, and that is also easy to verify. Mozart, Singh, Srinivasan, and Wu, all greater than or equal to Mozart, are on this side. Now, coming back down to this node, everything less than Einstein was on the left. What is in this subtree here? It contains all records which are greater than or equal to Einstein, but less than Gold. So, each pointer here leads to values which are greater than or equal to the value to its left and less than the value to its right. So, here, Einstein and El Said are down here, whereas Gold itself is in the subtree to the right of Gold. So, everything above Gold goes here. Well, let us be a bit careful; it is not everything above Gold, because we came into this subtree only with values which are less than Mozart. So, everything which is greater than or equal to Gold, but less than Mozart, is in this subtree. That is the idea here; that is the property of the data structure. Now, how do you build this data structure? We are going to discuss that in a little bit. But before we see how to build it, the first question is, how do you search for a value in this data structure? And the answer should be clear. Suppose I search for, let us say, Gold. I will start at the root. I will see Mozart. Is Gold less than Mozart? Yes, it is. Therefore, I will go to the left side and come to this node. Now, I repeat the process here: I look in this node for the last value which is less than or equal to Gold. In this case, that value is this one, and there is nothing greater, so it is the last one here.
So, I will follow this pointer and come down to this leaf. And here, I find Gold, and I am done. What if I search for, let us say, Katz? If I search for Katz, I will find it is less than Mozart; K is less than M. I come here. Katz is greater than Gold, and Gold is the last entry. So, again I will follow the same pointer and land up here. What if I search for El Said? This is slightly different. El Said is less than Mozart, so I come here. El Said is greater than Einstein, but less than Gold. Since it is less than Gold, but greater than or equal to Einstein, I will follow this pointer. And here, in this node, I find El Said. So, this is basically how the search works. If I search for a value which is not present, what will happen? I will land up at a leaf, and then I will see that the value is not there in the leaf, and say, sorry, the value is not there in the index. So, that is basically how I search in a B plus tree index. Now, here are a few properties of B plus tree indices. First of all, as you saw in that figure, the index has multiple levels, each pointer goes from one level to the next, and all the leaves of the index are at one level, the bottom-most level. In other words, all paths from the root to a leaf are exactly the same length. Now, if you know binary search trees, you will know that the paths to different leaves may have different depths from the root; the path lengths vary, and in extreme cases a binary search tree may be very skewed, so searching for some keys may take a very long time. B plus trees make sure that does not happen; they make sure that the tree is balanced. There are many other balanced tree structures which are well known and covered in a data structures course. A B plus tree is one such balanced tree, and in fact, it is balanced in the sense that every leaf is at exactly the same depth from the root.
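The search procedure just traced can be sketched in a few lines. This is a simplified in-memory sketch assuming no duplicate keys; the Node class, the record labels r1 to r8, and the tiny three-leaf tree are made up for illustration (a real node would live in a disk page).

```python
import bisect

class Node:
    def __init__(self, keys, pointers, is_leaf):
        self.keys = keys          # k1 < k2 < ... < k_{n-1}
        self.pointers = pointers  # children (internal node) or record pointers (leaf)
        self.is_leaf = is_leaf

def bplus_search(node, key):
    while not node.is_leaf:
        # Follow the pointer whose subtree holds values >= the key to its
        # left and < the key to its right.
        i = bisect.bisect_right(node.keys, key)
        node = node.pointers[i]
    if key in node.keys:
        return node.pointers[node.keys.index(key)]   # record pointer
    return None                                      # key not in the index

# A tiny tree over some of the instructor names:
leaf1 = Node(["Brandt", "Califieri", "Crick"], ["r1", "r2", "r3"], True)
leaf2 = Node(["Einstein", "El Said"], ["r4", "r5"], True)
leaf3 = Node(["Gold", "Katz", "Kim"], ["r6", "r7", "r8"], True)
root = Node(["Einstein", "Gold"], [leaf1, leaf2, leaf3], False)

print(bplus_search(root, "El Said"))   # r5
print(bplus_search(root, "Katz"))      # r7
print(bplus_search(root, "Curie"))     # None: lands in leaf1, key absent
```

Note that the loop only descends, one node per level, so the number of node accesses equals the height of the tree.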
Furthermore, a B plus tree must satisfy the following property: every internal node other than the root must have between ceiling of n by 2 and n children. What is n? We are assuming that the key values are fixed length, and the assumption is that I can pack at most n minus 1 key values and n pointers into each of the internal nodes. n is defined based on the size of a pointer and the size of a key. Say the keys are names which are 20 characters, plus a pointer which is 8 bytes; this adds up to 28 bytes, and if I have a 16 kilobyte page, I can calculate how many entries and pointers I can have if I fill that page. That is the value of n. A page, that is, an internal node, will have n pointers and n minus 1 key values. The property of the tree is that an internal node need not be packed full; there may be spaces in it. We saw that in this structure: in fact, all three internal nodes have some wasted space here; they have space for a key and space for pointers which are currently unused. If you look at the leaf level, this leaf is packed full: it has space for three keys and all three are occupied. This leaf is not full, nor is this one or this one. So, it is fine for a B plus tree that a particular node is not packed full, but it must be at least half full: ceiling of n by 2 is the minimum number of pointers in an internal node. Now, why is this important? If this minimum number were 2, then the height of the tree could be very large. It could be a very tall tree requiring a lot of disk accesses to get from the root down to a leaf, because each node of the tree may potentially be on a different disk page. So, the idea is that n should be large, so that n by 2 is also large. If n is, let us say, 200, n by 2 is 100, so each node of the tree has at least 100 children. So now, if I have a tree with three levels, how many leaves does it have?
The root may have up to 100 children. Actually, the root does not have to satisfy this minimum, but in practice the root will also have many children. So, for simplicity let us assume the root has 100 children, and the next level also has 100 children each. So, each of the 100 children of the root has 100 children: 100 times 100 is 10,000. With three levels of the tree, I have 10,000 leaves: the leaf level, one intermediate level, and the root. So, I have 10,000 leaves, each of which may itself have 100 pointers. 10,000 times 100 is 1 million. So, 1 million record entries can be stored in a B plus tree with just three levels. As you can see, the B plus tree is going to be very short, and because it is short, the number of times you have to go from one node to another is much less than in a binary search tree. This is an important property, because node access is very slow on a disk. Now, a leaf also has a minimum occupancy property, and it is slightly different for a leaf, because a leaf can have up to n minus 1 values; the last pointer is reserved to go to the next leaf. So, that is the maximum. The minimum number of values it can have, correspondingly, is ceiling of n minus 1 by 2. That is the minimum occupancy for a leaf. Special cases: if the root is not a leaf, it has at least two children. This can happen; we will see why later on. And if the root is a leaf, it can have even fewer. If the tree is essentially empty, with just the very first record added, then the root is also a leaf and will have just one entry. So, we saw a picture of a B plus tree, but this is the formal structure of a node. Each node has pointers p1, p2 up to pn, and key values k1, k2 up to k(n minus 1). And this is identical whether it is the root, an intermediate node, or a leaf; all nodes have the same basic structure. The way the pointers are used is a little different.
For an internal node, the pointers point to other B plus tree nodes below it. For leaf nodes, the pointers are record pointers. The search keys inside a node are ordered: k1 less than k2, up to k(n minus 1). We will assume here, again, no duplicates. In a leaf, as I said, each pointer is a pointer to a record. This is a blown-up leaf node where Brandt, Califieri, and Crick all point to the corresponding records, and the last pointer is a pointer to the next leaf. So, we looked at sparse indices. If you look at a B plus tree and consider all the nodes at a particular level, that level actually forms a sparse index on the level underneath; each level is a sparse index on the level below. So, a B plus tree is an example of a multi-level sparse index. Now, here is another example of a B plus tree, actually for the same data as before, but now with a bigger node size. By the way, in all our examples the node sizes are fairly small: we have maybe 4 children, or 6 children in this case, at most. In reality, nodes are much bigger because they correspond to disk blocks; 16 kilobytes is common, in fact sometimes even bigger. So, the number of children, as I said, is usually much larger, 100 or so. This number of children of a node is called the fan-out of the node. Obviously, we cannot draw 100 children on a page of a book; therefore, we use a very small fan-out of 4 to 6 in examples, but it is actually much bigger. So, here is an example with n equal to 6. With n equal to 6, there are 6 pointers. Therefore, each leaf node must have between ceiling of n minus 1 by 2 and n minus 1 values, in other words between 3 and 5 values in every leaf node. And every internal node must have between 3 and 6 children: ceiling of 6 by 2 is 3, and the maximum is 6. So, notice something here. This is an index on the same set of names.
But because my nodes are now wider, with 6 children instead of 4, the tree actually got shorter: from 3 levels, it is now down to 2 levels. So, the wider a node is, the shorter the tree.
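The fan-out and height arithmetic used throughout this lecture can be sketched as a back-of-the-envelope calculation, using the example sizes given earlier (20-byte keys, 8-byte pointers, 16-kilobyte pages). The formulas are rough approximations, not any system's exact page layout.

```python
import math

def fanout(page_bytes, key_bytes, pointer_bytes):
    """Approximate n, the number of children per node: an internal node
    holds n pointers and n-1 keys, so roughly one (key, pointer) pair
    per child fits in the page."""
    return page_bytes // (key_bytes + pointer_bytes)

def height(num_keys, n):
    """Levels needed for num_keys entries when every node has fan-out n:
    each level multiplies capacity by n, so height grows as log base n."""
    return max(1, math.ceil(math.log(num_keys, n)))

n = fanout(16 * 1024, 20, 8)
print(n)                          # hundreds of children per node
print(height(100_000_000, n))     # only a few levels even for 100 million keys
print(height(1000, 40))           # a small file needs just a couple of levels
```

This is why a B plus tree reaches a record in 4 or 5 disk accesses where a binary search tree would need dozens: the height is logarithmic with a base in the hundreds, not base 2.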