Hi and welcome to this lecture on data structures and algorithms. In this session, we will discuss heap sort. What is the motivation for heap sort? Well, so far we have seen that the insertion sort algorithm and the selection sort algorithm both give you O(n²) performance. The question is: could we do better, and what does "better" mean? That is exactly what we want to understand by looking at a possible lower bound for sorting, especially sorting based on comparison. Insertion and selection sort both rely on comparisons of elements; in fact, we analysed insertion sort in terms of the number of inversions, which corresponds to the number of swaps, and all of that is driven by comparisons.

So, the key observation is as follows: we will view each of the algorithms we have discussed so far, or any other comparison-based algorithm we might come up with in the future, as a run on a large binary decision tree. Here is what we are saying: for a given array there is some large binary decision tree, and each run of such an algorithm corresponds to a path in this tree. This might be the path of insertion sort, whereas selection sort might take a different path.

Let us try and understand what this tree is. Each node in this tree is a comparison between a pair of elements: is x greater than y, or is y greater than x? We will use the index i for a permutation, so a comparison such as "is x_{i1} less than x_{i2}?" is the kind of question asked at a node. The result of each comparison is a yes or a no, and that corresponds to the branching. A sorting algorithm, then, is a series of comparisons that decides which permutation of the sequence s is the sorted permutation. So what sits at the leaves? At each leaf of this tree you have a permutation.
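As a concrete illustration (this example is mine, not from the lecture), here is a sketch of one such decision tree for n = 3 written as nested `if` statements: each comparison is a node, each branch is a yes/no outcome, and each `return` is a leaf holding one of the 3! = 6 permutations. Every root-to-leaf path makes at most 3 comparisons, matching ⌈log₂ 6⌉ = 3.

```python
def sort3(x, y, z):
    """Sort three elements; each `if` below is one node of the
    binary decision tree, and each return is one of the 6 leaves."""
    if x <= y:
        if y <= z:
            return (x, y, z)      # x <= y <= z
        elif x <= z:
            return (x, z, y)      # x <= z < y
        else:
            return (z, x, y)      # z < x <= y
    else:
        if x <= z:
            return (y, x, z)      # y < x <= z
        elif y <= z:
            return (y, z, x)      # y <= z < x
        else:
            return (z, y, x)      # z < y < x
```

Each call traces exactly one path of the tree, which is the sense in which a run of a comparison sort "is" a path in the decision tree.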
The number of permutations of n elements is n!, and therefore the tree has n! leaves. Now, what is the minimum height of such a tree? Every comparison has exactly two outcomes, and no comparison leads to a dead end, so we have a binary tree, and a binary tree with n! leaves must have height at least log(n!).

In case you have not seen what log(n!) looks like, you can expand it: log(n!) = Σ_{i=1}^{n} log i. This is ≤ Σ_{i=1}^{n} log n, because log i ≤ log n as long as i ≤ n, and that sum is nothing but n log n; so log(n!) is O(n log n). That is of course an upper bound, but we can similarly find a lower bound. What we do is keep only the upper half of the terms: every term with i ≥ n/2 satisfies log i ≥ log(n/2), so Σ_{i=1}^{n} log i ≥ Σ_{i=n/2}^{n} log(n/2) = (n/2) log(n/2). If you expand this, log(n/2) is log n − log 2, which gives (n/2) log n − (n/2) log 2, and that is Ω(n log n).

So this is what the tree of permutations looks like; here σ_i stands for the i-th permutation, so each leaf node is a permutation, and altogether there are n! such unique permutations. We have actually made an assumption that we are dealing with distinct elements; it is possible that some of the numbers are repeated, but the analysis is not very different in that case. As we discussed, the number of leaves is n!, and this implies that the minimum height of such a tree is Ω(n log n).
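The two bounds on log(n!) can be checked numerically; here is a small sketch (the function name is mine):

```python
import math

def log2_factorial(n):
    """Compute log2(n!) as a sum of logs, as in the lecture."""
    return sum(math.log2(i) for i in range(1, n + 1))

# Upper bound: each log2(i) <= log2(n), so log2(n!) <= n * log2(n).
# Lower bound: the top half of the terms alone contribute
# at least (n/2) * log2(n/2).
n = 100
assert log2_factorial(n) <= n * math.log2(n)
assert log2_factorial(n) >= (n / 2) * math.log2(n / 2)
```

For n = 100, log₂(100!) is about 525, sandwiched between (n/2)log₂(n/2) ≈ 282 and n log₂ n ≈ 664, so both bounds are of order n log n.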
So what we are saying is this: any algorithm that makes these comparisons must, for a given sequence s, reach the correct (sorted) permutation at some leaf, and it cannot avoid traversing the tree from the root to that leaf; this means it must make at least log(n!) = Ω(n log n) comparisons.

Now, is there an algorithm that helps us achieve this lower bound? Recall that a min-heap abstract data type stores the index and value of the smallest element at the root, where the smallest element is defined as the one with the lowest value. This property must hold not only at the root but for every substructure; in general, the value at the parent of i must be less than or equal to the value at i, and hence at all of i's descendants. In the example, we see that 40 is less than 70 and 90, and so on; this is indeed a min heap. Such a min-heap structure can be constructed for a sequence s in at most O(n log n) time, and this invokes a sequence of heapification operations on every subtree; we will recap this when we discuss sorting using the min heap.

The idea behind min-heap sort is to do the following. Given the sequence s, convert it into a min heap p by calling the heapify function on s. Once we have that, you can use p to recover s in sorted order: repeatedly retrieve the smallest element of p and append it to s, that is, for e = min(p), append e to s and then delete e from p. That is what we have shown here: first you insert the elements from s into p while maintaining the min-heap property of the list p. Recall that this is nothing but using the min heap p as a priority queue abstract data type. For every insertion you traverse the height of the existing tree: initially the tree is very small, so you traverse a tree of 1 element (cost log 1), then 2 elements (log 2), and so on up to log n, and the total is O(n log n).
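This priority-queue version of the sort can be sketched with Python's standard `heapq` module (function name is mine). Following the lecture's description, the elements are inserted one by one, each insertion costing O(log n); note that `heapq.heapify` could build the whole heap at once instead, a point the lecture returns to later.

```python
import heapq

def min_heap_sort(s):
    """Sort s by loading it into a min heap p and then repeatedly
    extracting the minimum, as in the lecture's sketch."""
    p = []
    for e in s:                 # n insertions, each O(log n)
        heapq.heappush(p, e)
    out = []
    while p:                    # n deletions, each O(log n)
        out.append(heapq.heappop(p))
    return out
```

Both phases are O(n log n), matching the analysis in the text, but the heap p is an external data structure, which is exactly what the in-place version below avoids.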
Similarly, you delete element e from p by removing the minimum element and pushing it into the sequence s. This costs log n when you remove the first element, then log(n−1), and so on; again the total is O(n log n). So the overall cost is O(n log n). Is it possible to avoid constructing an external data structure p and instead do in-place heap construction?

Before we do that, let us see what the array corresponding to a min heap looks like. The element 10, which is the smallest element, is stored at position 1; its immediate children are stored at positions 2 and 3, and thereafter the leaves are stored at positions 4 to 7. But what exactly is the structure here? The children of a node with index i are at positions 2i and 2i+1; this is a property that holds for every node, and you can easily verify it on this simple example. So the basic idea is as follows: the root of the entire tree is at the first position, position 1, its children are at positions 2 and 3, and in general, for an element with index i, which is the root of some subtree, its immediate children are at positions 2i and 2i+1. Of course, for i to have children, all such roots of subtrees must lie within the first ⌊n/2⌋ elements of the array, and we can see that here: for 7 elements, the internal (non-leaf) nodes occupy the first 3 positions of the array.

To do in-place min-heap sort, here is what we will do. We realize that the top of the heap is always the minimum among all the elements that follow it, so we grow the sorted portion as follows: we initially build a min heap, converting the entire array into a min heap, so the smallest element sits in the first position. However, it is not necessary that the next element of the array is the next smallest, and so on.
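The array layout just described can be sketched as follows; the helper names are mine, and s[0] is left unused so that the root sits at index 1 and the 2i / 2i+1 arithmetic works out.

```python
def parent(i):
    """Index of the parent of node i (1-indexed array)."""
    return i // 2

def children(i, n):
    """Indices of the children of node i that exist in an n-element heap."""
    return [c for c in (2 * i, 2 * i + 1) if c <= n]

def is_min_heap(s):
    """Check the parent <= child property; s[0] is unused, so the
    root is at index 1 and the internal nodes are 1 .. n // 2."""
    n = len(s) - 1
    return all(s[i] <= s[c]
               for i in range(1, n // 2 + 1)
               for c in children(i, n))
```

Note that only positions 1 through ⌊n/2⌋ can have children, which is why `is_min_heap` never looks below the internal nodes; for n = 7 that is exactly the first 3 positions, as in the example.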
So what we do is invoke min_heapify at the next position: the next smallest element must be at position 2 or 3, so you do a comparison and push the next smallest element to the second position. Now, we may have to do a little more work, because the tree that results might not completely satisfy the min-heap structure if you just treat this next node as a root; in fact, it may not even be a completely balanced tree. In general, the min-heap tree is complete except possibly for the last layer, so you may have to restructure the nodes, and that is what min_heapify takes care of.

Let us discuss this in-place min-heap sorting in some more detail; all of this happens within a single array. Formally, a min heap of n elements is a complete binary tree with all levels except possibly the last one filled, and the last level filled from left to right, as I pointed out. The value of an item at a parent is less than or equal to the values at its children, so the minimum element is at the root. In the array setup, one child of node k is at 2k and the other is at 2k+1, provided these two indices are at most n; if not, one or both of them may be empty, and that is why the last level is not necessarily completely filled up.

The idea of heap sort, then, is to use the left portion of s, up to index i−1, to contain the elements sorted so far, while the right portion stores the remaining elements in heapified form. Here is the algorithm more formally. Given the input sequence s, the first step is to convert s into a min heap, and you would like to do this in place; of course, you do not want to use an auxiliary data structure, and this can be done.
So the idea is to iterate from left to right. After the build-min-heap operation we expect the first element of the array to be the minimum element, so what you need to do subsequently is min-heapify the remaining array. We are not building a min heap from scratch, because we know that the heap structure is satisfied to a large extent; it is only a question of where to push the new element at the top while retaining the heap structure at each of its children, correcting only the child at which some violation takes place. This is the min_heapify that needs to be invoked on the remaining subarray, and you keep doing this until you hit the end of the array s.

What is the min_heapify subroutine? If you are at position i, consider its children at positions l = 2i and r = 2i + 1. If the array element at l happens to be less than the array element at i, then you know that the minimum is at l, in which case you will need to do some reordering or restructuring; so you do some bookkeeping and keep track of which of these is the minimum. If i itself happens to be the minimum, there is not much to do; in fact, you could stop. Likewise, for r you make a comparison with the current minimum. You do not need to compare l and r against each other, because each of their subtrees only needs to be ensured to be a min heap on its own. Once you have found the minimum element, check whether it is i; if it is not, swap s[i] with s[min], where min is l or r depending on which of them turned out to be minimum (and that is what we keep track of through min), and then call min_heapify again at position min. There is no need to min-heapify the other subtree, the one that did not correspond to the minimum element, and the reason is that we already had a min heap there before i was introduced; the whole purpose here is to ensure that the insertion of a new element at the root honours the heap structure.

The build_min_heap routine builds this entire initial heap from the array s, and again this happens in place. You begin with the first non-leaf (internal) node from the right-hand side, which is at index ⌊n/2⌋. You scan the array from this index down to the left end, and for each index i you invoke min_heapify on the subtree rooted at that element, spanning until the end of the array. What are we doing in this process? First of all, we do not really care about the last half of the positions, because they are leaves; all we want to ensure is that every subtree has the desired min-heap structure. Moving left is basically like having a new element added, and you need to make sure that this element is compatible with the existing min-heap structure, so you just invoke min_heapify and have i trickle down into whichever branch gets affected.

Now, the complexity is as follows. min_heapify needs to traverse the height of the tree, so a first expression for the total cost is Σ_{i=1}^{n} log i, where log i bounds the cost of the i-th invocation, and we know this is at most n log n. The question is: can we get a better bound? Is there a better upper bound? The answer is that upper-bounding log i by log n is too loose; we can tighten it as follows. We change the summation from a sum over the nodes, i = 1 to n, to a sum over heights, where the height h goes from 0 to log n, and we observe that the same height is shared by many nodes: an n-element heap has at most ⌈n / 2^{h+1}⌉ nodes of height h. This we know from the fact that an n-element heap has height log n; set h = log n, work backwards, and you find the maximum number of nodes at each height.

So what we can do is apply this scaling factor, n / 2^{h+1}, to the cost of min_heapify, which is linear in the height h. We have just rewritten the expression, which I am marking in yellow, in terms of height, and it turns out that with some amount of manipulation this is nothing but O(n). In particular, we have made use of a very important equality, Σ_{k=0}^{∞} k·x^k = x / (1 − x)², with x set to the value 1/2; you can convince yourself that by setting x = 1/2 and plugging in the node counts, you get the same expression. So this is O(n).

What is interesting is that we have found a more efficient way of building the min heap: it is O(n). However, the repeated invocation of min_heapify in heap sort will still constrain our upper bound to be n log n: recall that our min_heapify subroutine is O(log n), and it is invoked n times, which makes the complexity O(n log n). So we have found one algorithm that runs in O(n log n), and that is heap sort. The question is: are there other algorithms, or is this unique? Well, it turns out there are a couple of other algorithms, and a very important framework for discussing some such algorithms is the merge sort framework; that will be in the next session. Thank you.
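Putting min_heapify, build_min_heap, and the extraction loop together, here is one possible in-place sketch (names and details are mine). One caveat: instead of growing a sorted prefix on the left as the lecture describes, this sketch uses the common textbook variant that swaps the minimum to the end of the shrinking heap and reverses once at the end; the O(n) build cost and the O(n log n) overall bound are the same.

```python
def min_heapify(s, i, n):
    """Sift s[i] down within s[1..n] (1-indexed, s[0] unused) so the
    subtree rooted at i is a min heap, assuming the subtrees rooted
    at its children 2i and 2i+1 already are."""
    l, r = 2 * i, 2 * i + 1
    smallest = i
    if l <= n and s[l] < s[smallest]:
        smallest = l
    if r <= n and s[r] < s[smallest]:
        smallest = r
    if smallest != i:
        s[i], s[smallest] = s[smallest], s[i]
        min_heapify(s, smallest, n)   # only the affected subtree

def build_min_heap(s, n):
    """Heapify bottom-up, from the last internal node floor(n/2)
    down to the root; O(n) overall by the height argument."""
    for i in range(n // 2, 0, -1):
        min_heapify(s, i, n)

def heap_sort(a):
    """In-place heap sort: repeatedly swap the minimum (the root)
    to the end of the shrinking heap, then reverse for ascending
    order. n calls to an O(log n) sift give O(n log n)."""
    s = [None] + list(a)       # shift to 1-indexed storage
    n = len(a)
    build_min_heap(s, n)
    for end in range(n, 1, -1):
        s[1], s[end] = s[end], s[1]
        min_heapify(s, 1, end - 1)
    return s[1:][::-1]
```

The structure mirrors the lecture's subroutines: min_heapify only recurses into the child where a violation can occur, and build_min_heap ignores the leaves entirely, which is where the O(n) build bound comes from.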