Hi, I am Ganesh Ramakrishnan and I would like to welcome you to this course on data structures and algorithms. Today we will begin with our first lecture on algorithms, and we will begin with sorting. We will start with the paradigm for sorting that leverages comparison between elements. Comparison in the sense that you look at the ith element of an array, compare it with the jth element, and perform operations based on the outcome of this comparison. We will consider some abstract data types for sorting that build upon this comparison. In particular, we look at selection, insertion and heap sort; based on different choices of the abstract data type, we will have these three different versions. We will also consider the in-place variants of these three sorting algorithms and perform analysis. We will then discuss a slightly different paradigm for sorting, which is a divide and conquer based approach. We will discuss its two instances, merge sort and quick sort, and also discuss the challenges involved in coming up with an in-place version of merge sort in particular. We will also perform runtime analysis. So the basic idea of comparison based sorting is to take pairs of objects and perform operations based on the outcome of comparing the two elements in each pair. The result of each comparison is basically a yes or a no. It turns out that for algorithms based on comparison there is a lower bound, in the sense that no such algorithm can perform better than a particular value, and that value turns out to be n log n, where n is the length of the array being sorted. What is also interesting is that there are several algorithms whose upper bound, or worst case performance, happens to match this lower bound for comparison based sorting. So in one sense, the lower bound has been achieved, even in the worst case, by several comparison based sorting algorithms.
So, we will discuss different implementations of the abstract data type, in particular the priority queue, and we will discuss how you could do away with the use of an explicit external abstract data type and embed the functionality of the data type within the array itself. That is what gives you the in-place variant. In particular, if the auxiliary memory consumed through the abstract data type is limited and does not grow with the size of the input, you can come up with the in-place variant. We also discuss an important tool for proving the correctness of an algorithm, called the loop invariant, and we will discuss three aspects of loop invariants. We will discuss the complexity of the algorithms and also present lower bounds on running time that apply to comparison based sorting algorithms in general. The priority queue is a very natural choice of abstract data type. A priority queue lets you access the highest priority element, and both insertion into and selection from the abstract data type can be based on priority. So different implementations of the priority queue lead to different sorting algorithms. The basic idea of sorting using a priority queue is that an input sequence S is iterated upon, and every element e of S is placed into this auxiliary data structure P. As I pointed out, this is done for each element e of S; thereafter you pick elements from P in an intelligent manner. You would intelligently enumerate elements from P and put them back into S. Of course, you want both the intelligent enumeration and the insertion to be as efficient as possible. So let us discuss some specific implementations. When the priority queue is an unsorted list or an unsorted array, the resulting sorting algorithm is called selection sort; the choice of a list as against an array does not make any significant difference.
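The two-phase priority-queue scheme just described can be sketched in Python as follows. This is a minimal sketch; the names pq_sort, UnsortedListPQ, insert and remove_min are illustrative choices, not names from the lecture.

```python
def pq_sort(S, PQ):
    """Sort list S by routing every element through a priority queue PQ."""
    P = PQ()
    for e in S:                 # phase 1: place each element e of S into P
        P.insert(e)
    for i in range(len(S)):     # phase 2: pull elements back in priority order
        S[i] = P.remove_min()
    return S

class UnsortedListPQ:
    """Unsorted-list priority queue: O(1) insert, O(n) remove_min.
    Plugging this into pq_sort yields selection sort."""
    def __init__(self):
        self.items = []
    def insert(self, e):
        self.items.append(e)          # insert at the end: O(1)
    def remove_min(self):
        m = self.items.index(min(self.items))  # scan for the minimum: O(n)
        return self.items.pop(m)               # delete + compaction
```

Swapping in a different priority-queue implementation, with the same two-method interface, changes which sorting algorithm you get without touching pq_sort itself.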
You could also use a sorted-list based implementation of the priority queue; in that case the resulting sorting algorithm is called insertion sort. You could replace the list with an array; you will get some slight efficiency improvement, but the order of complexity does not improve as such. A more interesting implementation of the priority queue is as a heap, and that gives you heap sort. So let us start with selection sort, with an unsorted list. What is the complexity? Insertion could happen either at the beginning or at the end of the list P; in either case the complexity is order 1, depending only on whether the list is stored from the beginning to the end or the other way around. In either case, finding the index of the minimum requires you to scan the list. This takes order n time, where n is the size of the list being scanned. Delete requires just deleting that specific element, the element at position m. In this process, however, you will need to shift back elements and make the list compact, and this compaction time depends on the current size of the list. In the worst case, compaction can be order n, when the minimum element is found at the far end of the list. So, with an order n scan and up to order n compaction at every iteration, the overall time required is order n squared. Interestingly, the best and worst case running times for selection sort are both theta n squared. Recall that f(n) is O(g(n)) if there exist constants m and n0 such that for all n greater than or equal to n0, f(n) is less than or equal to m times g(n). And the relation between theta and O is as follows: f(n) is theta of g(n) if g(n) serves as an upper bound for f(n), that is, f(n) is O(g(n)), and in addition g(n) is O(f(n)). In other words, some constant multiple of g(n) serves as an upper bound for f(n), and some other constant multiple of f(n) serves as an upper bound for g(n).
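The asymptotic definitions just recalled can be written compactly (standard notation, stated here for reference rather than taken from the lecture slides):

```latex
f(n) = O(g(n)) \iff \exists\, m > 0,\ n_0 \ : \ \forall n \ge n_0,\quad f(n) \le m \cdot g(n)

f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \ \text{and} \ g(n) = O(f(n))
```

So the claim that selection sort is theta n squared says both that n squared bounds it from above and that it cannot do asymptotically better, even on already-sorted input.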
So it turns out that the tight bound theta n squared holds for selection sort. So what is this index of min? As I pointed out, index of min requires iterating over the elements of P, starting at a particular index i, to find the minimum element. We have defined index of min more generally in terms of a starting offset i, and we will make use of this subroutine in several places. Now, is there an in-place version of selection sort? It turns out that there does exist an array based in-place selection sort. What is the idea here? The idea is this: once you find the minimum element of S starting at a particular position i, that is, the index m of the minimum between position i and position n, the end of the array, you swap the elements at positions i and m. The idea is to get the effect of finding the minimum element and storing it back through a single swap operation. So you do not need to wait until you have emptied the array; rather, as soon as you find the minimum element, you know that position i needs to be updated to be consistent with sorting, and that is exactly what the swap achieves. The element that moves into position m does not affect the structure of the auxiliary data structure; as a result, this operation does not hurt the unsorted list at all. An interesting point to note is the stopping criterion, which is i equals n minus 1. It turns out that by the time you have finished finding the minimum element between position i and the end, for i ranging from 1 to n minus 1, you have already identified the nth element as the largest element anyway. So the answer to the question of why it suffices to execute only until i equals n minus 1 is that the nth element has, by then, already been established to be the largest element in the array. How do we prove the correctness of selection sort?
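The in-place scheme just described can be sketched in Python. Note that Python lists are 0-based, so the loop runs for i from 0 to n-2, the 0-based counterpart of the 1 to n-1 range in the lecture; the function names are illustrative.

```python
def index_of_min(S, i):
    """Return the index of the minimum element of S[i:], i.e. the general
    index-of-min subroutine with starting offset i (0-based)."""
    m = i
    for j in range(i + 1, len(S)):
        if S[j] < S[m]:
            m = j
    return m

def selection_sort(S):
    """In-place selection sort: for each i, swap S[i] with the minimum of
    S[i:].  Stopping at i = n-2 suffices, because by then the last element
    is already the largest."""
    n = len(S)
    for i in range(n - 1):           # i ranges over 0 .. n-2
        m = index_of_min(S, i)
        S[i], S[m] = S[m], S[i]      # a single swap places the minimum at i
    return S
```

The swap is what makes the algorithm in-place: the sorted prefix and the unsorted remainder share the one array, so no auxiliary priority queue is ever materialized.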
So here is the claim: at the start of each iteration i of the loop, the subarray S[1..i-1] consists of the i-1 smallest elements of S in ascending order. As I pointed out earlier, instead of emptying the array altogether, all you are doing is treating the first i-1 elements of S as already updated with respect to sorting, and treating the rest of S as the abstract data type, the unsorted list, that we keep updating. So the scan is over S[i..n], whereas the insertion is into the subarray S[1..i-1]. We need to discuss three properties of this loop invariant condition in order to use it to prove correctness of the algorithm. The first is initialization: you need to prove that the condition of S[1..i-1] being sorted holds prior to the first iteration of the loop. The second is maintenance: you need to prove that after each iteration of finding the minimum and swapping, this condition is maintained. And the third is termination: at termination, the invariant helps show that the algorithm is actually correct, because at termination what the invariant refers to as the sorted subarray turns out to be the entire array itself. So we will quickly show that all three properties of initialization, maintenance and termination are satisfied for this loop invariant. How does it hold at the beginning? Well, it holds at the beginning because initially i is 1, and therefore you are referring to an empty subarray. At the next iteration, you find a minimum element of the remaining subarray and insert that minimum element at the beginning of this subarray through a swap. As a result, the loop invariant condition holds after the first iteration; in fact, what we have just argued is the maintenance property.
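The invariant can also be checked mechanically at the top of each iteration. Here is an instrumented sketch (0-based indexing, so the claim becomes: at the start of iteration i, S[:i] holds the i smallest elements in ascending order; the assertions are the invariant, not part of the algorithm):

```python
def selection_sort_checked(S):
    """Selection sort instrumented with its loop invariant:
    at the start of iteration i, S[:i] holds the i smallest elements
    of S in ascending order, and every element of S[:i] is <= every
    element of S[i:]."""
    n = len(S)
    for i in range(n - 1):
        # initialization (i == 0) and maintenance (i > 0) check:
        assert S[:i] == sorted(S[:i])
        assert all(x <= y for x in S[:i] for y in S[i:])
        m = min(range(i, n), key=lambda j: S[j])  # index of min of S[i:]
        S[i], S[m] = S[m], S[i]
    # termination: S[:n-1] sorted plus S[n-1] being the maximum gives
    # a fully sorted array
    return S
```

If any iteration ever violated the invariant, the assertion would fire; that the function runs to completion on arbitrary inputs mirrors the initialization and maintenance arguments above.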
You could use the same argument to prove that the loop invariant holds after the second iteration, assuming that it has held up to that point, that is, that the prefix subarray is already sorted. Another part of the argument is that every element of the prefix subarray must always be less than or equal to every element of the remaining subarray, and therefore the new element that gets appended can only be greater than or equal to the elements in the existing subarray. And finally, termination: the loop invariant entails that all the elements from 1 to n-1 are sorted, and by virtue of the algorithm and the swapping, we know that the nth element must be greater than or equal to every other element in S. As a result, the termination condition, namely that S[1..n-1] is sorted, is sufficient to show the correctness of the selection sort algorithm. In our next session, we will discuss another implementation of the priority queue, namely the sorted list. This gives us our next sorting algorithm, called insertion sort. Thank you.