Hello and welcome to this next lecture on data structures and algorithms. In this session we will introduce a new sorting algorithm, one based on the paradigm of divide and conquer. In particular, we will discuss the simplest divide-and-conquer sorting algorithm, called merge sort. The idea behind divide-and-conquer approaches to sorting, or to any other problem, is spelled out very clearly in the name itself. The first phase is divide, which means dividing the problem into a number of sub-problems. In the case of sorting, this means splitting the array s1 to sn into two parts. The next step is conquer, which means solving these two sub-parts, and this is often done recursively. So, the second step is to conquer by solving the sub-problems recursively, which means you again split each of the parts into smaller parts, and so on. Finally, having gone down all the way in your division, you need to go back all the way up and combine. Combine is the path upwards, combining the solutions to the sub-problems, while divide is the path downwards; conquer involves both further division and combination. The simplest implementation of divide-and-conquer sorting is the merge sort algorithm. You divide the n-element sequence into two sub-sequences of n/2 elements each. Now, it is not strictly necessary that the two parts be of the same size, but the simplest implementation keeps them the same size. Conquer means solving each of these sub-problems recursively; here that means sorting the two sub-sequences recursively using merge sort, so merge sort has to be invoked on each part. Finally, combine means going all the way back up by merging the two sorted sub-sequences to produce the sorted answer. Remember that combine will also be invoked within the recursive calls made by conquer. So, here is an example. We have an array 7, 2, 9, 4. Our first step is to divide.
So, we have divided it into two parts of equal size. The sorted list on the right-hand side is empty as of now, but we will populate it as we go. So, the first step is divide. Now, what you do with the left half is left to conquer; in fact, this left call is nothing but conquer. You will do two conquers: this is divide one, conquer one, and you will also do conquer two. There is only one division, but there are two conquers. Now, within each of the conquers we will have further invocations of divide and conquer. What you do after these conquers, within each invocation of divide and conquer, is gather the outcomes of the conquers and combine them. So, the outcomes will be combined here. Let us see this further in action. The conquer on the left-hand side has itself invoked a divide. This is a conquer, and we call it conquer one; conquer one has a divide 1.1, dividing 7, 2 into two arrays of size 1 each. That is the divide. What follows next is combine: you take the outcome of each of these and combine them, and that you see in the arrow going up. This is a combine operation. Note, however, that the combine operation requires you to place the elements in the right order: you merge the elements from the two sorted lists, so 7 and 2 combine into 2, 7. Similarly, on the right-hand side we would get another sorted list; the right-hand side gives you 4, 9, since 9 and 4 merge into 4, 9. Again, this needs to be communicated back, and this happens through a combine. What this combine entails is interleaving elements between 2, 7 and 4, 9, which means you get 2 followed by 4 followed by 7 followed by 9. You are not really bothered with comparisons within each group; you are only concerned with comparisons between the selected head elements of the left and right subgroups respectively. So, this is the complete execution of merge sort for this example, and you get a sorted list.
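The walkthrough above can be reproduced by instrumenting a recursive merge sort with print statements; this is a sketch in Python (the language is my choice, not part of the lecture) that prints each divide and combine step for the example 7, 2, 9, 4:

```python
def merge_sort_trace(s, depth=0):
    """Merge sort instrumented to print each divide and combine step."""
    indent = "  " * depth
    if len(s) <= 1:
        return s  # a list of 0 or 1 elements is already sorted
    mid = len(s) // 2
    print(f"{indent}divide: {s[:mid]} | {s[mid:]}")
    left = merge_sort_trace(s[:mid], depth + 1)   # conquer 1
    right = merge_sort_trace(s[mid:], depth + 1)  # conquer 2
    # combine: interleave the two sorted halves in order
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged += left[i:] + right[j:]
    print(f"{indent}combine: {left} + {right} -> {merged}")
    return merged

merge_sort_trace([7, 2, 9, 4])
```

Running this on [7, 2, 9, 4] shows the left half producing 2, 7, the right half producing 4, 9, and the final combine interleaving them into 2, 4, 7, 9, exactly as in the example.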
So, you can visualize any merge sort instance as a complete binary tree, and the number of communications, in terms of divides and combines, is of the order of the height of the tree. So, here is the merge sort algorithm. What we have done is invoke merge sort on each of the partitions: we partition S into two parts, S1 and S2, then invoke merge sort on S1 and merge sort on S2. The first step, partitioning, is the divide step; the next two steps are the conquer steps; and finally you have a merge step, as the name of the function itself says, merge or combine. In fact, it is the merge that gives merge sort its name. We have called this an in-place merge sort, but I would like to highlight an important point. You could instead store these partitions in an auxiliary array, a separate array, and get them back into the original array; merging would then mean getting them back into the original array. It turns out that an in-place merge sort is somewhat expensive. We will avoid too much discussion of the complexity analysis, but an in-place merge sort is basically attained by sorting part of the array while using the rest as a working area for merging. So, the idea is to sort part of S and use the rest as a working area for merging, and this actually leads to a slightly bloated complexity, which is order n squared. However, if you made use of an auxiliary data structure, merge sort would use additional space. And how much is this additional space? Well, it turns out that you will need at least order n space. We will show that this leads to a better upper bound, which is order n log n. So, somehow in in-place merge sort we have not accounted for all the time required to copy elements and bring them back to where they belong, whereas with additional space we always ensure that there is some place to park these elements before they are brought back.
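The three steps described above, partition S into S1 and S2, recursively sort each, then merge, can be sketched as follows (in Python, using an auxiliary result list rather than an in-place merge, in line with the additional-space variant the lecture recommends):

```python
def merge(s1, s2):
    """Combine two sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(s1) and j < len(s2):
        if s1[i] <= s2[j]:
            result.append(s1[i]); i += 1
        else:
            result.append(s2[j]); j += 1
    # append whatever remains in either list
    result.extend(s1[i:])
    result.extend(s2[j:])
    return result

def merge_sort(s):
    """Divide, conquer recursively, then combine."""
    if len(s) <= 1:               # base case: already sorted
        return s
    mid = len(s) // 2             # divide: split into two halves
    s1 = merge_sort(s[:mid])      # conquer: sort each half recursively
    s2 = merge_sort(s[mid:])
    return merge(s1, s2)          # combine: merge the sorted halves
```

The slicing here allocates the auxiliary order-n space explicitly; an in-place variant would avoid these copies at the cost of the extra bookkeeping discussed above.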
The merge subroutine operates over two sequences S1 and S2, and as I pointed out, we are making the assumption that additional space is available. This additional space could be a part of the original array itself, as in an in-place merge sort, but that is generally not a good choice as far as time complexity is concerned. Irrespective of that, if we take two sorted sequences S1 and S2 with n/2 elements each and want to merge them into an S, what you do is look at the element at the head of S1, compare it with the first element of S2, and append whichever of the two is smaller to S. So, what are we doing here? Suppose S1 is 1, 2, 4 and S2 is 3, 6, 8. The first comparison picks 1 and appends it to S; this is my S. In this process you also remove the first element of S1. Then compare the first elements of S1 and S2 respectively: among 2 and 3, 2 is the smaller, so again remove it. Compare 4 and 3, and you will find 3 is the smaller; remove it. Then compare 4 and 6, and 4 goes, and then it will be 6, and then 8. So, we are basically leveraging the fact that both S1 and S2 are sorted, avoiding comparisons within S1 or within S2 and restricting comparisons to the selected head elements of S1 and S2. The complexity of this merge is basically order n; it requires just one scan of both S1 and S2. The remaining part is to take care of leftover elements. So, it is possible that S1 has 1, 2, 4 and S2 has 3, 6, 8; once you have conveniently inserted 1, 2, 3, 4, whatever remains in S2 needs to be inserted as well. So, you iterate over these at the end, and this is what we have shown here: inserting the remaining portion of S1 or S2, as the case may be. So, let us analyze merge sort. By construction, we are dealing with a complete binary tree of divide, conquer and combine operations.
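The head-comparison-and-removal process just described maps naturally onto a queue-like structure; here is a sketch in Python (my choice of representation, not the lecture's) that literally removes the smaller head at each step and then appends whatever remains:

```python
from collections import deque

def merge(s1, s2):
    """Merge two sorted sequences by repeatedly removing the smaller head."""
    s1, s2 = deque(s1), deque(s2)
    s = []
    while s1 and s2:
        # compare the two heads and move the smaller one to S
        if s1[0] <= s2[0]:
            s.append(s1.popleft())
        else:
            s.append(s2.popleft())
    # one of the sequences is exhausted; insert the remaining
    # portion of the other, which is already sorted
    s.extend(s1)
    s.extend(s2)
    return s
```

On S1 = 1, 2, 4 and S2 = 3, 6, 8, the loop appends 1, 2, 3, 4 through head comparisons, and the final extends append the leftover 6 and 8, one scan of both inputs in total, hence order n.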
We look at the number of operations and the number of sequences which need to undergo those operations at each level of this tree. Initially, at the first level, depth 0, the number of sequences is just 1 and you are dealing with a sequence of size n. At the next level you are looking at 2 sequences, each of size n/2; after all, all the elements get divided at each level, so the number of sequences times their size must always equal n. Going down, at the i-th level you can easily fill in that you have 2^i sequences, because that is the number of nodes at depth i in a complete binary tree, and the size of each will therefore be n divided by the number of such nodes, that is, n/2^i. So, what we see here is that each recursive call divides the task into 2 tasks of half the original size. Therefore, at any depth i the work done is 2^i sequences times n/2^i elements each, which is order n at each level. Now, how many such levels exist? That number of levels is the height of the tree, which is order log n. The total running time is therefore order n log n. Again, we have assumed that we have auxiliary space to store elements as we divide and merge, so this is assuming auxiliary or additional storage space, as we have illustrated. In the next session we will deal with an interesting variant of divide-and-conquer based sorting called quick sort. Though it is divide and conquer, an advantage of quick sort over merge sort will be that an in-place implementation of quick sort is very natural. However, as we will see, there is a disadvantage: merge sort guarantees n log n irrespective of the input, and you would not have the same advantage in quick sort, whose complexity depends on the input; in particular, the worst case is order n squared, the same as that offered by insertion or selection sort. Thank you.
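The level-by-level accounting above can be sanity-checked with a short sketch (Python; n = 16 is an assumed example size, chosen as a power of two so the recursion tree is a clean complete binary tree):

```python
import math

n = 16                            # assumed example size, a power of two
levels = int(math.log2(n))        # height of the recursion tree: log n

for i in range(levels + 1):
    num_sequences = 2 ** i        # nodes (sequences) at depth i
    size = n // num_sequences     # each sequence has n / 2^i elements
    work = num_sequences * size   # total merging work at this depth
    assert work == n              # order n work at every level

# n work per level, times order log n levels, gives order n log n total
```

The invariant checked in the loop is exactly the observation from the lecture: the number of sequences times their size equals n at every depth.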