 Hello everyone, welcome to the session on Parallel External Sort Merge which is one of the operation of the parallel sorting techniques in the query processing technique. At the end of this session you will be able to formulate the steps in Parallel External Sort Merge and you can apply the parallel sorting techniques for a given relation. It means you can for a given relation you can apply this sorting technique and you can parallely sort the things. So, this parallel sort already we have seen earlier also it applies an intra operation parallelism. It means one operation is parallelized for multiple processors using multiple processors. So, here a single operation we are considering as a parallel sort. What is this? This is an intra query parallelism technique. We have already seen earlier also that interquery and interquery are the parallelism techniques for query processing. So, this parallel sorting technique in this one there are two major sorting techniques. One is called as a range partitioning sort and another one is called as a external sort merge. For the first one you can refer my earlier video. Now today we will see this external sort merge technique. Few assumptions are there before starting this one. The assumptions are we are considering that we have n number of processors and we have n number of disks and you can see here that processors are P0, P1 till Pn-1 and disks are D0, D1, Dn-1 and a relation is there which we have already partitioned using any of the technique. The relation may be round robin technique or hash partitioning or range partitioning whatever. So, a given relation which we want to sort that is already partitioned by few partitions. So, those partitions we are assuming here as partition R0, R1 till Rn-1. What is our objective or objective is to sort a relation a given table R which is deciding on n number of disks on a particular attribute A. What is that attribute on which attribute you want to sort it? For example, if your relation is employ table and if you want to sort it by salary then salary will become an attribute. So, here that employ relation we are sorting by a salary attribute. So, consider now this assumptions are these one. We have processor P0, P1, Pn-1 and we have disk D0, D1, Dn-1. Our relation is there which is already sorted by some of the technique and that relation has sorted as partition R0, partition R1 or partition Rn-1 and here we are applying the parallel external sort much. Let us see the steps in this. So, the first step is sort the relation partition RI which is stored on disk DI on the sorting attribute of the query. So, what we are doing? We are sorting a particular relation already it has we are considering that already it has partition. So, on every partition on every disk we are applying the sorting. So, every disk is parallely sorting. Now, after the sorting the second step is identify range partitioning vector V for this one. So, here what we have done whatever the earlier partition technique is there, whatever the earlier disk contains are there for them we are applying the sorting technique and after that again we are going to partition it. So, for that on what basis we want to partition now in this one for example, if we are partitioning by salary then we have to consider a range our vector based on that salary based on the processor. So, here we are considering P0 to Pn-1 processor. So, for that the partition vector we are taking. Now, the next one is what the third step every each processor is now performing this is the actual parallel technique which we are applying here. So, every processor PI is performing a merging. So, it is merging see already we have seen that those are sorted the data is sorted in different disks. Now for example, consider processor P0 what P0 will do every disk partition will take whatever the range of the P0 processors is there it will send all the records to that one. For example, if a salary range is below 14000 then all the records in every disk are coming to the processor P0 for that one. So, what we are saying here the data are actually transferred in order that is all processors send first partition into P0 then all processors send the second partition into P1 and so on. So, based on that and finally we are concatenating we will see this by an example proper example is given here. Now so what is our relation we are considering a relation as employee relation with the three attributes that is employee ID, employee name and employee salary ok. Now what we want to assume we are assuming that there are three disks and those are partitioned by round robin technique D0, D1, D2 these are the disks which we have considered and along with that the processors are P0, P1 and P2. So, in this what we can say processor P0 is associated with this D0. So, whatever the processor P0 is taking it is working on it is working on the contents of the disk which are D0. Now at processor P0, P1 and P2 the relations are already partitioned ok. So, those partitioned we can say that employee 0, employee 1, employee 2 respectively what are these partitions these partitions are the table partition this is a relation employed. So, this is now partitioned as employee 0, 1, 2 and irrespectively what you mean by this employee 0 is stored at D0 and the associated processor is P0 similarly it is so on. So, this is our initial database scenario you can see the scenario here that employee 0 is there employee 1 and employee 2 this is a relation and already these are partitioned D0, D1, D2 the partitioning is simple round robin partitioning technique is there. So, you can see that the partitions are like this. So, salary now you can see here that salary is totally divided in all the disk ok there is not any uniformity for the salary attribute. Now let us apply this query that is select star from employee order by salary. So, we want to sort a relation by salary. So, we want the sorted relation by the attribute salary. So, here actually the table is earlier partition some other attributes are there like employee or whatever ok, but we want to partition that by salary let us see. So, apply the first step what is the first step sort the data stored in every partition that is every disk using the ordering attribute of salary whatever the partitions are having the data those are first sorted. So, what will happen in that the first means below less salaries will come at the early stage and then the salary will move on like this every disk is containing the sorted order ok. So, you can pause the video and observe the change in the data what the partitions are having you can see here what is the change in this data what is the change. Let us see here the data is sorted you can see that the data is sorted here all partitions are sorted you can see 5000, 6000 like this here also and here also ok. So, this is the change this is the change for the after the first step. Let us go for the second step what is the second step we are identifying the vector we have three partitions. So, we have to go for three vectors what are our three vectors we are considering here that our range 0 is 14,000 and less range 1 is 14,000 to 24,000 range 2 is 24,001 to more. So, here we are considering that the first disk or first range is having all the data of the employees who are having less salary than 14,000 and the middle one is containing the salary range 14,000 to 24,000 and the third one is containing more than that. So, this is our second step. So, you can see here distribution of employee 0 according to the range vector what is the range vector that is we are saying that the range vectors are less than 14,000 greater than 14,000 to less than 24,000 and then after that more than 24,000. So, you can see here that the contents of this disk are now distributed as here this d 0 then this one and as well as this one here no contents are there because we do not have any salary more than 24,000 here ok, but here we have one salary which is more than 14,000 so we have entered here ok, so that is the next step. What is the third step now the above distribution is executed in all processes parallely ok. So, the same distribution is working parallely in all the disk and upon receiving the records the various partitions are again means merging means p 0 will go for the first range p 1 will work for second range and p 2 will work on the third range. So, simultaneously all the data's are distributed you can see here. So, what will happen this d 0 all the contents below 14,000 are coming similarly for disk d 1 all the contents which are within the range are coming here and the so on ok. So, what we can say here that this process is done at all the processors for different operations. You can see here we have shown only this d 0 contents where all the records of these one which are in the range less than 14,000 are there. Similarly, it will work for the remaining two. So, parallely these are working and so finally you can see that the contents are this is sorted this is sorted and this is also sorted where all these are already have this one. You can see the contents below 14,000 all have come into this disk d 0 which is working on processor p 0 and here this is processor p 1 is working on this where all the contents be within the range 14,000 to 24,000 are coming in this one and processor p 2 is working on this where the contents from more than 24,000 are moved from this one. You can see here that from the first, second and the third partitions every contents are moving here. So, finally our table has now sorted properly just we have to merge all of them. So, that is what the third merges this d 0 content d 1 content and d 2 contents. If we have merging all these one you can see that the complete data is sorted. This is all about the sorting technique. These are my references.