Hello everyone, and welcome to this session on parallel sort. In this session we will look at range-partitioning sort. First, the learning outcomes. At the end of this session you will be able to state the steps of range-partitioning sort and apply parallel sorting techniques to a given relation, that is, sort a relation in parallel on one of its attributes. Parallel sort is an example of intra-operation parallelism. What is intra-operation parallelism? It is a type of intra-query parallelism. Intra-query parallelism means that a single query is parallelized during its execution. Intra-operation parallelism goes one step further: a single operation within that query is itself parallelized. For example, the sort operation in a query is executed in parallel. There are basically two parallel sorting techniques: range-partitioning sort and parallel external sort-merge. In this video we will look at range-partitioning sort. Before going through its steps and seeing how parallelism is applied, let us state some assumptions that the steps will rely on. Assume there are n processors P0, P1, ..., Pn-1 and n disks D0, D1, ..., Dn-1. Each disk is associated with one processor, and that processor works on the contents of its disk. The relation we consider is R, which is partitioned into n partitions R0, R1, ..., Rn-1. The technique used for this initial partitioning might be round-robin, hash partitioning, range partitioning, or any other partitioning technique; we do not assume a particular one.
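To make the starting assumption concrete, here is a minimal sketch of one possible initial layout: round-robin partitioning of R over n disks. It assumes each disk is modeled as a plain Python list; the function name `round_robin_partition` and the student rows are hypothetical and only for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def round_robin_partition(relation: Sequence[T], n: int) -> List[List[T]]:
    """Hypothetical helper: send the i-th tuple of R to disk D(i mod n)."""
    if n <= 0:
        raise ValueError("need at least one processor/disk")
    disks: List[List[T]] = [[] for _ in range(n)]  # disks[i] models disk Di
    for i, tup in enumerate(relation):
        disks[i % n].append(tup)
    return disks


# Hypothetical relation R of (roll_no, name) tuples spread over n = 3 disks.
R = [(5, "Asha"), (2, "Ravi"), (9, "Meena"), (1, "John"), (7, "Sara")]
for i, part in enumerate(round_robin_partition(R, 3)):
    print(f"D{i}: {part}")
```

Note that this layout says nothing about roll-number order, which is exactly why range-partitioning sort starts by re-partitioning.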
What is the objective of range-partitioning sort? We are given a relation, or table, R that is partitioned across n disks, and we want to sort it on a particular attribute A. For example, given a table of student records, we may want to sort the records by roll number; here attribute A is the roll number. Now consider the hardware scenario. The operation we want to parallelize is sorting, and we use range partitioning to do it. There are processors P1, P2, ..., Pn (or, if you start numbering at zero, P0, P1, ..., Pn-1), and similarly disks D1, D2, ..., Dn. The relation R is partitioned into R1, R2, ..., Rn. Partition R1 is stored on disk D1, which is associated with processor P1, so every operation on the data in D1 is carried out by P1. Now let us go through the steps of range-partitioning sort. The data is already partitioned, but not in the way we need. So the first step is to re-partition relation R. On what basis? We want to sort on attribute A, for example the roll number in a student record, so we partition on A using range partitioning. For this we choose a range vector V = [v0, v1, ..., vn-2]; its n-1 boundary values divide the values of A into n ranges, one range per processor.
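Step 1 can be sketched as follows, assuming each existing partition is a Python list and the range vector is a sorted list of boundary values. The helper name `range_partition` and the sample rows are hypothetical. A tuple goes to new partition i, where i is the number of boundaries less than or equal to its value of A, so a value equal to a boundary starts the next range.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def range_partition(
    partitions: Sequence[Sequence[T]],
    vector: Sequence,
    key: Callable[[T], object],
) -> List[List[T]]:
    """Hypothetical step 1: re-partition every existing partition on attribute A."""
    new_parts: List[List[T]] = [[] for _ in range(len(vector) + 1)]
    for part in partitions:                 # each processor Pj scans its own disk Dj
        for t in part:
            i = bisect_right(vector, key(t))  # index of the range containing key(t)
            new_parts[i].append(t)            # "send" t to processor Pi / disk Di
    return new_parts


# Hypothetical student rows (roll_no, name) on 3 disks, range vector [10, 20].
old = [[(12, "A"), (3, "B")], [(25, "C"), (10, "D")], [(19, "E")]]
print(range_partition(old, [10, 20], key=lambda t: t[0]))
```

After this step each partition holds one roll-number range, but the rows inside a partition are still unsorted.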
Each processor reads the records on its disk and sends every record whose value of A falls in the i-th range to processor Pi, where it is stored temporarily on disk Di, the disk associated with Pi. So the new partitions are again written to those disks. Step 2: we want to sort the student records by roll number, and the records are now partitioned by roll-number range, so each disk holds exactly the records of one range. Each processor Pi now sorts its partition locally, and all processors do this in parallel on the data stored on their own disks. Finally, the sorted results are merged. An example will make this clearer. In this example the relation is employee, with attributes employee ID, employee name, and salary. We assume the employee relation is already partitioned across three disks D0, D1, and D2, and each disk is associated with a processor: P0 with D0, P1 with D1, and P2 with D2. The partitions stored on these disks are employee0 on D0, employee1 on D1, and employee2 on D2. This is the scenario we assume; now let us visualize the example.
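The local sort and the final merge can be sketched like this, assuming the input is the output of step 1 (a list of range partitions in range order). Worker threads stand in for the processors here; this is an illustrative assumption, and in CPython the GIL may prevent the sorts from actually running simultaneously. The helper name `sort_and_merge` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import chain
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sort_and_merge(
    range_parts: Sequence[Sequence[T]],
    key: Callable[[T], object],
) -> List[T]:
    """Hypothetical steps 2 and 3 of range-partitioning sort."""
    # Step 2: each "processor" (a worker thread) sorts its own partition.
    with ThreadPoolExecutor(max_workers=max(1, len(range_parts))) as pool:
        sorted_parts = list(pool.map(partial(sorted, key=key), range_parts))
    # Step 3: the partitions cover disjoint, increasing ranges, so
    # concatenating them in partition order gives a globally sorted result.
    return list(chain.from_iterable(sorted_parts))


# Hypothetical range partitions of student rows (roll_no, name).
parts = [[(3, "B")], [(12, "A"), (10, "D"), (19, "E")], [(25, "C")]]
print(sort_and_merge(parts, key=lambda t: t[0]))
```

`pool.map` returns results in input order, which is what keeps the concatenation in range order.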
Here is the table and how it is divided: partition employee0 holds these rows on D0, employee1 holds these rows on D1, and employee2 holds these rows on D2. Looking at the data, the existing partitioning might be based on employee ID. Now we want to execute the query SELECT * FROM employee ORDER BY salary, that is, we want all rows of the table sorted by salary. The table is not partitioned on salary; it appears to be partitioned on employee ID, and we do not know which technique was used. Since we want to sort by salary, we re-partition on salary. So we apply step 1, and the partitioning technique is range partitioning. For that we first identify the range vector. Because we are sorting on salary, the three ranges are: the first partition holds all rows with salary below 14000, the second holds salaries from 14000 to 24000, and the third holds salaries from 24000 onwards. Starting from the original data, we move each row to the partition for its salary range. Now all rows in the first partition have a salary below 14000, rows in the second have salaries between 14000 and 24000, and rows in the third have salaries of 24000 or more. This completes step 1.
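Step 1 of the employee example can be sketched as below. The rows are hypothetical, since the lecture's actual rows appear only on the slide; the range vector [14000, 24000] comes from the example. We assume a salary of exactly 14000 belongs to the middle range and exactly 24000 to the last range ("24000 onwards"). The helper name `salary_range_partition` is hypothetical.

```python
from bisect import bisect_right

# Hypothetical rows (emp_id, name, salary) of employee0, employee1, employee2,
# currently partitioned on emp_id across disks D0, D1, D2.
emp = [
    [(101, "Anil", 30000), (102, "Bina", 12000), (103, "Chetan", 18000)],  # D0
    [(104, "Divya", 9000), (105, "Esha", 24000), (106, "Farhan", 15000)],  # D1
    [(107, "Gita", 27000), (108, "Hari", 13000), (109, "Isha", 21000)],    # D2
]

# Range vector on salary: < 14000 | 14000 to < 24000 | 24000 onwards.
vector = [14000, 24000]


def salary_range_partition(parts, vector):
    """Hypothetical step 1: route each row to the processor for its salary range."""
    new_parts = [[] for _ in range(len(vector) + 1)]
    for part in parts:
        for row in part:
            new_parts[bisect_right(vector, row[2])].append(row)
    return new_parts


step1 = salary_range_partition(emp, vector)
for i, part in enumerate(step1):
    print(f"D{i}:", part)
```

As in the lecture, each new partition now holds one salary range, but the rows within it are not yet in salary order.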
Compare the earlier scenario with the one after step 1. The data has been re-partitioned, but it is still not sorted within each partition. Pause the video, observe how the data in the individual partitions changes in the next step, and write down your observations on paper. What do you see? In the lower part of the figure, the data in every partition is sorted. That is step 2: all processors sort in parallel. The partition with salaries below 14000 is sorted, the partition with salaries between 14000 and 24000 is sorted, and so is the last one. So in step 2 every processor sorts the data on its associated disk, and all processors do so in parallel. This is called data parallelism, because the same operation, here sorting, is applied in parallel to different portions of the data. What is the final result? Once the processors have finished sorting, the data within each range is sorted, and every value in one range is smaller than every value in the next. The only thing left is to merge the partitions, and merging simply means concatenating them in range order: partition 1, then partition 2, then partition 3. The result is the fully sorted relation, as you can see in the figure. This is range-partitioning sort and how parallelism is applied in it. So where exactly is the parallelism?
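Step 2 and the final merge for the employee example can be sketched as follows, starting from the hypothetical step-1 partitions of the previous sketch (repeated here so the block runs on its own). Threads stand in for processors P0, P1, P2 as an illustrative assumption; the point is the structure, one sort per partition and then a plain concatenation, not actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical output of step 1: (emp_id, name, salary) rows per salary range.
range_parts = [
    [(102, "Bina", 12000), (104, "Divya", 9000), (108, "Hari", 13000)],      # P0: < 14000
    [(103, "Chetan", 18000), (106, "Farhan", 15000), (109, "Isha", 21000)],  # P1: 14000 to < 24000
    [(101, "Anil", 30000), (105, "Esha", 24000), (107, "Gita", 27000)],      # P2: >= 24000
]


def local_sort(part):
    """Step 2 on one processor: sort its own partition by salary."""
    return sorted(part, key=lambda row: row[2])


# Data parallelism: the same operation (sort) is applied to every partition.
with ThreadPoolExecutor(max_workers=len(range_parts)) as pool:
    sorted_parts = list(pool.map(local_sort, range_parts))

# Final step: concatenate the partitions in range order (P0, P1, P2).
result = [row for part in sorted_parts for row in part]
for row in result:
    print(row)
```

No comparison between partitions is needed during the merge, because the range vector already guarantees that every salary in P0 is below every salary in P1, and so on.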
Parallelism is applied to the sorting, where every processor sorts its own partition at the same time. One more point: before sorting, the relation is partitioned again. Why? Because the method relies on range partitioning on the sort attribute, which is what makes the final merge a simple concatenation. These are my references. Thank you.