Welcome, everyone, to the session on inter-operation parallelism in parallel databases. This is one of the parallelism techniques used in parallel databases. At the end of this session, you will be able to apply inter-operation parallelism techniques for parallel query execution. Talking about the forms of query parallelism, you can see that there are basically two types: inter-query parallelism and intra-query parallelism. Intra-query parallelism is further divided into intra-operation parallelism, where each individual operation is parallelized, and inter-operation parallelism, where the different operations within a query are executed in parallel. Inter-operation parallelism, in turn, is divided into pipeline parallelism and independent parallelism. In this video, we are talking about inter-operation parallelism. So, how does inter-operation parallelism work? If a single query or transaction contains different operations, those operations are executed concurrently; this is called inter-operation parallelism. Two or more operations of a single query are parallelized. In intra-operation parallelism, by contrast, a single operation is parallelized, for example parallel sort or parallel join. Here, multiple operations are parallelized. Inter-operation parallelism has two forms: pipeline parallelism and independent parallelism. What happens in pipeline parallelism? It works like a producer and a consumer: one operation is executing, and its result is provided to another operation as input. The key point is that the second operation does not wait for the first operation to complete. As soon as the first operation starts executing and generates result tuples, those tuples are handed to the second operation right away.
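The producer/consumer relationship just described can be sketched in Python with two threads connected by a small bounded queue. This is a minimal illustration, not how any particular database engine implements it: the operation names, the `salary > 50000` selection, the sample `employee` rows and the `END` marker are all hypothetical. Because of CPython's global interpreter lock, the threads show how the two operations overlap rather than a real multi-processor speedup.

```python
import queue
import threading

END = None  # hypothetical end-of-stream marker sent by the producer


def operation_a(rows, out_q):
    """Producer: a selection that emits each qualifying tuple as soon as it is found."""
    for row in rows:
        if row["salary"] > 50000:  # hypothetical selection predicate
            out_q.put(row)
    out_q.put(END)


def operation_b(in_q, results):
    """Consumer: a projection that handles tuples as they arrive, without waiting for A to finish."""
    while (row := in_q.get()) is not END:
        results.append(row["name"])


def run_pipeline(rows):
    buffer = queue.Queue(maxsize=4)  # small buffer between the two operations
    results = []
    a = threading.Thread(target=operation_a, args=(rows, buffer))
    b = threading.Thread(target=operation_b, args=(buffer, results))
    a.start()
    b.start()
    a.join()
    b.join()
    return results


# Hypothetical sample data
employee = [
    {"name": "Asha", "salary": 60000},
    {"name": "Ravi", "salary": 40000},
    {"name": "Meera", "salary": 75000},
]
print(run_pipeline(employee))  # ['Asha', 'Meera']
```

The bounded queue is a design choice for the sketch: if operation B falls behind, operation A blocks instead of buffering its whole output, which keeps the two stages in step the way a pipeline is meant to.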
Consider that the output of one operation, operation A, is consumed by a second operation, operation B, before operation A has completed. Why is this possible? Because as soon as results are generated, they are passed to the next operation. So operation A and operation B work in parallel, like an assembly line, where several stations work at the same time and the output of one station is the input of the next. In independent parallelism, on the other hand, the operations work completely independently: multiple operations in a query that do not depend on each other are executed at the same time. Opportunities for this kind of parallelism arise less often. Now consider how parallel execution proceeds in this scenario. There are operation 1, operation 2, up to operation M, each associated with its own processor. The result of operation 1 is provided as input to operation 2, and that result is in turn passed on toward operation M, but none of them waits. Operation 2 does not wait for operation 1 to complete; as data is generated by operation 1, it is passed on to operation 2. So these operations run in parallel, each consuming the output of the previous one and producing output for the next. This is called pipeline parallelism. Take an example: consider the join of four relations, R1 ⋈ R2 ⋈ R3 ⋈ R4. If this is our query, how can we apply parallelism to it? Here, I take the join of R1 and R2 and, let us say, store the result in temp1.
As temp1 is generated, it is given to the next join operation, so temp1 is joined with R3 and the result is stored as temp2. As temp2 is generated, it is given to the join with R4. So all three joins work in parallel, on processor P1, processor P2 and processor P3. In the scenario shown, we have R1 ⋈ R2 ⋈ R3 ⋈ R4. The first processor, P1, computes R1 ⋈ R2. As its results are generated, they are passed to the next processor, P2, which joins them with R3 in parallel. As P2's output is generated, it is consumed by the next processor, P3, which performs the join of temp2 with R4. So a pipeline is formed: as data is generated, it is passed to the next stage. Processors P1, P2 and P3 work in parallel, with one stage producing data and the next consuming it; this is pipeline parallelism. In pipeline parallelism, then, all of the operations execute in parallel. There may be some time offset: in our example, processor P1 always starts first, then P2 starts, and then P3, but after that all of them execute in parallel. The result tuples of one operation are sent to the next operation for further computation.
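The three-processor pipeline for R1 ⋈ R2 ⋈ R3 ⋈ R4 can be sketched in Python as three threads connected by queues, one thread per processor. This is an illustration under simplifying assumptions, not a definitive implementation: the relations are lists of tuples laid out as R1(a, b), R2(b, c), R3(c, d), R4(d, e), so the join attribute is always the last column of the incoming tuple and the first column of the inner relation; a Python dict stands in for the index used by the index nested-loop join; and `DONE`, `join_stage`, `producing_stage` and `pipelined_join` are hypothetical names. Under CPython's global interpreter lock the threads show the pipelined data flow, not a real multi-processor speedup.

```python
import queue
import threading
from collections import defaultdict

DONE = object()  # hypothetical end-of-stream marker passed down the pipeline


def build_index(relation):
    """Index the inner relation on its first column (a dict stands in for a real index)."""
    index = defaultdict(list)
    for row in relation:
        index[row[0]].append(row)
    return index


def join_stage(source, inner, emit):
    """Index nested-loop join: probe the index for each outer tuple as soon as it arrives."""
    index = build_index(inner)
    for row in source:
        for match in index.get(row[-1], []):  # join on the last column of the outer tuple
            emit(row + match[1:])  # drop the duplicated join column


def from_queue(q):
    """Yield tuples produced by the upstream stage until it signals DONE."""
    while (row := q.get()) is not DONE:
        yield row


def producing_stage(source, inner, out_q):
    """Run a join stage that feeds a downstream stage, then signal end of stream."""
    join_stage(source, inner, out_q.put)
    out_q.put(DONE)


def pipelined_join(r1, r2, r3, r4):
    temp1, temp2 = queue.Queue(), queue.Queue()  # tuple streams between processors
    result = []
    processors = [
        threading.Thread(target=producing_stage, args=(r1, r2, temp1)),                    # P1: R1 ⋈ R2
        threading.Thread(target=producing_stage, args=(from_queue(temp1), r3, temp2)),     # P2: temp1 ⋈ R3
        threading.Thread(target=join_stage, args=(from_queue(temp2), r4, result.append)),  # P3: temp2 ⋈ R4
    ]
    for p in processors:
        p.start()
    for p in processors:
        p.join()
    return result


# Hypothetical sample relations
R1 = [(1, "x"), (2, "y")]
R2 = [("x", 10), ("y", 20)]
R3 = [(10, "p"), (20, "q")]
R4 = [("p", "e1")]
print(pipelined_join(R1, R2, R3, R4))  # [(1, 'x', 10, 'p', 'e1')]
```

Each stage builds its index before it starts consuming, so P2 and P3 prepare their indexes while P1 is already producing temp1 tuples; after that, every tuple flows through all three joins without any stage waiting for a complete intermediate result.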
Pipeline parallelism is applicable only to pipelinable evaluation algorithms, for example the index nested-loop join; it can be used only where an operation can consume its input as that input is being produced. What factors limit the usefulness of pipeline parallelism? First, it is not a good choice when a high degree of parallelism is required, because the degree of parallelism is limited by the number of operations in the pipeline, and pipelines are usually short. It is useful with a small number of processors: in our example, we have only three processors for three joins, so the pipeline uses them well. Second, it cannot be applied where an operation is not pipelinable. For example, consider the query: select the average salary from employee, grouped by department ID. Here we cannot pipeline the aggregation, because we must first group the tuples, then compute the sum of the salaries in each group, and only then compute the average; no average is final until the entire input has been read, so nothing can be passed to the next operation before that. Therefore, we cannot expect full speedup from pipeline parallelism. Now observe this diagram: pause the video, look at what kind of parallelism is applied here, and see how it differs from the earlier diagram. This is independent parallelism: operation 1, operation 2, up to operation M are all executing in parallel, but they are not dependent on each other. Why are they not dependent?
Because the output of operation 1 is not given to any other operation. The outputs are finally collected somewhere, but operation 2 does not depend on operation 1, and operation M does not depend on any of the earlier operations. This is called independent parallelism because every operation works independently. It is not applicable everywhere. Let us see an example, again with the join of four relations R1, R2, R3 and R4. How can we apply independent parallelism here? We take two separate joins: R1 ⋈ R2, stored as temp1, and R3 ⋈ R4, stored as temp2. You can see that temp1 and temp2 are computed in parallel and independently; they do not depend on each other, so here we have applied independent parallelism. Finally, to collect the result, we perform one more join, temp1 ⋈ temp2. So independent parallelism is applied on processors P1 and P2, which compute temp1 and temp2, and then we obtain the result. Processor P3 has to wait for P1 and P2 to complete. Alternatively, using the pipeline parallelism we saw earlier, the results of P1 and P2 can be passed to P3 as they are generated, and P3 combines them into the final output. In that case we have used both independent parallelism and pipeline parallelism. As for the limitation, independent parallelism does not provide a high degree of parallelism, because a single query usually does not contain many operations that can work independently. So, in this scenario, R1 ⋈ R2 and R3 ⋈ R4 run in parallel on processors P1 and P2; as their results are generated, you may apply pipelining, so temp1 and temp2 are passed on and the final result is produced. Here the two joins execute independently, and their results are given to the next join. Thank you.
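The independent-parallelism plan for R1 ⋈ R2 ⋈ R3 ⋈ R4 described above can be sketched with Python's `concurrent.futures`: two tasks compute temp1 and temp2 independently, and the final join waits for both, as processor P3 does in the non-pipelined version. This is an illustration, not a definitive implementation: the tuple layouts R1(a, b), R2(b, c), R3(c, d), R4(d, e), the sample data and the helper `hash_join` (a simple in-memory hash join, swapped in here for brevity) are assumptions, and a thread pool shows the independent scheduling rather than a true multi-processor speedup under CPython's global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_join(outer, inner, outer_key):
    """Hypothetical helper: join outer[outer_key] with inner[0], dropping inner's duplicated join column."""
    index = {}
    for row in inner:
        index.setdefault(row[0], []).append(row)
    return [o + m[1:] for o in outer for m in index.get(o[outer_key], [])]


def independent_join(r1, r2, r3, r4):
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(hash_join, r1, r2, 1)  # P1: temp1 = R1 ⋈ R2 on b
        f2 = pool.submit(hash_join, r3, r4, 1)  # P2: temp2 = R3 ⋈ R4 on d
        temp1, temp2 = f1.result(), f2.result()  # P3 waits for both to finish
    return hash_join(temp1, temp2, 2)  # P3: temp1(a, b, c) ⋈ temp2(c, d, e) on c


# Hypothetical sample relations
R1 = [(1, "x"), (2, "y")]
R2 = [("x", 10), ("y", 20)]
R3 = [(10, "p"), (20, "q")]
R4 = [("p", "e1")]
print(independent_join(R1, R2, R3, R4))  # [(1, 'x', 10, 'p', 'e1')]
```

The plan changes how the joins are scheduled across processors, not the answer: for this data, temp1 is [(1, 'x', 10), (2, 'y', 20)], temp2 is [(10, 'p', 'e1')], and joining them on c gives the same single tuple as evaluating the four-way join left to right.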