 Hello everyone, in query processing today we are going to see block nested loop join. So learning outcome for this session is we will be able to analyze the cost of join operation using block nested loop join algorithm. So in the previous lecture we have already seen nested loop join. So in nested loop join we have already seen that we are going to compare tuple by tuple the relation r and s. So here we are going to process the relation on block basis instead of tuple basis. So here every block of inner join will be paired with every block of outer join. And within this each pair of blocks now every tuple of one block is compared with every tuple of the other block over there to generate the result. So this will save major block accesses. So let us see how this works. So in the nested loop join we have seen that we have read tuple of outer relation r and then compared this with all the tuples of inner relation s. Now the variation here is that we have to bring the first block of or we can say that the sequentially the number of the next block of relation r and the relation s to memory. After that the every tuple from the first block that is block br is compared with every other tuple in the block bs. So once we have finished the comparison the next block of tuple of relation s will be read. Again the tuples from the same block of r will be compared with the tuples of relation s. In that manner all the tuples of block the block br will be compared with all the tuples of relation s. And once this comparison will be finished the next block of relation r will be read and again this will be compared with all the tuples from all the blocks of relation s. So here instead of only bringing the tuple of relation R and comparing with all the tuples of relation S, we are bringing the blocks of the relation R and S and then comparing the tuple by tuple. So as we have discussed primary difference in the cost is that here in nested what you can say that in previous nested loop join this was a comparison against tuple. Now this is the comparisons against block. So let us look at the performance. In worst case again assume that the buffer can hold only one block of relation. So how many block transfers will be required? Now since we are reading this relation R only once so that we will be requiring Br block transfers. Whereas the relation S will be read as many blocks are present for the relation R. So for the first block entire relation S will be read means that all the BS blocks will be brought to memory. Then for the second block again all the BS blocks will be brought to main memory and that is why the total number of block transfers for S will be Br into BS and therefore the total number of block transfers will be Br into BS plus Br where of course Br and BS are the number of blocks in R and S. So how many SIC operations will be required? Again each scan of inner relation require one SIC and this inner relation has been read Br times. That is why for relation S we will be requiring Br SIC operation whereas for outer relation we are going to read it block wise that is why again Br SIC operations will be required for R. Therefore 2 into Br that many SIC operations will be required for this block nested loop join in worst case. What about the best case? So assume that if both of the relations are not able to fit it in main memory or one of the relation is able to fit in main memory then it is better to use smaller relation as an outer relation. So if neither of them is not fitting into the main memory then we have to go by block by block. Then what is the best case that inner relation fits in main memory? In that case it will be that inner relation is since it is available in main memory it will be read only once and therefore block transfers will be again same Br plus BS and 2 SIC operations which is same as nested loop join. So consider the example now again the same example that we have seen for the nested loop join that student and techs are the two relations the number of tuples in students are 5000 and the blocks are 100 whereas number of tuples in techs is 10000 and number of blocks are 400. Now again outer relation is student and inner relation is techs. So what is the variation here is that first block of relation R and the first block of relation S and then first tuple will be compared with first tuple then first tuple will be compared with second then first tuple will be compared with third and so on with all the tuples from this block. Then second tuple of the relation of the first block is compared with all the tuples of this first block of relation S and so on. this all the comparison has been finished then the next block will be brought to main memory. This block will remains in memory only. So, here is the case. So, as we have seen first block all tuples of block for one will be compared with block one of relation s. Then all the tuples of block one of relation r will be compared with next block then the third block fourth block and then the last block. So, assume that if there are three tuples it will be compared with all the blocks of this tuples. So, once this is finished take the next block. So, how many block transfers we have done? So, for the first block all the BS blocks has been read. For the second block again all the blocks will be read and that is why we are having BR blocks. Therefore, the total number of block transfers for this relation s will be BR into BS. Whereas the blocks of r will be read only once this is going to be BR. So, number of block transfers will be 40100 which is very less as compared to block transfers in nested loop join. And the sick operations are also less. So, this is only 200 which was 5100 in nested loop join. So, again you can go back and see the result in the nested loop join. So, if it is best case that there is both relations fit in main memory or inner relation fit in main memory then we will be requiring only 500 block transfer whereas to sick operations. Now pause the video. Now if we will reverse the relations that text is the outer relation and student is the inner relation. Then find out the number of total block transfers and sick operations required. So, for this particular block first block of relation r we are going to read all the blocks of students that is why and there are 400 blocks in the relation text. So, total number of block transfers will be 400 into 100 plus 400 that is again 40400 which was 40100 in the previous scenario. Whereas the sick operations if we can see it has been increased to 800. Best case is remains to same that the 500 block transfers and number of sick operations will be 2. So, we have to see that the smaller relation will be sorry the relation which is the larger relation can be read only once and if it is possible try to fit it into the main memory that is why the number of block transfers will be reduced and number of sick operations will also be reduced. Now we can improve the performance of block nested loop join. How we can do is that if we are having the key defined on this attributes which are going to be used for join then this will be more faster. Another way is that we have seen that we are bringing only one block of relation r and one block of relation s. Instead what we can do is that we can bring m minus 2 blocks of relation r in the main memory one block of s and then one block for writing the output. So, instead of one to one mapping what we can do is that m minus 2 blocks of relation r will be compared with one block of relation s. So, which will reduce the number of block transfers. So, number of scans will be reduced from b r to b r divided by m minus 2 because instead of one we are bringing for example 2, 3, 4 and so on whatever the m is the value accordingly the number of transfers will be reduced and that is why the cost will also be reduced. So, if you can see that instead of b r into b s it will be b r divided by m minus 2 into b s plus b r block transfers and same is the case for 6. Let us see with example. So, previously we have seen the m was 3 now it is m is equal to 4. It means what instead of one we are bringing 2 blocks of relation r one block of relation s. So, all the tuples will be compared with the first block then the all the tuples of the block 2 will be compared with the first block. Once this has been finished these 2 will be in memory only next block will be brought to main memory of relation s again the comparison will be done in this manner all the blocks of s will be compared against these 2 blocks. And that is why the number of transfers will be reduced. So, it was earlier 40100. Now it will be becoming b r that is nothing but here number of blocks are here 100. So, 100 divided by 2 that is 50 that has been read compared with 400 blocks of s and so total number of comparisons or the block transfers will become 20100. So, thus we can improve the performance of this block nested loop join by taking as many blocks of relation r as possible in main memory. One way is that we can scan the inner loop alternatively forward and backward. What will happen due to this is that. So, if the data is remaining in buffer only. So, this can be used in previous scans. So, which will reduce the number of disc accesses and of course, if index is available on inner loop join attribute we can replace the file scans with more efficient index lookups. Thus how we can improve the performance of join operation using block nested loop join. Thank you.