 Hello friends, today we are going to see merge join. So, learning outcome for this session is students will be able to analyze the cost of join operation using merge join algorithm. So, previously we have all seen nested loop join and block nested loop join, today we are going to see merge join. So, this is used to compute the natural joints and equi joints. So, consider that we are having R and S 2 relations and the of course R intersection S will give you the common attributes. It is assumption that both the relations are sorted on this common attributes while performing the join operation. So, as you know equi joint means it is of the form R dot A is equal to S dot B. So, where A and B are the set of R are the attributes or set of attributes from the relation R and S. So, if you will consider these two relations student and text now here ID is going to be the common attribute on which we can perform the join operation. So, let us look at the algorithm how merge join works. So, it is assumption that again both the relations are sorted on the common attributes on which we are performing join operation. So, PR and PS are the pointers to the first tuple of relation R and S. Now, this will be executing the join operation till we are not reaching the end of file or end of relation. So, in this what we have to do is that we have to first read all the tuples which are pointing by the PS or for all the tuples from S having the same value for the common attribute. So, assume that ID is a common attribute then we have to read all the records of ID of relation S having the same value for ID. So, that is what it is that take the first record to the set S then go on reading that records of relation S till the value for this joining attribute is same. And once we are get we have found all the tuples having the same value for that join attributes then we will do this next comparison with relation R. So, once we have gathered that particular records in set S then the tuple which is pointed by the point of PR will be compared with all the tuples which we have gathered in set S. And those we are satisfying the conditions that will be added to the result. So, first record will be compared if it is then second record till you the value of the join attribute is same as the value of join attribute in set S. Once you find the mismatch then we have to go for reading the next tuples from relation R again sorry S again and again start the comparison. So, this is what algorithm is you can go through this algorithm and we will see with this algorithm with example. So, consider again two relations one is employee and the second one is department. So, common attribute here it is department ID. So, as per the algorithm this PR and PSR pointing to the first tuple and this is a TS which is pointing to again the first the TS is nothing but the first tuple which we are taking to the set. So, first value that is 0 1 has been taken. So, the next pointer is TS 1 which is pointing to the next record of S. So, as we have seen we have to bring all the records of relation S having the same value. Since department ID is a primary key for this department relation we will be getting only one record in set S since 0 1 and 0 2. So, 1 or 2 are not matching. So, S will have only a single record and then we have to start comparison of this record of R with this S. So, first record is compared with this and this department ID is 1 this is also 1 and that is why this record will be added to the result. Then go for the second record again the department ID of the second record is 1 here is also 1. So, second record will also be added to the result then then it will be moved to the third record. Third record is having again the department ID same as what we are having in relation sorry set S and that is why the third record is added. Now, the fourth record is having this department ID 2 whereas, we are having the records of department ID 1 that is why we will fetch the next record from relation S. So, this S this is the first record which was pointed by TS that is going to be 2. Again there will be only a single record since it is in primary key. So, this is now compared with the again records of relation R. So, this is 0 2 and this is also 0 2. So, this is going to be matched that is why this record of this 1004 nilish 2 and cells will be added to the result. Next record is having again the department ID as 2 which is same as what we are having it in S. So, it will be added and this next record will be the values having 3 which is which is not same as S that is why we will move the next pointer of this S or we will fetch the next record of S and here it will be now 3. So, again comparison will start. So, this is 3 and this is 3. So, record of this will be matched then again the next record of 1007 having department ID 3 is going to be matched then the next record is also having the department ID as 3. So, this is going to be matched now this is 4 and this is 3 which is mismatched that is why we have to read the next record of this S. So, once we have read this again it will be matched and the record will be added. But now the next record or the pointer there is no record for the record in R and that is why this algorithm will stop though there is a record in this department and thus we are getting the result of this join operation using merge join. So, here we have considered employee as an outer relation and this department as inner relation. So, let us switch this relation sequence. So, now department consider it as an outer relation whereas, this employee as an inner relation and then we will see how this algorithm will work. So, what we have seen is that first fetch the records having the same ID same value for the join attribute join attribute is department ID. So, first record is 1, second record is also 1 department ID. So, that will be again taken into the set S. Third record is also having the same value that is why it will be also taken into S. In previous scenario department ID was a primary key and that is why we were getting only one record. Now here employee ID is an primary key whereas department ID is a foreign key and that is why we are getting three records. Now this is two which is not same as one. So, we will stop fetching the records of S and then we will start comparing the records of R. So, first is compared with this one and the record will be added as an result. So, again the first will be compared with second. So, since there are three records now pointer is not moved. So, first record will be compared against all the records and the third one will be this. And then now once we have compared all the pointer will be moved and this value is 2 whereas this value is 1 and that is why the next records of relation employee will be fetched. So, at this moment stop the video and try to write down the next records or the sequence of records that we will fetch in relation and we will get the result. Yeah of course, this is the value 2. So, these two records will be brought to will be written to set S and these two will be compared with this record pointed by PR. And these two will be compared and the result will be against both the records will be taken into result. Then this is 3 and this is 2. So, this is mismatched. So, this 3 will be brought to main memory. Then again this 3 and all these 3 records will be compared and will be added. Then the fourth record will be read and this will be matched and then it will stop because there is no further record in employee. I hope you have understood that how more join is working. So, what is the requirement here is that each set SS of all the tuples with the same value for join attributes. So, it means that in this scenario this is the set SS should fit in main memory. And if it is not fitting it in main memory, then we have to perform block nested loop join for comparison. Okay? If either of the relation is not sorted, first we have to sort it on the joining attribute and then only we can perform more join. What about cost analysis? So, since all the records will be read only once, so this is going to be the result as the same as number of blocks. So, number of block transfers will be sum of block transfer in both the files. So, it will be BR plus BS. So, if BBB buffer blocks are allocated to each relation, then the number of disk will be BR divided by BB plus BS divided by BB. So, if either of the relation is not sorted, of course for sorting that relation, the cost or the cost required for sorting that relation should be taken into consideration. But since we are having a single pass through both the files in this merge join, merge join request is more efficient as compared to other join provided both the relations are sorted on the joining attribute. So, consider our same example which we have seen for the other join operations that is the records or the student and text relation here. So, block transfers, so students since containing the 400 blocks and text is containing 100 blocks, I think it is a reverse one. So, student is containing 100 blocks and text is containing 400 blocks, total required is 400 block transfers. So, SICK operations currently we are assuming that one buffer block is allocated to each input relation that is why total SICKs will be 500. So, SICKs are more and but the block transfers are very less in this merge join. So, you can compare the result of all these three joins looking at you can see the result in various join operations over there. Thank you.