Hey guys, welcome back. Today we are going to start with the lookup transformation. So what is the lookup transformation? That is what we will see in this video. We use the lookup transformation to reference data from another source in the data flow stream. The lookup transformation appends columns from the matching data to your source data. So a lookup transformation is similar to the left outer join that we have already seen in the previous video: all rows from the primary stream will exist in the output stream, with additional columns from the lookup stream. For example, we have one source from which we want to get the data, and here we have one lookup source. The lookup source will be matched based on certain column values, and we want to see all the columns from the source table and from the lookup table: the matching rows will have the lookup data, and the non-matching rows from the source will not have it. This is the left outer join that we have already seen. So go to the browser and we will try to understand it practically. First go to Azure Blob Storage, and under this container we can see two files: first the EMP data and second the payroll data. Inside the EMP data, go to edit. It has employee ID, employee name, employee address and department. Similarly, the payroll has four columns as well: employee ID, payroll month, salary and tax amount. We just want to get all the data from the employee table and the matching data from the lookup table, which is the payroll table. So go to Azure Data Factory, and here let me add a new data flow. Under this data flow, first of all we are required to add the source. We have already created the source for the employee data, so I am going to use the same. This is the employee data. Now we can go to the projection, and under the projection we can see this employee data.
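The lookup described above behaves like a left outer join. As a rough conceptual sketch (not the ADF implementation), here is the same idea in pandas, with made-up EMP and payroll values standing in for the files from the video:

```python
import pandas as pd

# Made-up sample data mirroring the EMP and payroll files in the video
emp = pd.DataFrame({
    "EmpID":   [1, 2, 3, 4],
    "EmpName": ["Amit", "Bala", "Chen", "Dev"],
})
payroll = pd.DataFrame({
    "EmpID":  [1, 2, 3],
    "Salary": [5000, 6500, 10000],
})

# A lookup is a left outer join: every source row survives, and rows
# without a match (EmpID 4 here) get nulls in the lookup columns
result = emp.merge(payroll, on="EmpID", how="left")
print(result)
```

EmpID 4 has no row in the payroll side, so its `Salary` comes out as null, exactly like the non-matching rows in the lookup output.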
The data type is string. This data type should not be string; it should be short, so we can make the change here. Now we can go to the lookup source. Let me create the source here. Inside the dataset, let me add a new dataset: this is Azure Blob Storage, click on continue. The file type is Delimited Text, click on continue. Let me call this the payroll dataset. Now we can select the linked service that we have already created, SSU testing, and here we are required to select the file path. We can browse, go to the input folder, select the payroll file and click on OK. First row as header? Yes, that is true. Then we can click on OK, and the dataset will be created. Now go to the projection, so we can see all the columns; the employee ID is here. Let me make this short as well; now everything looks OK. Now go to the source and add the lookup transformation. Here we can see this lookup transformation; let me click on that. Under the lookup transformation we can check the properties, like the output stream name, which is lookup1; we can call this lookup. The primary stream is source one, that is OK, and then the lookup stream: we have already created that, so that is source two. Now here we can see the options for match multiple rows and match on. Inside match on we have three options: any row, first row and last row. Don't worry about these for now; I'll cover them later in this video. Now here we can see the lookup condition. The lookup condition should be based on the employee ID from the source and the employee ID from the lookup. Now everything looks OK. Now we can go to optimize, and under optimize we can see the broadcast and partition options. I'm not going to touch the partition option here, but for broadcast we can select auto, fixed or off.
So let me manually select fixed this time. Here we can see the broadcast option for the left stream or the right stream. If you look at the info, it says: if your worker node can hold the entire stream in memory, choose this option. Basically, it will hold all of that stream's data in memory and then apply the lookup. The right stream is the lookup, and I want to keep it in memory, so we can select that option. Now go to the data preview, and for the data preview we are required to turn on the data flow debug option. We can turn it on; it will spin up the cluster, and that will take a little bit of time, so we have to wait. The data flow debug option is on now, so we can refresh. Once we refresh, we get all the data as we saw with the left join: all the data from the source stream and the matching data from the lookup stream. Here we can see IDs 1, 2, 3, 3; ID 3 appears two times because it is repeated in the source. Now we can see ID 4. ID 4 is not available in the lookup source, so that is why we see null values for the lookup columns: the ID is null, and so are the other values, which are the payroll month, salary and tax amount. Now go to the lookup settings again. Here we can see match multiple rows. Before that, go to the blob storage: here we can see that ID 3 appears two times and ID 2 appears two times in the reference table, or the lookup table. Now go back to the employee file, which is the source table; the source table has ID 2 one time and ID 3 two times. We can verify that from here: ID 2 appears one time and ID 3 two times. Now go to Azure Data Factory again, and here we can see match multiple rows. What does it mean? Once we select this, it will return all the matching rows from the left along with the right table.
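Choosing broadcast for the right (lookup) stream means the whole lookup side is held in each worker's memory. Conceptually it is like building an in-memory hash map of the lookup stream and probing it once per source row; a plain-Python sketch of that idea, with invented values (not ADF's actual mechanism):

```python
# Made-up lookup data: (EmpID, TaxAmount) pairs; EmpID 2 appears twice
payroll = [(1, 5000), (2, 6500), (2, 7500), (3, 10000)]

# "Broadcast" the lookup stream: materialize it as an in-memory hash map
lookup_map = {}
for emp_id, tax in payroll:
    lookup_map.setdefault(emp_id, []).append(tax)

# Each source row probes the map; EmpID 4 has no match, so it gets None,
# just like the null lookup columns in the data preview
source = [1, 2, 3, 4]
joined = [(e, lookup_map.get(e)) for e in source]
print(joined)
```

Because the map fits in memory, each probe is a cheap hash lookup instead of a shuffle of both streams, which is why broadcasting the small side can speed up the join.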
For example, ID 3 comes two times in the source and two times in the lookup, so the output will be four rows for ID 3, one per matching pair: 1, 2, 3 and 4. So ID 3 appears four times and ID 2 appears two times. Now go back to the lookup settings, unselect match multiple rows, and go to match on; here we can see any row, first row and last row. What do these mean? Let us select first row. Once we select first row, we can see the sort condition; the sort condition should be based on a source two column, which is the lookup. Let me select the tax amount here. Now go to the data preview and refresh again. Since we have selected first row, it will return the first row by tax amount. ID 3 comes two times and ID 2 comes one time, because in the source ID 3 appears two times and ID 2 one time. Here we can verify the tax amount: for ID 2 the tax amount is 6500. Now go to the blob storage, go to the payroll file, and we can verify that: the first row for ID 2 has 6500, and for ID 3 it is 10000, so we should see 10000 here and 10000 here. So this is first row: it returns the first row of the matching data. Now go back to the lookup settings and select last row. As we have seen, ID 3 comes two times and ID 2 comes two times in the lookup, so it will return 7500 and 11000. Go back and verify: here we can see 7500 and 11000. So that is the use of first row and last row, and the last option is any row: any row will return any matching row, whether the first or the last. It will be the fastest among all three, because there is no need to sort based on the tax amount.
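The three match behaviors walked through above can be imitated in pandas. This is just a sketch using the counts from the video (source has ID 2 once and ID 3 twice; the payroll lookup has two rows for each):

```python
import pandas as pd

emp = pd.DataFrame({"EmpID": [2, 3, 3]})
payroll = pd.DataFrame({
    "EmpID":     [2,    2,    3,     3],
    "TaxAmount": [6500, 7500, 10000, 11000],
})

# "Match multiple rows": a plain left join keeps every matching pair,
# so ID 3 (2 source rows x 2 lookup rows) yields 4 output rows
all_rows = emp.merge(payroll, on="EmpID", how="left")

# "First row" / "Last row": sort the lookup side on the chosen column,
# keep one row per key, then join
by_tax = payroll.sort_values("TaxAmount")
first = by_tax.groupby("EmpID", as_index=False).first()
last  = by_tax.groupby("EmpID", as_index=False).last()
first_rows = emp.merge(first, on="EmpID", how="left")
last_rows  = emp.merge(last,  on="EmpID", how="left")
```

With these inputs, first row gives 6500 for ID 2 and 10000 for ID 3, while last row gives 7500 and 11000, matching what the data preview showed.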
Now if we want to load this into some destination, we can add the sink here. Under this sink we can go to the inline dataset, the inline dataset type should be delimited text, and after that we can select the linked service. This is the linked service; go back to the settings, select the output path and click on OK. That is all we need; let me check the data preview here. I am not going to execute this to load into the output, because we have already seen that multiple times: we can just publish this, and after that create a pipeline where we use the data flow activity to execute this data flow. Here we can see all the data, so this is all about the lookup transformation. Thank you so much for watching this video; see you in the next video.