Hey guys, welcome to SSunitech Social Decide. Today we are going to look at the Assert transformation in Azure Data Factory. I would strongly recommend watching this complete video, because we are going to see a real-time use of the Assert transformation, and I have already implemented this in multiple projects.

So what is the Assert transformation? The Assert transformation enables you to build custom rules inside your mapping data flow for data quality and data validation. For example, when we get data from a source, before loading it into the destination we need to check whether the data is correct or not. If the data is not correct, we load it into an error file or an error table inside SQL Server, and only the correct data gets loaded into the destination.

You can build rules that determine whether a value meets an expected value domain. You can also build rules that check row uniqueness: if a value has duplicates, we can mark those rows as incorrect data. The Assert transformation also helps determine whether each row in your data meets a set of criteria, and it lets you set a custom error message when the data validation rules are not met.

So basically we can test our data in three ways: first is Expect true, second is Expect unique, and third is Expect exists. What are these three? Let's go to the source and see them in practice.

Here we have the source, which is the sales data, and we can see it has six columns: SalesOrderID, SalesOrderDate, CustomerID, Quantity, Value, and Country. First, inside SalesOrderID we have duplicate data for ID 9. These are error rows, so we should not load them into the destination.
Next, under SalesOrderDate for IDs 4 and 5, we can see the dates are not correct: month 15 does not exist for row 4, and similarly month 16 for row 5. So these two rows are incorrect data; they fail the Expect true check, while ID 9 fails Expect unique.

The third option we saw on the slide is Expect exists. This option is only available when you have selected a second incoming stream, which we will see in the practical, so don't worry for now. Exists looks at both streams and determines whether the row exists in both, based on the column or the expression you have specified. To add the second stream for Exists, we select an additional stream.

Once we select that option, we can see we have another file, the customer detail.csv file. It has two columns, CustomerID and CustomerName, and it only contains IDs 1, 2, 3, and 4. If we go back to the sales data, it also has CustomerID 5, which we can see in sales order 6. ID 5 is not available in the customer detail file, so that row will also be marked as an error row and will not be loaded into the destination. So in total we have three conditions, which we will implement a bit later.

Before that, let me go into SSMS and look at the sink. Here we have two tables: the sales table and the sales error table. Let me execute both; as of now we don't have any data, but we can see the sales table has the same six columns we saw in the source, for the correct data. The error table has one additional column, which indicates on which column we have the error and what the error data is.
Everything will be available in that error column. So let me implement this in practice. In Azure Data Factory, let me add a new data flow and add the sales source. After clicking on it we need to select the dataset; I have already created one for this file, so let me use that. It has the same data we saw earlier. Let me also enable the Data flow debug option, click Refresh so it loads the data, and now we can see all the data here.

Next, in the source, let me click the plus symbol. We need to add the Assert transformation, which is available under Row modifier, so let me click on that. The output stream name will be assert1, which is fine, and the incoming stream is source1, which is also fine. Here we can see Additional streams; this is the additional stream we were talking about, which in this case will be the customer data. We will add that a bit later.

Down below we can see the Asserts section, and under Assert type we can see two values for now: first Expect true and second Expect unique. Let me use Expect unique first; we are checking SalesOrderID for unique values. Under the expression, we need to select the column by which we want to check uniqueness, which is SalesOrderID, so let me select that. Now let me go to Data preview to see whether the duplicate rows for ID 9 are marked as error rows. If you scroll down here, you can see the notification for error rows.
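Outside of ADF, the Expect unique logic above can be sketched in plain Python. This is only an illustration of the rule, not ADF code, and the sample IDs are assumed from the walkthrough (where ID 9 appears twice):

```python
from collections import Counter

def flag_duplicates(rows, key):
    """Mark every row whose key value occurs more than once, mirroring Expect unique."""
    counts = Counter(r[key] for r in rows)
    return [dict(r, is_error=counts[r[key]] > 1) for r in rows]

# Sample sales order IDs from the walkthrough: ID 9 is duplicated.
sales = [{"SalesOrderID": i} for i in (1, 2, 3, 4, 5, 6, 9, 9)]
flagged = flag_duplicates(sales, "SalesOrderID")
errors = [r["SalesOrderID"] for r in flagged if r["is_error"]]
print(errors)  # both rows with ID 9 are flagged
```

Note that, just like the assert in the data flow, both copies of the duplicate are flagged, not only the second one.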
These two rows are marked as error rows because we have duplicate values for ID 9. Now back in the Assert settings, let me click Add; this time we want to check the dates, that is, whether the SalesOrderDate values are correct or not. For that we select Expect true, and we can call this assert SalesOrderDate. We can also add a description for the assert, which is optional, and a filter if we want to run this check only for specific rows; for example, if we wanted to check only the country India, we could add a filter here. We are going to check the entire data, so I won't use anything here.

Now let me open the expression builder. First we need to convert the value into a date, so we use the toDate function. We pass the string value, which is our column, and then the date format, which will be 'dd-MM-yyyy'. If the string converts properly, toDate returns the date; if it cannot be converted, it returns NULL. So let me use the isNull function to check the result. If the SalesOrderDate is correct, isNull will return false; if it is incorrect, isNull will return true. But the assert expects true for the valid rows, so we have to reverse this with a not: the expression becomes !isNull(toDate(SalesOrderDate, 'dd-MM-yyyy')). Now let me click Save and finish.
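The same "does this string parse as a date" check can be mimicked in Python with datetime.strptime. This is a sketch of the logic only, assuming the day-month-year format used above:

```python
from datetime import datetime

def is_valid_date(value, fmt="%d-%m-%Y"):
    """Return True when the string parses as a date, mirroring !isNull(toDate(...))."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

print(is_valid_date("05-12-2023"))  # True
print(is_valid_date("05-15-2023"))  # False: month 15 does not exist
```

As in the data flow expression, a failed conversion is what marks the row as an error, so the function returns True only for rows that should pass the assert.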
Let me go into Data preview and refresh it. Here we should see the incorrect dates marked as error rows, and indeed for IDs 4 and 5 we get error rows, so now we have four error rows in total.

Next we need the Expect exists check for the CustomerID. For that we have to add another source, and this source is for the customer data, so I am going to call it customer. Let me add a new dataset: go to Azure Blob Storage, click Continue, choose DelimitedText, click Continue. Let me call this dataset customer, select the linked service, then browse for the file: under the input folder we select the customer detail.csv file and click OK. The first row is a header, so we mark that as true, then click OK. Now we have created the dataset for the customer, so everything is fine there.

Let me go back to the Assert transformation. Under the settings we can see Additional streams; the customer stream we just created is now available, so we can select it. Now we add one more assert, this time Expect exists, and let me name it for the customer. Here we need to check whether the CustomerID from source one equals the CustomerID from source two. If they match, the row is marked as valid; if not, those records are marked as error rows. Let me go into Data preview and refresh to verify. As we can see, IDs 4, 5, 6, and 9 are the error rows.
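The Expect exists check against the second stream boils down to a membership test on the lookup column. Here is a minimal Python sketch of that idea (sample rows assumed from the walkthrough, where sales order 6 references the missing CustomerID 5):

```python
def flag_missing(sales_rows, customer_rows, key="CustomerID"):
    """Mark sales rows whose CustomerID has no match in the customer stream (Expect exists)."""
    known = {c[key] for c in customer_rows}
    return [dict(r, is_error=r[key] not in known) for r in sales_rows]

customers = [{"CustomerID": i} for i in (1, 2, 3, 4)]
sales = [
    {"SalesOrderID": 1, "CustomerID": 1},
    {"SalesOrderID": 6, "CustomerID": 5},  # no matching customer
]
flagged = flag_missing(sales, customers)
print([r["SalesOrderID"] for r in flagged if r["is_error"]])  # [6]
```

Building a set of known IDs first keeps the lookup O(1) per row, which is roughly what the broadcasted second stream gives you in the data flow.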
So now, what we need to do is this: if we have one more column that holds the error description, then we can directly filter and load this data. How can we do that? We have to add a Derived column transformation, so let me add that. Under Columns we want to add a column; let me call it errorRows. Here we need to use two functions, so let me open the expression builder. The first function is isError(): isError() returns true if the row failed an assert, and false if the row has correct values. Let me click Save and finish, go into Data preview, and refresh.

Now we have an additional column holding true or false: true for error rows, false otherwise. As we can see, it is false, false, false for the correct rows, and true for IDs 4, 5, and 6, because all three are error rows; similarly true for ID 9.

But this does not fulfil our complete requirement, because while we can use this errorRows column to filter the error records from the correct records, we also want to keep a description of which column has the problem. For that we have to use another function. Let me go back into the derived column settings and add one more column by clicking the plus symbol with Add column; let me call it errorDescription. Here we have to use the hasError() function, and we need to specify a string. What will that string be? Let me cancel this and go back to the Assert transformation. Under the Assert transformation we have created these three asserts, and for hasError() we have to specify the assert ID. This assert ID is going to
indicate whether the problem is on that column or not. For example, let me copy the SOID assert ID from here, go to the derived column, open the expression, and use hasError('SOID'). Whenever a row fails that assert, the value will be true, and for other rows it will be false. Let me Save and finish, and we will verify the same. Now we have two additional columns: one for errorRows and one for the error on SOID. In the errorDescription we can see false for the three rows that don't have a problem with SOID, and true where SOID does have the problem.

Now we need to write an expression that replaces this true value with the proper column name. How can we do that? Let me go into the derived column settings and open the expression again. We have to check multiple conditions, so I am going to use the case() function. We can write: case, if hasError('SOID') is true, then the problem is on the SOID column, so we return 'SOID' and close the bracket. What it indicates is: if the problem is with SOID, it will return 'SOID' in the errorDescription column. Similarly, let me check the second assert: if hasError('SODate') is true, we return 'SODate', meaning the problem is with the SalesOrderDate column. We put a comma here because we are specifying another condition. Similarly, let me add the last condition, which is for the customer, so we can call it 'Cust': if hasError('Cust') is true, the problem is with the CustomerID column. This is what we specify in errorDescription. So what it is doing is checking: if the assert for SOID is false, the row is
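The case() chain above can be sketched as an ordered lookup in Python. This is an illustration of the branching logic only; the assert IDs and the mapping to column names are taken from the walkthrough:

```python
def error_description(failed_asserts):
    """Return the column name of the first failed assert, like the case() expression above."""
    mapping = [
        ("SOID", "SalesOrderID"),     # Expect unique on the order ID
        ("SODate", "SalesOrderDate"), # Expect true on the date format
        ("Cust", "CustomerID"),       # Expect exists against the customer stream
    ]
    for assert_id, column in mapping:
        if assert_id in failed_asserts:
            return column
    return ""  # no failed assert: a correct row

print(error_description({"SODate"}))  # SalesOrderDate
print(error_description(set()))       # empty string for a correct row
```

Like case(), the conditions are evaluated in order, so a row failing two asserts reports only the first matching one; if you wanted all failing columns you would collect matches instead of returning early.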
a correct record; if the assert for SOID is true, we have a problem with the SOID column, and similarly for the SODate and customer asserts. Let me Save and finish, go to Data preview, and refresh to verify the data. Here the column value should be updated from true or false to the proper column name that is causing the problem. As we can see, CustomerID is showing here and SOID is showing here, but the date problem is not showing properly. If we go into the Assert transformation again, we can see the assert ID is SODate; let me copy it, because these IDs are case sensitive and we have to use them exactly as defined inside the Assert transformation. We had written SODate with a lowercase d, so that is why it was not reflected. Let me go into Data preview and refresh again; now the errorDescription is updated properly.

Now we want to load the correct data into the proper destination, the sales table, and the error records into the error table. How can we do that? First we have to filter the correct and incorrect records, so let me add the Filter transformation. We have already discussed the Filter transformation; if you haven't watched that video, I will provide the link in the description of this video so you can watch it from there. Here we have to check the errorRows column we created: if errorRows is false, we want to load all that data into the correct destination, so we can also call this filter correctData.

Now let me add the sink. This sink will be the SQL Server sales table we have already created, so let me create a new dataset here: search for Azure SQL Database, click Continue, select the linked service we already created, and let me call this dataset sales for Azure SQL.
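The two Filter branches, correctData and the error branch, are just a split on the errorRows flag. A minimal Python sketch of that split, with hypothetical sample rows:

```python
def split_rows(rows):
    """Split rows into correct and error lists on the errorRows flag, like the two Filter branches."""
    correct = [r for r in rows if not r["errorRows"]]
    errors = [r for r in rows if r["errorRows"]]
    return correct, errors

rows = [
    {"SalesOrderID": 1, "errorRows": False},
    {"SalesOrderID": 9, "errorRows": True},
]
good, bad = split_rows(rows)
print([r["SalesOrderID"] for r in good])  # [1]
print([r["SalesOrderID"] for r in bad])   # [9]
```

In the data flow this is done with a New branch plus two Filters rather than one split, because each branch then flows into its own sink.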
Here we select the sales table, then click OK. So we have selected and created the dataset, but we need to go into the Mapping tab. Instead of auto mapping, let me disable that and map manually: first the ID, second the SODate, then the CustomerID, so we select CustomerID; next we have Quantity, so Quantity goes here; similarly Value, which should come from the source, so we select that. Everything is done.

Now we can go into the Data preview of this sink. Actually, we have a problem with the name correctData, because a space is not allowed between "correct" and "data" in the stream name. Let me fix that, go into the sink, and refresh. Here we should see only the correct data, because that is what we are loading, and indeed we have only these five rows, because these are the correct records.

Next, how do we load the incorrect data into the error table? For that, after the derived column we need to use a New branch: click the plus symbol and select the New branch transformation. It creates a duplicate copy of the derived column output. After that we add a Filter again; this time the filter is for the error data. Let me go in and add the condition: we use the errorRows column we created, and check that it is true. Let me click Save and finish; those records will be filtered by this filter. Then let me click the plus symbol so it can be loaded into the destination as well. For the destination, let me create the dataset: it should be Azure SQL Database, click Continue, select the table for the error rows, and call this dataset salesError. Click OK, and the dataset is created. Now we can go
into the Mapping; instead of auto mapping we want to map this manually: first the SOID, second the SODate, third the CustomerID, fourth the Quantity, fifth the Value, and the last is the error column, so let me map it to the errorDescription column we created. Everything is matched here. Now let me go into Data preview and refresh; this time we should see only the error records with the proper error description, and as we can see, it has all the error data with the failing columns identified.

Now let me publish everything we have done here, then go into the pipelines and add a new pipeline to execute this. Let me use the Data flow activity to execute the data flow we created: go to Settings and select dataflow1. Now we can debug it; once it executes, the destination tables will have the data for the sales and the error records. It executed successfully.

Let me go into SSMS and check whether the data was inserted. Click Execute: under the sales table we have five rows with the correct data, and under the error table we again have five rows with the incorrect data. The first two rows have a problem with the SalesOrderID, the next two rows with the SalesOrderDate, and the last one with the CustomerID, each with the proper error description. So we can identify the problems, correct them, and load the data again. Thank you so much for watching this video; see you in the next video.