 Hello, welcome to SSRetek, so we will decide and today we are going to see about the slowly changing dimension type 2. So if you haven't watched the last video of this video series, so I strongly recommend to watch that video where we have discussed what is slowly changing dimension and what is slowly changing dimension type 1. So let's get started with slowly changing dimension type 2. So what is SCD type 2? So a type 2 SCD retains the full history of values. When the value of a chosen attribute changes, the current record is closed, a new record is created with the changed data value and this new record become the current record. Each record contains the effective time and expiration time to identify the time period between which the record was active. So basically what we can say here, for example, we are having a source and that source is having certain columns. In the sync site, we can maintain the SCD with the historical data with two ways. One we can add two extra columns and that extra columns like effective from and effective to. So the effective from and effective to will be indicating like which record is currently active and which record is not active. So for example, if your effective 2 is null, then those records will be active and if your effective 2 value is not null, then those records is inactive. In the second scenario, we can add only a single column and that column is the integer type of column and that will be holding the boolean value with true and false or the 0 and 1. So 1 will be indicating as active record and 0 will be indicating as inactive record. So in my scenario, I am going to choose the second option by which we will be keeping only one column with the active and inactive record. Let me quickly go and try to see the source and sync by which we will be implementing this SCD. So the source file which is available under the blob storage of the input folder and the file which is containing the item information in the item details.csfile. So here it is having total 6 columns with the item ID, variant ID, item name, variant name, warehouse information. So we can see item ID and the variant. So these two will be having the composite primary key. So go to the SQL side and under the SQL here we can see this SCD type 2. So this is the table by which we want to keep the data. So here let me try to remove the data here first. So everything we can see all the columns like 6 columns we have seen in the source file. So here we have 7 columns. The last column which is indicating the is active. So if we are having 1, so those records are active. If that will be 0, then those will be inactive record. So how we can implement to load the data into this? So for that, let me quickly go into the SEO data factory. And here let me try to create a new Dataflow. So we can go into Dataflow and here let me try to add a new Dataflow. So let me call this as SCD type 2. Now here we need to add the source. So our source is the item details.csfile in the blob storage. So we have already created the dataset for the same. So let me try to choose that. So here let me try to open. So we can also verify the same like the file path is the input folder of the item details.csfile. So it is indicating the right file. Here next what we need to do, we need to add the sync location table as well. So this is the table. So we need to add one more source here and this source will be treat as lookup. So let me call this as source lookup. So this is the SQL table and we have already created the dataset for this as well. Here we cannot use the underscore. So let me remove the underscore and here let me try to select the dataset. So this is the dataset we have already created, which is pointing the same table. So let me go into the data preview and try to refresh. So it will be having all the columns those we have seen into the SQL server table. So here as we can see as of now we don't have any data, but we can see the schema. Here we can see the column names are same as we have seen inside the file. So let me try to use the select transformation to just modify the names of the column. So here we can see the input column and the name as let me add the lookup here. So it will be going to treat as a lookup. So that's why I am going to add the lookup at the last of the column names. So for all these columns, I am adding the lookup. So that's it. Now next what we need to do we need to use the lookup transformation and we will check whether the data is available into this destination or not. If data is available there, then we need to check if any changes on that data or not. If changes is also there, then we need to update. If changes is not there, then we can skip those rows. And if data is not available into the sync, then we just want to insert those. So for that first, we need to use the lookup transformation. So let me try to use the lookup transformation here. And here we can see the primary string that is source 1 and secondary for the source lookup. So this is the secondary scheme. Now here we can see the lookup condition. So lookup conditions should be based on two columns because here we have the composite primary key on the item ID and the variant ID. So let me quickly select this item ID and variant ID. But here we can see the item ID which we are getting from the source which is the text value. So we don't want to keep that as a string. So let me go into the projection and try to update the schema from string to integer because actually it is holding the integer. So now we can go and we can check here. So this is pointing as in integer. Second we can add another column by clicking on this plus symbol. And here let me add the variant ID should be equals to the variant ID. So the lookup should be based on these two. Let me go into the data preview and try to refresh. So we will be going to see the data. So from the source it will be having all the data like all the nine rows. And in the sync we don't have anything as of now that we can see the lookup table. So null is there. So what next we need to do we need to use the filter option here. So what that filter will do that filter will be going to filter out the rows if those are not available into the destination. So here we can go and try to use the filter transformation. And this filter transformation I am going to use for the insert. So simply go and try to call this as insert and here we need to write the filter on. So we need to specify the condition so open the expression builder here. So under this expression builder what we need to do first we need to check your lookup source ID of the should be blank or null. So first condition it should be null of your lookup table ID. So here we can scroll in the bottom side and we should see source lookup as item ID either this condition will true or if your source item name variant name zone warehouse bin. If anything is changed then we also want to insert. So we need to add those condition as well. So here let me quickly add the item name from the source one that we can see that should not be equals to the item name from the source lookup. Similarly we need to check for the variant so variant name from the source should not be equals to the variant name from the lookup. Next we need to add and need to check for the warehouse. So warehouse from the source should not be equals to warehouse from the lookup. Now let me add another condition and this condition for zone. So basically we are going to check for all the columns those are available in the table. So here we can see the zone from this. Now let me try to add or condition and this or condition for the bin so that is the last column. So bin from the source should not be equals to bin from the lookup. So here we have done with the expression. Now let me try to click save and finish. So we are configuring this for the insert so either value is not available so that is checking in the first condition. If any value is changed from source to the lookup so we need to check in below conditions. Once we have done this then we can go into the data preview and try to check here. So it will be going to have the data as of now it will be having all the nine rows here so we can check like because in the destination we don't have any data. Now let me quickly try to add a derived column transformation here and this derived column transformation let me call this as is active and this value should be one so we can mark this as one. Now we can try to refresh so it will be added one more column here. So we can wait go in the last so we can see this is active. Now let me try to select only those columns those we want to keep into the destination side. So lookup columns are marked here so we can remove all the lookup columns from here so we can see and we can delete all these lookup columns from this so we can go and try to delete this. So go to the data preview and try to refresh so we should be able to see the data here. So here we can see all the records so we simply want to insert all these records so we can directly add the sync part here. So let me try to add the sync and in this sync we have already created the dataset so we can select that one go to the mapping and we can check all the mapping succeed so everything is okay. So we have done for the insertion part we need to update for the is active value that is already available in the table so we are marking that value as an expire. So in the lookup we need to add a new branch here and this new branch will be having the same whatever we have done before that. So under this we have all the column next we need to filter out for the updated record where we want to do the update. So let me try to use the filter so here is the filter and let me call this filter value because we are going to update in this case. In the filter option what expression we need to do so here first we need to check the value that should be available into the sync location. So before that we have checked that was not there so we have used the is none so here I am also going to use the is none and that should not be available into the lookup. So if you are going to use the not equals to before that then it is indicating your source value of the item should be available into the destination so that should not be not null it means it should having the data so this is the first case second case we need to check it should have the updated record either all these columns so either in the item name should not be equals to the item name from the lookup so item name from the lookup so this is the one condition similarly let me check for the another condition let me put or condition here and the item name then we have the variant name from the source should not be equals to the variant name from the lookup or warehouse will be the next one so warehouse from the source should not be equals to warehouse from the sync location or we need to check for the zone so zone from the source should not be equals to zone from the lookup let me check for the bin as well so bin of the source that should not be equals to bin from the lookup or the destination table let me close this record so what it is doing let me recall this so first it is checking your data should be available into the sync location second if there any changes on the item name or the variant name or the warehouse name or the zone or the bin so all these columns are here if any changes are there and the data is already there into the destination site only then we want to filter those records so we have done this year next we want to update those records so we should be going to add a new column so need to use the derived column transformation here and let me add a new column with the is active with the value of zero so we want to update the is active value as zero next let me try to filter out only for the required columns we don't want to keep all these columns here so this time we are required to have the lookup column values we are not required to have the source column values because we will be going to update the data as per the lookup column those are the destination so we will be going to filter based on those column values so let me try to remove the source from here like those columns which we are getting from the source is not required in this case so this point you have to remember now here we have done everything now let me quickly go here and try to add the ultra row transformation because we want to update into the destination so we are required to use the ultra row transformation and what action we want to perform we want to update and we want to update all the records so we can use the one equals to equals to one now here we need to add the sync so under this sync we need to set up our data set so let me try to select the data set which we have already created let me go to the setting here we want to do the update we don't want to do the absurd or insert or delete we want to do the update so here we can see the key columns so key columns will be indicating and like it will be joining columns we can say so the item id next we can see the variant id next we can see the item name so all these columns we can match so if this is matching only then we want to update so let me try to select all these so here we cannot do the join on the basis of only item id and item variant id so we cannot do on that basis so why we cannot do that because if you are going to do on that basis then we are already inserting the data on this flow so that will be going to update as well like the is active value as 0 so after doing all these let me go into the data preview it is showing an error because of mapping for sure so here in the mapping let me go and disable this auto mapping so here we can see all these and here we need to select this from is active which we have created in the derived column so this is the same now we can go in the data preview and try to refresh so this is the whole thing that we need to do for implementing the slowly changing dimension so you have to remember this and if you have any doubt then you should be going to watch this video again let me try to save this and after that let me go and try to add a new pipeline here and in this pipeline let me try to execute this scd type 2 now let me try to execute it so in the first run it will be loading all the data into the destination side so we need to wait until this will not be executed successfully so as we can see this is done now go back to the ssms and try to refresh by pressing ctrl r so here we should be able to see all the data with the active records that we can see now here we can see the item id 2 and variant id 2 so I am going to update this warehouse from 001 to 009 now what it will do this time once we will our pipeline so this record will be inserted like this will be inserted again with the active and this will be expired with the is active as 0 let me try to execute this pipeline and I will show you that so here it got executed with success let me go to the ssms and try to refresh by pressing ctrl r so as I told you item id 2 and variant id 2 will be having another record so that we can see item id 2 and variant id 2 is having another record here with the active one and we can see it got marked as so we have successfully loaded the record here but we can see the duplicate records as well so we did one mistake let me go into the data flow and here into the lookup source we are fetching all the data that is already available into the table either that is active or inactive record so what we need to do we need to go into the source option of this and instead of getting all the data we should be getting only the active records so we need to use this query here now let me try to save this and as we can see into the table we have total 11 records so after executing this pipeline we should not be able to see any new records so this mistake I did so make sure while you are going to implement the CD you should be keeping only the active records by using the query as we have did over here so pipeline is in progress so we need to wait so it got executed with success let me quickly go into the ssms and try to refresh by pressing control r so here we should not able to see the new records as we can see only 11 records so this is you need to remember now let me quickly recap what we have done so first we have added the source which is the file and we have added the lookup which is the destination and after that we are doing the lookup for adding the data there then after we have filter out for the insertion and the updation case so in case of insertion first we are checking the data should not be available into the destination either or if any changes are happening into the other columns except the key columns so we are checking those columns here then we have added the derived column and where is active value is one then selecting only the required columns and dumping that into the destination similarly in case of update here first we are checking the data should be available into the destination and if any changes in the other columns then we are going to update then here we have added for the is active value as 0 and here we have added this for getting only the required columns then we have used this alter root transformation for the updating of the records then we have dumping that into the sync that is our sequence over table so this is what we have done for implementing the sd so thank you so much for watching this video i hope you have enjoyed this session see you in the next video