 Hello, welcome to SSUnitech. So, see this site and this is continuation of SSIS tutorial. So, today we gonna discuss about fuzzy grouping transformation. So, before going forward, if you haven't watched part 13 and 30 of this video series. So, I would strongly recommend to watch that video where you can understand about the conditional split and fuzzy lookup. So, let's start with fuzzy grouping. So, what is fuzzy grouping? So, it is also used to replace wrongly typed words with the correct words, like we have done in fuzzy lookup transformation. For example, if 90% records are correct data and 10% records typed incorrect, then it would be treated same as 90% records. Next is, it uses grouping technique to replace the wrongly typed words in data to correct data. Next is, it allows us to use only DT underscore WSTR or DT underscore STR data types columns in fuzzy matching. And in exact matching can be used any data types except DT underscore WSTR or DT underscore STR. So, this is our scenario. Let's assume we have a flat file which we have used in our last video when we were discussed about fuzzy lookup transformation. So, that flat file contains the information of employees that has three columns. First is first name, last name and designation, which is titles. So, in our SQL server, we have a table that contains the information of titles. So, we want to load only correct titles and we don't want to load incorrect titles, which is wrongly typed. So, as we can see, we have to use a flat file source, then we need to use fuzzy grouping transformation, then we will use conditional split and then OLA DB destination to load the data. So, let's have a look of flat file source. So, go to one folder. Then we can open this file. As we can see, it has last name, first name titles. In titles, we can see customer service representative. In our second row, we can see customer service rep dot. So, this is our incorrect title. While we are going to group between titles, then our second title will be treated as our first title. Let's have a look of our destination. So, this is our destination table and we can see it has only two columns. So, first is ID and second is occupation description. So, in ID, we can see it has identity column. So, go to an SSDT to implement the package. So, here we need to use one more package. So, right click on this SSIS package, then new SSIS package. Then I want to rename this package with fuzzy grouping. So, this is fuzzy grouping. Then we have to use dataflow task. So, we can drag and drop this dataflow task. After that, we need to double click on this. So, inside dataflow panel, we have to use our source, which is our flat file source. So, we can drag and drop flat file source. Then we need to double click on this flat file source. Then it would ask for flat file connection manager. So, click on new. Then we can browse and this is our file. So, we can select this file and then click on open. Go to on columns. So, we are getting last name, first name and titles. Now, move to advanced tab. So, in this advanced tab, we have titles. So, I gonna change the size of this title with 255. Then click on OK. Now, go to on columns. So, we want all the column. Then click on OK. So, we have done our source. Now, we need to configure our fuzzy grouping transformation. So, go to on other transformations. Then we need to drag and drop. Then we need to connect with source. Once we connect with source, then double click on this fuzzy grouping transformation. So, here we need to select OLEDB connection manager. So, I gonna click on new. As we have already made the connection. So, I gonna use that one. Click on OK. Go to on columns. So, here we wanna do grouping on this titles. So, we can select this checkbox. Then here we can see we have many options. First is input column names which is titles. Second is output alias name. I wanna use as titles. Then this is our group output alias and this is very important. So, this column will be loaded in our database. Next is match type. In match type as we can see we have exact match and fuzzy match. So, I wanna use fuzzy match. So, here we have comparison flags. So, it has many more options. So, what it mean? So, our first option is ignore flag. If we select this option, then fuzzy grouping will ignore our case. So, here as we can see we have capital letters and in our small letters. So, it would be treated as same values. Next is ignore kana types. If we select this option, then it will ignore the difference between Hiragana and Katakana. So, these are the language specific. And our next is ignore non-spacing characters. If we select this option, then fuzzy grouping will ignore the difference between diachritics and characters. Our next option is ignore character width. If we select this option, then it will ignore the differences between single byte character and double byte character. And our next is ignore symbols. So, if we select this option, then it will ignore the difference between letters and symbols. So, these are the comparison flags. So, here we can see we have numerals. So, what is numerals? So, basically we have to specify the significance of starting and ending numerals while comparing data. So, it has four options. First is neither. Second is leading, then trailing and leading and trailing. So, first is neither. So, numerals of the starting and ending position will be ignored while grouping. Next is leading. So, numerals at the starting position will be ignored while grouping. Third is trailing. So, numerals at the ending position will be ignored while grouping. And our last is leading and trailing. So, numerals at the starting and ending position will be ignored while the grouping. So, go to SSDT and I am going to use neither and move to our advanced tab. So, in our advanced tab, as we can see, we have input key column, which is our system defined formula. Second is output key column name, which is underscore key underscore out. Then similarity score column name, which is underscore score. Then we have similarity threshold value. And this is very important. So, here we need to take care. Here we have a slider. So, we can adjust the value between 0 to 1. So, I want to use 0.8 at this time. And then we have token delimiters. So, I want to use space and dot. So, in our additional delimiters, we can specify dot. Now, click on OK. So, now we need to use conditional split because here we can see in advanced tab, if the value of underscore key underscore in is equals to underscore key underscore out, that would be our perfect titles. So, we want to load only that data in our database. So, we can use conditional split. That is available inside this common. So, we can drag and drop conditional split. And then we need to connect with fuzzy grouping. So, here double click on this conditional split. And then I want to specify underscore key underscore in should be equals to underscore key underscore out. When this value will be matched, then that would be our matched output. So, we can put an output name with matched and you can specify any output name as you wish. So, click on OK. Now, I want to load data in our destination. So, we have to use OLEDB destination. So, we can drag and drop OLEDB destination, then connect with conditional split. So, here we want to load data for matched output. So, click on OK. Now, we need to double click on this OLEDB destination. Here we need to specify the table. So, our table is fuzzy grouping occupation. Then go to on mapping. So, in mapping as we can see, we have many columns in our available input. And in our destination, we want to load only titles clean column. So, we can map with destination. So, there is no need to specify any value for ID because ID is identity column. So, values will be loaded at the runtime. Now, click on OK. Now, we can save this package and then click on start. So, as we can see, we are getting 21 rows, then fuzzy grouping will perform grouping on this 21 rows and provide the output. By using conditional split, we have divided input data. And in our matched output, we are getting 9 rows. So, we can stop this package, go to on SSMS, then we can execute our selected statement. So, here we are getting the duplicate data. As we can see store manager, store manager, assistant manager, assistant manager, and then customer service representative, customer service representative. So, we are not getting data as we were expected. So, go to on SSDT. So, we had specified wrong threshold value. So, I am going to change this value by 50 this time. Then click on OK. Now, go to on SSMS. I am going to truncate this table. So, truncate table. We can execute. Now, we can check. We don't have any data right now. So, I am going to run this package. Click on start. So, as we can see, we are getting 21 rows from source and in our destination, we had loaded only 6 rows. So, go to on SSMS, then we need to execute our selected statement. So, as we can see, we are getting the data as we were expected. So, we are getting distinct titles. So, this is our actual value for threshold. So, thank you so much for watching this video. If you like this video, please subscribe our channel to get many more videos and don't forget to press the bell icon to get the notification of our newly uploaded videos. Thank you so much.