Hello everyone, today we are going to see approximate string matching. So, what do we mean by approximate string matching? While typing, we sometimes make spelling mistakes. Even so, we should still be able to retrieve the documents based on the corrected word. Depending on how many errors are allowed, that particular word will get corrected. The learning outcome for this session is that students will be able to find the number of errors, or the edit distance, between a text and a pattern.

Let us first understand approximate string matching. The problem can be stated as follows: you are given a short pattern P of length m, a long text T of length n, and the maximum allowed number of errors k. We have to find all the text positions where the pattern occurs with at most k errors. This difference, or distance, is generally called the Levenshtein distance or edit distance: the minimum number of character insertions, deletions and substitutions needed to turn one string into the other. With minimal modifications, the same method can be adapted to search for whole words that match the pattern with at most k errors.

Let us see an example: flower and flowar. By mistake, there is an a instead of the e. If the allowed distance, or the allowed number of errors, is 1, then f l o w a r will still be accepted when the document contains flower. Or suppose we are typing Sunday and Monday. The difference between Sunday and Monday is 2, because S and M differ and u and o differ; the remaining characters, n d a y, are common. So, depending on how many errors are allowed, we find the matching positions in the text accordingly.

There are many algorithms available; one approach is dynamic programming. So, let us find the edit distance between two strings, one being the text and the other the pattern, and check whether the pattern will be found in the text allowing k errors. We are going to fill the matrix C[0..m, 0..n], where m is the length of the pattern and n is the length of the text.
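The two word examples can be checked by computing the edit distance directly. Below is a minimal Python sketch, assuming every insertion, deletion and substitution costs exactly 1 error; the names edit_distance and is_accepted are our own, not from any library. It uses the same matrix rules explained in this lecture but keeps only the previous row, since only the final distance is needed here.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))       # row 0: C[0][j] = j
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)        # column 0: C[i][0] = i
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1]    # same character: carry the diagonal forward
            else:
                # mismatch: 1 + min(vertical, horizontal, diagonal)
                curr[j] = 1 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr                      # only the previous row is ever needed
    return prev[-1]


def is_accepted(typed: str, word: str, k: int) -> bool:
    """Accept a typed (possibly misspelled) word if it is within k errors of word."""
    return edit_distance(typed, word) <= k


print(edit_distance("flowar", "flower"))      # 1
print(edit_distance("Sunday", "Monday"))      # 2
print(is_accepted("flowar", "flower", k=1))   # True
print(is_accepted("Sunday", "Monday", k=1))   # False
```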
Here C[i, j] represents the minimum number of errors needed to match the first i characters of the pattern with the first j characters of the text. First, how do we fill the 0th row and the 0th column? The 0th row is filled with the index j and the 0th column with the index i. For every remaining entry (i, j), we compare the ith character of the pattern with the jth character of the text. If they are the same, we carry the previous distance forward, that is, we take the diagonal entry C[i-1, j-1]. Why the diagonal entry? Because it holds the distance between the characters of the pattern and the text that we have already matched before this position, and since the current characters are the same, that distance is carried forward unchanged.

If the characters are not the same, the distance, or the number of errors, increases by one, since there is one character mismatch. Increased from which value? We find the minimum of the vertical, horizontal and diagonal neighbours of position (i, j), that is, C[i-1, j], C[i, j-1] and C[i-1, j-1], and add one to it. A match is reported at text position j if C[m, j] is less than or equal to k.

Let us see this with an example. The text is surgery and our pattern is survey. Let us check whether survey will be accepted in the text surgery. The 0th row has entries equal to j, so 0 to 7, and the 0th column has entries from 0 to 6. Why 6? Because the pattern has 6 characters: if no character is matched, all 6 characters count as errors, so the distance at the bottom of the column is 6, and going up it is 5, 4, 3, 2, 1 and so on.
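The rules just described translate directly into code. Below is a minimal Python sketch (the names fill_matrix and report_positions are ours, not from a library) that fills C exactly as stated: 0th row = j, 0th column = i, copy the diagonal on a match, otherwise 1 plus the minimum of the vertical, horizontal and diagonal neighbours. It prints the initial 0th row and 0th column for the surgery/survey example, and it can be used to check the rest of the table as we fill it by hand.

```python
def fill_matrix(pattern: str, text: str) -> list[list[int]]:
    """C[i][j] = minimum errors to match pattern[:i] with text[:j]."""
    m, n = len(pattern), len(text)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        C[0][j] = j                    # 0th row: the index j
    for i in range(m + 1):
        C[i][0] = i                    # 0th column: the index i
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if pattern[i - 1] == text[j - 1]:
                C[i][j] = C[i - 1][j - 1]           # same character: take the diagonal
            else:
                C[i][j] = 1 + min(C[i - 1][j],      # vertical
                                  C[i][j - 1],      # horizontal
                                  C[i - 1][j - 1])  # diagonal
    return C


def report_positions(C: list[list[int]], k: int) -> list[int]:
    """Text positions j (1-based, as in the lecture) with C[m][j] <= k."""
    last_row = C[-1]
    return [j for j in range(1, len(last_row)) if last_row[j] <= k]


C = fill_matrix("survey", "surgery")
print(C[0])                    # [0, 1, 2, 3, 4, 5, 6, 7]
print([row[0] for row in C])   # [0, 1, 2, 3, 4, 5, 6]
```

Once the table is complete, report_positions(C, k) applies the reporting rule C[m][j] <= k to the last row.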
Now let us start with the first position, (i, j) = (1, 1). Which characters are we matching here? In the pattern it is s and in the text it is also s. Since the characters match, by the formula we carry forward the diagonal entry, which is 0. So s and s match and the distance is 0.

Now let us see the next position. The character in the text is u and the character in the pattern is s. s and u do not match, so we find the vertical, horizontal and diagonal distances and take the minimum of those. The minimum is 0, so we add 1 and get a distance of 1. This is quite obvious: the text string here is su, while the pattern string up to this point is only s. What is the distance between su and s? It is 1 character, because the s's match and the remaining character is u, which has to be inserted into one string or, equivalently, deleted from the other. That is why the distance is 1.

At the next position, the text character is r and the pattern character is s. Again they do not match; the minimum neighbouring distance is 1, so 1 plus 1 gives 2. The pattern s does not match any of the remaining text characters either, so each time the distance increases by 1: the minimum is 2, so the next entry is 3, then 4, then 5, and finally 6. I hope you have understood this.

Now, what will be the distance in the next row, at the first position? The text character is s and the pattern character is u, so now the pattern string is su and the text string is s. Again the characters do not match. The minimum of the three neighbours is 0, so 0 plus 1 gives 1. Now pause the video and find the distance for the next position.
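To double-check the s row, the sketch below fills the matrix with the same rules and prints, for each cell of the first pattern row, whether the characters matched (diagonal copied) or 1 was added to the minimum of the vertical, horizontal and diagonal neighbours. The helper name trace is our own; only row 1 is printed, so the position you were asked to work out is not given away.

```python
def trace(pattern: str, text: str, rows: int) -> list[list[int]]:
    """Fill C with the lecture's rules, printing how each cell of the
    first `rows` pattern rows was obtained."""
    m, n = len(pattern), len(text)
    # 0th row = j, 0th column = i, everything else filled below
    C = [list(range(n + 1))] + [[i] + [0] * n for i in range(1, m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p, t = pattern[i - 1], text[j - 1]
            up, left, diag = C[i - 1][j], C[i][j - 1], C[i - 1][j - 1]
            if p == t:
                C[i][j] = diag
                how = f"{p} = {t}, copy diagonal {diag}"
            else:
                C[i][j] = 1 + min(up, left, diag)
                how = f"{p} != {t}, 1 + min({up}, {left}, {diag})"
            if i <= rows:
                print(f"C[{i}][{j}] = {C[i][j]}  ({how})")
    return C


C = trace("survey", "surgery", rows=1)
print(C[1])  # [1, 0, 1, 2, 3, 4, 5, 6]
```

Calling it again with rows=2 shows the u row once you have tried it yourself.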
Here, which characters are we comparing? The pattern character is u and the text character is also u. They match, so we take the diagonal entry, and the distance is 0. Of course: the text is su and the pattern is also su, which is why the distance becomes 0. After this, the pattern u does not match any of the remaining text characters, so the distance keeps increasing by 1. At the next position it is 0 plus 1, that is 1; at the position after that the minimum of the three neighbours is 1, so it is 2; then 3, 4 and finally 5. I hope you are getting how we are finding the distance.

Now the next row. The text character is s and the pattern character is r; they do not match. The minimum distance is 1, so 1 plus 1 gives 2. Next, the text character is u and the pattern character is r, again not matching. The minimum is 0, so 0 plus 1 gives 1: between su and sur there is only one character of difference, which is why it is 1. What about the next position? Here the text character is r and the pattern character is also r, so we take the diagonal entry: sur and sur are the same string, so the distance is 0. Next comes g in the text; g and r do not match, so the minimum 0 plus 1 gives 1. For e, it is 1 plus 1, that is 2. Then the text character is again r: the text so far is surger and the pattern is sur. The characters match, so the distance is taken from the diagonal, and that is 3. This is how it works. The last text character is y, which does not match r, so 3 plus 1 gives 4.

Let us go to the next row, for the pattern character v. s and v do not match, so the distance is 2 plus 1, that is 3. u and v do not match: 1 plus 1, that is 2. r and v do not match: 0 plus 1, that is 1. g and v do not match: again 0 plus 1, that is 1. Next, e and v do not match.
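One step of the r row is worth checking: at the second r of surgery we copied the diagonal value 3, even though the horizontal neighbour is only 2. Copying the diagonal on a match is safe because neighbouring cells of this matrix never differ by more than 1, so the diagonal can never exceed the vertical or horizontal neighbour plus 1; here 2 plus 1 ties with 3. The minimal Python sketch below (the helper fill is hypothetical, not from any library) compares the lecture's rule with the general Levenshtein recurrence, which always takes the minimum of vertical + 1, horizontal + 1 and diagonal + 0 or 1, on this example and on random strings.

```python
import random


def fill(pattern: str, text: str, shortcut: bool) -> list[list[int]]:
    """Edit-distance matrix with 0th row = j and 0th column = i.

    shortcut=True  -> lecture rule: on a match, copy the diagonal entry.
    shortcut=False -> general rule: always take the minimum of
                      vertical + 1, horizontal + 1 and diagonal + (0 or 1).
    """
    m, n = len(pattern), len(text)
    C = [list(range(n + 1))] + [[i] + [0] * n for i in range(1, m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = pattern[i - 1] == text[j - 1]
            if same and shortcut:
                C[i][j] = C[i - 1][j - 1]
            else:
                # On a mismatch this equals 1 + min(vertical, horizontal, diagonal).
                C[i][j] = min(C[i - 1][j] + 1,
                              C[i][j - 1] + 1,
                              C[i - 1][j - 1] + (0 if same else 1))
    return C


lecture = fill("survey", "surgery", shortcut=True)
general = fill("survey", "surgery", shortcut=False)
print(lecture == general)                 # True
print(lecture[3][6], lecture[3][5] + 1)   # 3 3

# The same agreement holds for arbitrary strings, not just this example.
rng = random.Random(0)
for _ in range(1000):
    a = "".join(rng.choice("abc") for _ in range(rng.randint(0, 8)))
    b = "".join(rng.choice("abc") for _ in range(rng.randint(0, 8)))
    assert fill(a, b, True) == fill(a, b, False)
```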
So, for e and v it is 1 plus 1, that is 2; then r and v give 2 plus 1, that is 3; and y and v give 3 plus 1, that is 4. The next pattern character is e. e and s do not match, so the minimum 3 plus 1 gives 4. u and e do not match: 2 plus 1, that is 3. r and e do not match either: 1 plus 1, that is 2. Next, g and e do not match: 1 plus 1, that is 2. Now e and e match, so we take the diagonal distance, which is 1. After that, r and e do not match, so the distance is 1 plus 1, that is 2, and for y it will be 3.

At the end comes the last pattern character, y. y and s do not match, so 4 plus 1 gives 5; u and y give 3 plus 1, that is 4. r and y give 2 plus 1, that is 3. g and y again give 2 plus 1, that is 3. e and y give 1 plus 1, that is 2. r and y do not match either, so 1 plus 1 gives 2. And now this y and this y match, so we take the diagonal distance, and this will be 2.

So, will survey be accepted in the text surgery? It will be accepted if the maximum allowed number of errors is 2 or more. If we allow only one error, survey will not be accepted, because the number of errors at this position is 2. This is how we find the edit distance between two strings.

If the characters match, we take the diagonal distance. Why the diagonal distance? Because we have already compared the part before this position. Look here: up to this e in the pattern and up to this r in the text, that is, between surve and surger, we have already found that there are 2 errors. At the next step the characters match, which means the previous distance is carried forward, and that is exactly the first formula. In this way you can find the number of errors between two strings, or use it for approximate matching. Thank you.
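The worked example compares two whole words, so the 0th row holds j and the distance is measured from the first character of the text. When the pattern has to be found anywhere inside a long text, as in the problem statement at the start, a commonly used variant (known as Sellers' algorithm) keeps the same rules but fills the 0th row with zeros, so an occurrence may start at any text position; every j with C[m, j] <= k is then reported. Below is a minimal Python sketch of that variant, assuming 1-based end positions as in the lecture and keeping only one column at a time; the name approximate_search is our own.

```python
def approximate_search(pattern: str, text: str, k: int) -> list[int]:
    """End positions j (1-based) where pattern occurs in text with at most k errors.

    Same rules as the lecture's matrix, except that the 0th row is all zeros
    (Sellers' variant), so an occurrence may start anywhere in the text.
    The matrix is filled column by column, keeping only the previous column.
    """
    m = len(pattern)
    prev = list(range(m + 1))          # column 0: C[i][0] = i
    ends = []
    for j, t in enumerate(text, start=1):
        curr = [0] * (m + 1)           # C[0][j] = 0: a match may start at any position
        for i in range(1, m + 1):
            if pattern[i - 1] == t:
                curr[i] = prev[i - 1]  # same character: take the diagonal
            else:
                # 1 + min(vertical C[i-1][j], horizontal C[i][j-1], diagonal C[i-1][j-1])
                curr[i] = 1 + min(curr[i - 1], prev[i], prev[i - 1])
        if curr[m] <= k:               # report a match ending at text position j
            ends.append(j)
        prev = curr
    return ends


print(approximate_search("abc", "xxabcxx", k=0))      # [5]
print(approximate_search("abc", "xxabcxx", k=1))      # [4, 5, 6]
print(approximate_search("survey", "surgery", k=1))   # []
print(approximate_search("survey", "surgery", k=2))   # [5, 6, 7]
```

With k = 1 nothing is reported and with k = 2 survey is found, which agrees with the conclusion of the worked example.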