Hello and welcome to this next lecture on data structures and algorithms. Starting today, we will discuss an interesting problem called string matching. The problem of string matching is about finding a textual pattern, call it P, in a set of strings s1, s2, and so on. You come across this very often in text editors such as Vim or gedit, or in the Unix command grep, where the pattern P could be any regular expression. So, we will discuss both extremely naive algorithms as well as efficient algorithms for such pattern matching.

Formally, the problem is to find all the valid shifts with which a given pattern P occurs in a given text T. More generally, the text T could be a set of strings, as we pointed out in the previous slide. Here is an example: you are provided the string s, and below it is the pattern P. You want to find instances of STY in the string s. You start from the leftmost end, shift s equal to 0. You find that there is a mismatch at the third position, though there is a match in the first two positions. You can then think of moving the pattern to the right, scanning from the second character position of the string. You find a mismatch right away; further down you again find a mismatch at E and then at P. However, at position 5, which corresponds to index 4 of the string, you find a match at all three positions S, T and Y. This results in one pattern match. You can continue this process if you are interested in finding all the pattern matches, or you might stop here if you are interested only in finding a single pattern match. So, assuming that you are interested in all pattern matches, you look at index 5 and find a mismatch right away; at index 6 there is again a mismatch. Now, do you need to proceed further? It does not make sense, because you certainly cannot let the shift exceed n minus m, that is, the number of positions in the string minus the number of positions in the pattern.
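The shift-by-shift scan just described can be traced in a short Python sketch. The actual string from the slide is not shown in the transcript, so the text `"STEPSTYLE"` below is a hypothetical choice that reproduces the same behaviour: matches at the first two positions of shift 0, mismatches at E and P, and a full match of "STY" at index 4.

```python
# Hypothetical text chosen to mirror the slide's example (assumption);
# the pattern is STY as in the lecture.
T = "STEPSTYLE"
P = "STY"
n, m = len(T), len(P)

for s in range(n - m + 1):            # the shift s can never exceed n - m
    j = 0
    while j < m and T[s + j] == P[j]:  # compare pattern against text at this shift
        j += 1
    if j == m:
        print("valid at shift", s)     # full match of P at shift s
    else:
        print("mismatch at shift", s, "after", j, "matching characters")
```

Running this prints a mismatch for every shift except shift 4, where "STY" occurs, just as in the walkthrough above.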
So, this is the naive string matching algorithm that we just illustrated. Given a text T of size n and a pattern P of size m, you keep track of the shift index S at every point of time. The shift index S is an index into the text T; the pattern P will be scanned against T starting from this position S. S ranges from 0 to n minus m, because we are interested in matching the entire pattern, so S is restricted to the number of positions in the text T minus the number of positions in the pattern P. For this range of S, you keep track of a position J, which is an index into P. You vary J over the positions in P, as long as J is less than m; J starts at 0 and has a maximum value of m minus 1. At every point of time you match T at position S plus J with P at position J, and in case of a mismatch you terminate the inner scan. So, you continue scanning P as long as there is no mismatch with T. Once you are at the end of P, you flag a match: print "valid at shift S". That is what it means to have a match, and because we are interested in all matches, we continue the scan. If we were not interested in all the matches, we could break right after printing the valid statement.

Now, the analysis of this naive algorithm. The inner loop is going to be executed at most m times, that is, order m times, and this worst case assumes a match at every position. The worst case situation is as follows: T is "a a a a a a" and P is "a a", so m is 2 and n is 6. You are going to scan all m positions of P for every shift into T, because a match occurs at all n minus m plus 1 shifts. This results in c1 times m work in the worst case for each shift S, and overall you have to do this for every valid shift, of which there are n minus m plus 1. So, overall there are n minus m plus 1 outer iterations of cost order m each, and this results in an overall complexity of (n minus m plus 1) times m.
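The loop structure described above can be written as a minimal Python sketch. The function name `naive_match` is our choice, not from the lecture; the logic follows the shift loop over S and the inner scan over J exactly as described.

```python
def naive_match(T, P):
    """Return all valid shifts s at which pattern P occurs in text T."""
    n, m = len(T), len(P)
    shifts = []
    for s in range(n - m + 1):        # s ranges over 0 .. n - m
        j = 0
        # continue scanning P as long as there is no mismatch with T
        while j < m and T[s + j] == P[j]:
            j += 1
        if j == m:                    # reached the end of P: a valid shift
            shifts.append(s)
    return shifts

# Worst case from the lecture: every shift matches fully, so the inner
# loop runs m times for each of the n - m + 1 shifts.
print(naive_match("aaaaaa", "aa"))    # → [0, 1, 2, 3, 4]
```

On "aaaaaa" with pattern "aa", all n minus m plus 1, that is 5, shifts are valid, which is exactly the input that forces the (n minus m plus 1) times m worst-case cost.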
What we will see in some of the subsequent discussions are more intelligent algorithms, which avoid this brute-force computation by explicitly caching or memoizing some of the computations that were already done and reusing them. Some algorithmic techniques, such as dynamic programming, are in some sense brute force because they do consider all possible configurations, but they leverage previous computations in order to avoid redundant or repetitive work. We will see some of that flavor in the algorithms that follow. Thank you.