Let us discuss some decision problems concerning regular languages. What are decision problems? Decision problems are those computational problems whose output is yes or no. You know many decision problems; for example, given a graph, is the graph connected? So in a decision problem there is some input, and the output is either yes or no. In that sense it is a decision problem: we need to decide something. In the problems we look at today, the input is going to be a regular language, and we ask certain questions about it; for example, given a regular language, does it satisfy a certain property? The answer is yes or no. These are the kinds of questions we will discuss.

In fact, the most important decision problem concerning regular languages is this: you are given two things, a regular language L and a string w, and you are supposed to decide whether the string w is in the language. The answer is either yes or no. This is called the language membership problem: we are asking whether the string w is a member of the language L.

Now, our algorithm will of course depend on how we represent the regular language. As you know, a regular language can be given in several ways: as a DFA, as an NFA, as an NFA with epsilon transitions, or as a regular expression. So we will have different algorithms depending on the way the regular language is presented.

Let us take the simplest case first: L is represented by a DFA, say M. In other words, the language L is the language accepted by the DFA M, written L(M). The problem now boils down to this: the input is M and w, and we would like to know whether w is accepted by the DFA M. This algorithm is rather straightforward. Our algorithm can just simulate the actions of the DFA. And what is that?
It looks at the string w symbol by symbol, starting from the leftmost symbol, and it keeps track of the state the DFA is in after scanning some prefix of the input w. Suppose at some point the DFA M is in some state q and the next symbol in the input is a. Then the next state is whatever state p the transition function of the DFA gives for q and a. That symbol of the input is consumed, the next state is p, and in this way we carry out the simulation until the string is over. At that time, if the state is one of the final states of the machine M, then the string is accepted and we give the answer yes, it is in the language. Otherwise the string is not in the language and the answer is no. So the algorithm for the membership problem is quite straightforward when the language is given as a DFA.

What if it is given as an NFA? Here, when I say NFA, let us say the NFA M does not have epsilon transitions. What you could do is first convert that NFA to an equivalent DFA and then use our old algorithm. But there is a problem with this. The description of the NFA is part of the input, so if the NFA has n states, then n is one of the parameters that determine the input size. The equivalent DFA can have 2^n states, and we know that for some NFAs the equivalent DFA must have exponentially many states. In that case just converting the NFA to a DFA takes an exponential amount of time, and the algorithm is not going to be efficient. But there is a way out: you can actually have a polynomial time algorithm. In that case we do not convert the NFA to an equivalent DFA. Instead, the strategy we use is to keep track of a set of states.
These are the states the NFA can be in after reading some portion of the input. In other words, we keep track of the set of states M can be in: along one computational path M is in this state, along another computational path M is in that state, and we collect all of them. We maintain this set of states as we scan the input string w symbol by symbol. Let us see what this means. We have seen many NFAs, so we understand the idea of acceptance and of the set of states the machine can be in.

Initially, of course, the machine is in its initial state q0, so the set of states the machine can be in initially is {q0}. Suppose w is of the form a1 a2 ... an, so w is a string with n symbols. When a1 comes, we know which states the machine can be in after a1: it is precisely the set of states given by the transition function, delta(q0, a1). Then a2 comes. Suppose, just as an example, that this set is {p1, p2}. From p1 on a2, which states can the machine be in? That is just consulting the transition function. Similarly for p2 on a2. We take the union, and that is the set of states the machine can be in after seeing a1 and a2. In this way you go on. When the string is over, we answer yes if the set we are left with contains at least one final state.

The only thing to note is that taking these unions can be done fairly efficiently. This is a simple data structure idea: we can keep a bit vector to indicate which states are in the set. So how much time does the processing of one symbol, say a_i, take? The set of states the machine can be in is bounded in size by the number of states of the NFA M.
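The set-tracking idea just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python sets stand in for the bit vector, the transition function is a dictionary from (state, symbol) pairs to sets of states, and the names nfa_accepts, delta and the example machine are made up for this sketch.

```python
def nfa_accepts(delta, start, finals, w):
    """Decide membership for an NFA without epsilon transitions.

    delta maps (state, symbol) to the set of possible next states;
    a missing key means the machine has no move on that symbol.
    """
    current = {start}                      # states M can be in before reading anything
    for symbol in w:
        nxt = set()
        for q in current:                  # at most s states to examine
            nxt |= delta.get((q, symbol), set())   # union of the successor sets
        current = nxt
    return bool(current & set(finals))     # yes if some possible state is final


# Hypothetical example: strings over {a, b} whose second-to-last symbol is a.
delta = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "a"): {"q2"},
    ("q1", "b"): {"q2"},
}
print(nfa_accepts(delta, "q0", {"q2"}, "bab"))   # True
print(nfa_accepts(delta, "q0", {"q2"}, "abb"))   # False
```

A DFA is the special case in which every set in this simulation has exactly one element, so the same loop also covers the DFA algorithm.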
Suppose the NFA M has s states. The set of states the machine is in has size at most s, and we have to look at each state in that set and find, on the next input symbol, which states the machine can go to. That is s times s, so the time required for processing one symbol of the input is s^2. There are n symbols in the input w, so the algorithm I just outlined takes O(n s^2) time. Remember that M is part of the input, so the number of states is bounded by the input size, and this is polynomial in the size of the input. So this is also a polynomial time algorithm.

What if M is an NFA with epsilon transitions? Here we do exactly the same thing: per symbol, we keep track of which states the machine can be in. But remember that after we get the set of states the machine can be in on scanning a particular symbol, we then need to take the epsilon closure of each such state and take their union. So a better strategy, when we have epsilon transitions, is to compute the epsilon closure of each state right at the beginning. For an NFA with epsilon transitions, step one is: compute the epsilon closure of each state. Step two is exactly as before. At any given time, after processing a number of symbols of the input, we have a set of states the machine can be in. For the next symbol, we take the union of the sets of states the machine can move to from each of these, and then, because it is an NFA with epsilon transitions, we take the epsilon closure of that set. That is the set of states the machine can be in after scanning the next symbol, and in this way we go on.

What is the time involved? For step one we again have s states, and for each we compute its epsilon closure, which, recall, is the set of states reachable from that state using epsilon transitions alone.
So we only look at the part of the transition diagram whose edges are labelled with epsilon. It is really a reachability problem. For one state this requires time at most the size of the transition diagram, which is s^2, and we need to do this for each state, so step one is O(s^3). But step one is done only once.

For step two, recall that at any given time we have a set of states, which can have at most s members. For each member we consult the transition diagram, and then we take the union. That is again s^2 per symbol, the same as before. Then we need to take the epsilon closure of the result, but we have already computed the epsilon closure of each state in step one, so this boils down to taking a union of certain precomputed sets, and that can also be done efficiently. So overall the time is O(s^3) + O(n s^2). Whichever of the two terms is bigger does not matter: this is again a polynomial time algorithm.

There is yet another representation for regular languages, and that is regular expressions. So consider the case when L is represented by a regular expression, say r. The simplest, most straightforward method is this. First, obtain an equivalent NFA M; by equivalent we of course mean that the language denoted by the regular expression r is the language accepted by the NFA M. Remember that with the standard conversion we gave, this NFA will have epsilon transitions. Second, decide whether the input string w is in the language accepted by the NFA M, which we know how to do.
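The membership test for an NFA with epsilon transitions, which the second step relies on, might look like the following sketch. It follows the two steps from the lecture: precompute the epsilon closure of every state, then simulate symbol by symbol. The encoding is an assumption made for illustration: the empty string marks an epsilon edge, and the names EPS, epsilon_closures and enfa_accepts are hypothetical.

```python
EPS = ""   # assumption: the empty string labels an epsilon edge


def epsilon_closures(states, delta):
    """Step one: for every state, the states reachable along epsilon edges only."""
    closure = {}
    for q in states:
        seen, stack = {q}, [q]
        while stack:                       # plain graph search on epsilon edges
            p = stack.pop()
            for r in delta.get((p, EPS), set()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        closure[q] = seen
    return closure


def enfa_accepts(states, delta, start, finals, w):
    """Step two: track the set of possible states, closing under epsilon moves."""
    closure = epsilon_closures(states, delta)
    current = set(closure[start])
    for symbol in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = set()
        for q in moved:                    # union of the precomputed closures
            current |= closure[q]
    return bool(current & set(finals))


# Hypothetical example machine for the language a*b*.
states = {"p", "q"}
delta = {("p", "a"): {"p"}, ("p", EPS): {"q"}, ("q", "b"): {"q"}}
print(enfa_accepts(states, delta, "p", {"q"}, "aabb"))   # True
print(enfa_accepts(states, delta, "p", {"q"}, "aba"))    # False
```

Because the closures are computed once up front, the per-symbol work is only set unions, which matches the O(s^3) + O(n s^2) accounting above.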
So it all depends on the size of M in terms of the regular expression r. If you look at that construction once more, it becomes clear that M is not too large: the number of states of M is at most twice the size of r, that is, twice the number of symbols in the regular expression. So step one can be done in polynomial time, and step two can be done in time polynomial in the size of w and the size of M. Since the size of M is also polynomial in the size of r, this entire process is efficient.

So our conclusion so far is that the membership problem for regular languages can be decided efficiently, because we have a polynomial time algorithm for every representation: DFA, NFA, NFA with epsilon transitions, or regular expression. Later on in this course we will see that this problem, which appears so simple for regular languages, cannot be handled so easily for certain other classes of automata. Not only may we not have an efficient algorithm, we may not have any algorithm at all. This happens, for example, when the automaton is a Turing machine, which we will see later on.

Our next decision problem is about the size of the language. Given a regular language in some representation (a DFA, an NFA, an NFA with epsilon transitions, or a regular expression), we can ask: is the language empty? Is the language infinite? And so on.

So let us consider this problem: given a regular language L, is L empty? As we understand, the language will be given either as one of the automata (DFA, NFA, or NFA with epsilon transitions) or as a regular expression. First consider the case when L is represented by a finite automaton. Whatever that finite automaton is, whether deterministic, nondeterministic, or with epsilon transitions, when does such an automaton accept some string? Think of the automaton with its initial state and a number of final states. Clearly this automaton accepts some string if and only if there is a path from the initial state to at least one of the final states.

That is clear: think of the transition diagram as a directed graph in which the initial state is a particular vertex, say the start vertex. We can use any graph search algorithm, depth first search or breadth first search, to find all the vertices in the graph, which of course correspond to states, that are reachable from this start vertex. Now it is very easy to see that the language accepted by M is empty if and only if no final state is reachable from the initial state.
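As a small illustration of this reachability test, here is a sketch using breadth first search. The encoding is an assumption: the transition diagram is given as a dictionary from each state to the states it has an edge to, with edge labels dropped, since labels (including epsilon) play no role in reachability. The name language_is_empty is hypothetical.

```python
from collections import deque


def language_is_empty(edges, start, finals):
    """Return True if no final state is reachable from the start state.

    edges maps a state to an iterable of successor states; labels are ignored.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        if q in finals:                    # a reachable final state: some string is accepted
            return False
        for r in edges.get(q, ()):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return True


# Hypothetical diagram: q2 and q3 cannot be reached from q0.
edges = {"q0": ["q1"], "q1": ["q0"], "q2": ["q3"]}
print(language_is_empty(edges, "q0", {"q3"}))   # True
print(language_is_empty(edges, "q0", {"q1"}))   # False
```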
So deciding whether the language is empty, if you give me the language as a finite automaton, is really a reachability analysis on a graph, and that we know can be done quite efficiently. Essentially, I need to traverse each edge of the graph at most once, and if there are n states there can be only quadratically many edges. So this problem can be solved efficiently.

What if L is given as a regular expression, say r? The question we are asking here is: does the language denoted by r contain at least one string, that is, is L(r) empty? We can use our old strategy. Given the regular expression, we can efficiently convert it to an equivalent NFA with epsilon transitions, and then use the idea we just mentioned: check whether one of the final states is reachable. In fact, in our construction the NFA with epsilon transitions has exactly one accepting state, so we just have to see whether this final state is reachable from the initial state of the NFA with epsilon transitions that we built for the regular expression.
However, it can be done in another way. One method, as we said, is to convert r to an equivalent NFA with epsilon transitions M and check whether L(M) is empty. There is another method in which we do not convert the regular expression r to an NFA at all. The reason for discussing it is that it gives us another intuition about regular expressions; the algorithm is not any more efficient, because the first one is efficient enough anyway. The second method goes like this. We can inductively define the set of all regular expressions denoting the empty language, and by now we know that this kind of inductive definition will follow the inductive definition of regular expressions.

There are three base cases: phi, epsilon, and a single symbol a. If your regular expression r is phi, epsilon, or a, then you know that phi denotes the empty language and the other two do not. This is by definition.

Now suppose our regular expression r is r1 + r2. Then L(r) is empty if and only if both L(r1) and L(r2) are empty. This is fairly easy to see. What is the language denoted by r when r is r1 + r2? Clearly L(r) is the union of the two languages L(r1) and L(r2), and if either of them is non-empty then L(r) is not going to be empty. So L(r) is empty if and only if both of them are empty.

There are two other ways in which a bigger regular expression can be formed out of smaller ones: one is concatenation, r = r1 r2, and the last one is star, r = r1*.
Now, if r is r1 r2, so we are talking of concatenation, then again it is fairly easy to see that L(r) is empty if and only if at least one of L(r1) or L(r2) is empty. If you concatenate the empty language with any language whatsoever, the result is the empty language, so if either of them is empty then the regular expression denotes the empty language. On the other hand, if both L(r1) and L(r2) are non-empty, then when we concatenate we clearly get some strings, so L(r) is not going to be empty. And finally, if r is r1*, we know that L(r) is not empty, because L(r) contains at least epsilon.

Taking these three cases together with the base cases, what we have in effect is a recursive algorithm to decide whether a regular expression denotes the empty language. The input is a regular expression. You see whether it is one of the base cases or one of these three forms. If it is a base case, then for phi the answer is yes, it is empty, and for epsilon and a the answer is no. If it is of the form r1 + r2, recursively decide the emptiness of r1 and the emptiness of r2 and combine the answers; similarly for concatenation. And in the case of star, the moment you see the regular expression is of that form you can immediately say it denotes a non-empty language, because it contains at least epsilon.
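The recursive algorithm can be written down almost word for word. The sketch below assumes the regular expression has already been parsed into nested tuples; the tags "phi", "eps", "sym", "union", "concat" and "star" and the function name denotes_empty are choices made for this illustration, not notation from the lecture.

```python
def denotes_empty(r):
    """Return True if the regular expression r denotes the empty language."""
    kind = r[0]
    if kind == "phi":                      # base case: the empty language itself
        return True
    if kind in ("eps", "sym"):             # base cases: {epsilon} and {a} are non-empty
        return False
    if kind == "union":                    # empty iff both parts are empty
        return denotes_empty(r[1]) and denotes_empty(r[2])
    if kind == "concat":                   # empty iff at least one part is empty
        return denotes_empty(r[1]) or denotes_empty(r[2])
    if kind == "star":                     # always contains epsilon
        return False
    raise ValueError(f"unknown regular expression node: {kind!r}")


phi = ("phi",)
a = ("sym", "a")
print(denotes_empty(("concat", a, phi)))           # True
print(denotes_empty(("union", a, phi)))            # False
print(denotes_empty(("star", phi)))                # False
```

Each node of the expression is visited once, so the running time is linear in the size of the expression.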
Now that we have seen efficient algorithms for deciding whether a regular language is empty, we can use them to decide many other questions, especially when the representation is in terms of DFAs. Let me give an example: given two DFAs M1 and M2, is the language accepted by M1 the same as the language accepted by M2?

One way to do this is very simple. For notational convenience, let L1 be the language accepted by M1 and L2 the language accepted by M2. Look at L1 intersected with the complement of L2, that is, all those strings of L1 which are not in L2. Similarly, look at the complement of L1 intersected with L2, that is, all those strings of L2 which are not in L1. If both of these are empty, that means L1 is equal to L2. The point is that because L1 and L2 are given to me by DFAs, I can very easily construct, as we have seen, a DFA for the first language, a DFA for the second, and then a DFA for their union. Then we can check whether that DFA accepts the empty language, and this is the case if and only if L1 is equal to L2.

So, given two DFAs M1 and M2, I can construct this language, and L(M1) is equal to L(M2) if and only if it is empty. Basically we are saying that there is no string accepted by M1 and not by M2, and no string accepted by M2 and not by M1. In that case the two languages are equal, and the two machines accept precisely the same set of strings.
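One possible way to carry this out is sketched below. Rather than building the complement, intersection and union DFAs explicitly, this sketch explores the product of M1 and M2 on the fly and treats a pair of states as final when exactly one of the two states is final; that product machine accepts exactly the strings in one language but not the other, so the emptiness test is the reachability search we already know. The encoding (a DFA as a triple of transition dictionary, start state and final states, with a total transition function) and the name dfas_equivalent are assumptions for illustration.

```python
from collections import deque


def dfas_equivalent(alphabet, dfa1, dfa2):
    """Return True if the two DFAs accept the same language.

    Each DFA is (delta, start, finals), where delta maps (state, symbol)
    to a single next state and is assumed to be total.
    """
    (d1, s1, f1), (d2, s2, f2) = dfa1, dfa2
    seen = {(s1, s2)}
    queue = deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in f1) != (q in f2):         # some string is accepted by exactly one machine
            return False
        for a in alphabet:
            nxt = (d1[(p, a)], d2[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Hypothetical examples over the one-letter alphabet {a}.
even2 = ({("e", "a"): "o", ("o", "a"): "e"}, "e", {"e"})
cycle4 = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 0}
even4 = (cycle4, 0, {0, 2})                # also accepts an even number of a's
mult4 = (cycle4, 0, {0})                   # accepts only multiples of four
print(dfas_equivalent("a", even2, even4))  # True
print(dfas_equivalent("a", even2, mult4))  # False
```

The search visits at most (number of states of M1) times (number of states of M2) pairs, so it runs in polynomial time.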
In a very similar manner you can decide these questions also: is L(M1) a subset of L(M2)? Is L(M1) a strict subset of L(M2)? Once we are given two DFAs, all these questions are easy to decide, because in the same kind of way we can build a DFA whose emptiness decides the question. For example, L(M1) is a subset of L(M2) if and only if L(M1) intersected with the complement of L(M2) is empty.

Let me take just one more example to illustrate this point. Here I am given one DFA M, and the question is: is L(M) equal to Sigma*, that is, does the language contain every string? What we can do is let M' be a DFA such that the language accepted by M' is Sigma*, and then ask the old question: is L(M) equal to L(M')? It is very easy to define a DFA that accepts all strings, and then we are just checking whether these two DFAs accept precisely the same set of strings, for which we use the previous algorithm. So you can decide this as well.

There is another class of problems people are interested in: given a regular language, is it infinite, that is, does it have infinitely many strings? If we can decide that, then of course we can also decide whether a given language L is finite, that is, whether it has only finitely many strings. This can be done in at least two ways. One way comes from the idea of the pumping lemma, although algorithmically it is not going to be very efficient. Let me write it out: given a DFA M with n states, the language L(M) accepted by the DFA is infinite if and only if there is a string x accepted by M such that the length of x is greater than or equal to n and less than or equal to 2n.
So remember, M is a DFA and let us say it has n states. One direction is easy. Take any string accepted by M whose length is greater than or equal to n. The proof of the pumping lemma shows that such a string can be pumped any number of times, and each time I pump I get a new string, different from the previous ones. So just by pumping I can generate infinitely many strings from that one string.

Remember, the pumping lemma speaks about every string accepted by the machine M whose length is greater than or equal to the number of states. What it leaves open is this: I am looking for an accepted string whose length is at least n, but is there a bound within which I am guaranteed to find one, if there is one at all? In fact there is: you need to go up to 2n at most. Why? Suppose we had a long string, longer than 2n, which is accepted by the machine M. If you recall the pumping lemma, such a string has a particular v part, and I can pump down, that is, remove v. On pumping down, either I come into this range, or I am still above 2n, and then I pump down again. Why does this work? Because what we remove each time is of size at most n; remember that the length of v is at least 1 and at most n. So from a long string I keep taking off chunks of size at most n, and I cannot jump over the range from n to 2n: at some point I get a string in this range.

So this lemma can be used for an algorithm. Given a DFA, all I need to do is look at its number of states n, then look at all strings whose length is in this range, and see whether any of them is accepted by the DFA.
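A direct, deliberately naive sketch of this test follows. It enumerates every string whose length lies between n and 2n and runs the DFA on each one. The encoding (transition dictionary from (state, symbol) to a state, assumed total) and the names dfa_accepts and is_infinite_naive are assumptions for illustration.

```python
from itertools import product


def dfa_accepts(delta, start, finals, w):
    """Simulate the DFA on w, one symbol at a time."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals


def is_infinite_naive(alphabet, states, delta, start, finals):
    """L(M) is infinite iff M accepts some string x with n <= |x| <= 2n."""
    n = len(states)
    for length in range(n, 2 * n + 1):
        for w in product(alphabet, repeat=length):   # every string of this length
            if dfa_accepts(delta, start, finals, w):
                return True
    return False


# Hypothetical examples over {a, b}.
ends_in_a = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
only_a = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2, (1, "b"): 2,
          (2, "a"): 2, (2, "b"): 2}       # accepts just the string "a"
print(is_infinite_naive("ab", {0, 1}, ends_in_a, 0, {1}))    # True
print(is_infinite_naive("ab", {0, 1, 2}, only_a, 0, {1}))    # False
```

The inner loop runs over all strings of a given length, so the cost grows exponentially with the number of states.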
But clearly this is an exponential time algorithm. How many strings are there of length n? Exponentially many, and this naive algorithm checks each string in this range to see whether it is accepted by M.

If you think about it, there is clearly a polynomial time algorithm, and its basic idea is this. Imagine the DFA. For it to accept any string at all, there has to be a path in the transition diagram from q0 to one of the final states, say qf. Now if some state on such a path lies on a cycle, then clearly you can generate infinitely many accepted strings by going around the cycle. So if I have a path from the initial state to one of the final states, and on that path there is a state which lies on a cycle, then the resulting language is infinite.

That idea is of course true even when the machine is an NFA. What about the case when the machine is an NFA with epsilon transitions? All I have to make sure of is that the cycle does not consist only of epsilon edges. In other words, if in an automaton there is a path from the initial state to one of the final states, and on that path there is a vertex lying on a cycle that does not consist only of epsilon edges, then that finite automaton accepts infinitely many strings. Remember that the graph here is the directed transition diagram. And this is a necessary and sufficient condition. Using this idea we can have a polynomial time algorithm for deciding whether a finite automaton, whether it is a DFA, an NFA, or an NFA with epsilon transitions, accepts infinitely many strings.
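One way to turn this condition into a polynomial time procedure is sketched below; it is one possible realization, not necessarily the one intended in the lecture. The idea: a cycle that is not made up only of epsilon edges must contain some labelled edge from p to r, so for every labelled edge we check that p is reachable from the initial state, that p can reach a final state, and that r can get back to p. The encoding (a list of (source, label, target) triples, with the empty string marking an epsilon edge) and all names are assumptions for illustration.

```python
EPS = ""   # assumption: the empty string labels an epsilon edge


def reachable(succ, sources):
    """All states reachable from any of the sources, following succ."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        q = stack.pop()
        for r in succ.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts_infinitely_many(edges, start, finals):
    """edges is a list of (source, label, target) triples."""
    succ, pred = {}, {}
    for p, _, r in edges:
        succ.setdefault(p, set()).add(r)
        pred.setdefault(r, set()).add(p)
    from_start = reachable(succ, [start])  # states reachable from the initial state
    to_final = reachable(pred, finals)     # states from which a final state is reachable
    for p, label, r in edges:
        if label == EPS:
            continue                       # the cycle needs at least one labelled edge
        if p in from_start and p in to_final and p in reachable(succ, [r]):
            return True                    # labelled edge p -> r lies on a useful cycle
    return False


loop = [("q0", "a", "q1"), ("q1", "b", "q1")]
eps_loop = [("q0", "a", "q1"), ("q1", EPS, "q1")]
dead_loop = [("q0", "a", "q1"), ("q0", "b", "q2"), ("q2", "a", "q2")]
print(accepts_infinitely_many(loop, "q0", {"q1"}))       # True
print(accepts_infinitely_many(eps_loop, "q0", {"q1"}))   # False
print(accepts_infinitely_many(dead_loop, "q0", {"q1"}))  # False
```

Each labelled edge triggers one graph search, so the total work is bounded by the number of edges times the size of the graph, which is polynomial.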
So what we have done today is look at some decision problems concerning regular languages. We have found not only that all these decision problems can be solved by an algorithm, but also that these algorithms turn out to be quite efficient: they run in polynomial time. As we look at other classes of languages later on, we will see that there are many decision problems for which we will not have any algorithm at all, not to speak of efficient algorithms. So in that sense the class of regular languages is a very nice class.