In the last lecture we started proving an important fact: regular expressions denote precisely the class of regular languages. By that we mean that if you take all the languages denoted by regular expressions, that class is precisely the class of regular languages. There are two parts in proving this. The first part we completed last time: given any regular expression $R$, we can construct an NFA which accepts the language denoted by $R$, so $L(R)$ is a regular language. We proved this by providing an NFA with epsilon transitions for the language $L(R)$.
Now we need to prove the other way: given any regular language $L$, there is a regular expression $R$ such that the language denoted by $R$ is exactly $L$. One point here: how does one give a regular language? In general a regular language is infinite, so one cannot list out all the strings in it, and that is where these machines come in handy. A language is regular because there is a DFA, or an NFA, or an NFA with epsilon transitions which accepts it; in fact machines of all three kinds exist for it. So what we are going to prove is the following: given a DFA $M$, we can construct a regular expression $R$ such that the language accepted by $M$ is the language denoted by $R$. We start with a given DFA and show how to construct a regular expression which is equivalent to it in this sense. So let us give the details of the construction that we have in mind.
So suppose the DFA is $M = (Q, \Sigma, \delta, q_1, F)$, where the set of states is $Q = \{q_1, \ldots, q_n\}$; there are $n$ states and they are numbered 1 to $n$. We are not using $q_0$ for the initial state, for convenience of notation, as you shall see. What we are going to do is construct a certain number of regular expressions, and then put them together to get the regular expression $R$.
So let me define $R_{ij}^{k}$. This is a regular expression such that the language it denotes is the set of all strings $x$ satisfying two things. One, $\hat{\delta}(q_i, x) = q_j$; in other words, such a string $x$ takes the machine from state $q_i$ to state $q_j$. Two, on $x$ the machine $M$ does not go through any state numbered greater than $k$ on going from $q_i$ to $q_j$.
This sentence is a big mouthful, but the idea is very simple. Notice that there are three indices, $i$, $j$ and $k$, and notice how they occur. Firstly, $R_{ij}^{k}$ is a regular expression whose strings each take the machine from state $q_i$ to state $q_j$. But there is a condition, and a picture will help: the machine starts in state $q_i$, reads the string $x$, and ends in state $q_j$. In between there are intermediate states, say $q_{l_1}$, $q_{l_2}$, and so on. What we are saying is that for no intermediate state $q_{l_i}$ is the number $l_i$ greater than $k$.
So on $x$ the machine goes from $q_i$ to $q_j$. What we mean by going through a state is that somewhere along the string $x$ the state is reached and then left. Every state has a number from 1 to $n$, and all the states the machine goes through on $x$ from $q_i$ to $q_j$ must have numbers less than or equal to $k$. But notice that $i$ or $j$ or both can be greater than $k$, because we do not count the initial or the final state of the path as being gone through: going through a state means you reach it and then you leave it.
Suppose I manage to build these regular expressions. As $k$ varies, you can see that either the machine takes a direct edge from $q_i$ to $q_j$, or it goes from $q_i$ to $q_j$ through $q_1$, or through $q_1$ and $q_2$, or through $q_1$, $q_2$ and $q_3$, and so on. What should be clear is that the language accepted by the machine $M$ is denoted by $R_{1 j_1}^{n} + R_{1 j_2}^{n} + \cdots + R_{1 j_m}^{n}$, where $q_{j_1}, q_{j_2}, \ldots, q_{j_m}$ are the only final states.
Let us understand at least this part before we see how these expressions can be defined. Firstly, remember that $q_{j_1}$ is a final state. So all the strings which take the machine from the initial state, which is $q_1$ (this is where the index 1 comes from), to the state $q_{j_1}$ must be in the language accepted by $M$, because they all take the machine from the initial state to one of the final states. Then I have another set of strings which take the machine from $q_1$ to another final state, $q_{j_2}$, and so on. Now what about the passing through? What I have here as superscript is $n$, and the only states are $q_1$ through $q_n$. So when I have $n$ as superscript, the machine is free to visit any state whatsoever on its path from 1 to $j_1$, from 1 to $j_2$, and so on.
So let me write clearly what the quantity $R_{1 j_1}^{n}$ denotes. The language denoted by this regular expression is the set of all strings $x$ in $\Sigma^{*}$, where $\Sigma$ is the alphabet, such that $\hat{\delta}(q_1, x) = q_{j_1}$. There is a qualification in terms of the superscript: in general it says that the string does not take the machine through any state whose number is higher than $n$. But since there are no states whose number is more than $n$, this is really no restriction at all.
Now, as we have said, $q_{j_1}$ is an element of the set of final states; recall that $q_{j_1}, q_{j_2}, \ldots, q_{j_m}$ are the only final states of the machine $M$. A string is accepted if and only if it takes the machine from the initial state to one of the final states. So clearly the language accepted by $M$ is nothing but the union of the languages denoted by the regular expressions $R_{1j}^{n}$, where the union is over all $j$ such that $q_j$ is a final state.
So this is not too difficult to see. If I can indeed define $R_{ij}^{k}$, and ultimately $R_{ij}^{n}$, then this union I can of course express as a regular expression: I take the individual regular expressions and put pluses in between, and that denotes the union of the languages denoted by each one of them individually. So then I would be able to write out a regular expression for the language accepted by $M$.
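To make the definition concrete, here is a minimal Python sketch that tests membership in the language of $R_{ij}^{k}$ directly from the definition, by simulating the machine. The DFA used (even number of 0s, states numbered 1 and 2, start state 1, final state 1) and the names `DELTA`, `path`, `in_R` and `accepts` are assumptions made for illustration; the check covers only strings up to a bounded length, so it illustrates the union claim rather than proving it.

```python
from itertools import product

# Assumed example DFA: binary strings with an even number of 0s.
# States are numbered 1..N_STATES; state 1 is both the start and the only final state.
DELTA = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 1, (2, "1"): 2}
START, FINALS, N_STATES = 1, {1}, 2

def path(i, x):
    """Sequence of states visited when reading x from state i (both endpoints included)."""
    states = [i]
    for a in x:
        states.append(DELTA[(states[-1], a)])
    return states

def in_R(i, j, k, x):
    """True if x takes the machine from q_i to q_j and every state that is
    reached and then left (the strictly intermediate ones) has number <= k."""
    p = path(i, x)
    return p[-1] == j and all(s <= k for s in p[1:-1])

def accepts(x):
    return path(START, x)[-1] in FINALS

# With superscript n there is no restriction, so the accepted strings are exactly
# the union over final states q_j of the strings in R_{1j}^n (checked up to length 5).
for n in range(6):
    for t in product("01", repeat=n):
        x = "".join(t)
        assert accepts(x) == any(in_R(START, j, N_STATES, x) for j in FINALS)
```

Note that the endpoints are excluded from the check `p[1:-1]`, which is why $i$ or $j$ may exceed $k$.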
So now, how can we build $R_{1j}^{n}$? Of course, we are mainly interested in the regular expressions from state 1 to a final state with superscript $n$. This is where a very nice idea comes in, which will remind you of the dynamic programming algorithms that you have seen in algorithms courses. Basically, we first define $R_{ij}^{0}$ for all $i, j$; then, using these definitions, we build $R_{ij}^{1}$ for all $i, j$, and so on. Eventually we will be able to build $R_{ij}^{n}$, again for all $i, j$, using the previous ones.
So first of all, what should $R_{ij}^{0}$ be? According to our definition, it denotes the set of all strings which take the machine from $q_i$ to $q_j$ without passing through any state whose number is greater than 0. But every state has a number greater than 0, which means such a string $x$ cannot pass through any state at all; it can only go directly from state $q_i$ to state $q_j$. That means these are really single symbols which take the machine directly from $q_i$ to $q_j$, and such a direct transition of course does not pass through any state. This is one case, and there is another case that we will talk of right away.
In set notation, it is the set of all $a$ in $\Sigma$ such that $\delta(q_i, a) = q_j$. So this will be a bunch of symbols: all the symbols which take the machine from $q_i$ to $q_j$, and notice that such a transition does not pass through any intermediate state.
So each such symbol $a$, now thought of as a string, should be in the language denoted by $R_{ij}^{0}$ by definition. Now there is an extra thing that can happen when $i$ and $j$ are the same state. When $i = j$, the string $\epsilon$ takes the machine from that state to the same state. So I should say union $\epsilon$, and this union takes effect only if $i$ equals $j$. More correctly, then: take the set of all $a$ in $\Sigma$ such that $\delta(q_i, a) = q_j$, and there are two cases. If $i \neq j$, it is only this set, and $\epsilon$ does not come in. Otherwise we take this set union $\{\epsilon\}$, because $\epsilon$ also takes the machine from $q_i$ to $q_i$.
Let me take a simple example to illustrate this. Take a very simple machine, in fact the first DFA that we had seen; recall that this DFA accepts the set of all finite binary strings each of which has an even number of 0s. Let me number its states as $q_1$ and $q_2$. What should $R_{11}^{0}$ be? Since $i$ and $j$ are the same, firstly, is there a direct transition? Indeed there is, and that is denoted by the regular expression 1; and $\epsilon$ of course takes the machine from $q_1$ to $q_1$ itself. So we will agree that $R_{11}^{0}$ is the regular expression $1 + \epsilon$, which stands for the set with two elements, the string 1 and the string $\epsilon$. This is the set of all strings which take the machine from $q_1$ to $q_1$ without visiting any other state; in fact here it cannot pass through $q_1$ or $q_2$, because their numbers are greater than 0.
Let us take another example. What is $R_{12}^{0}$? It is basically all the direct transitions from $q_1$ to $q_2$, and there is exactly one. So this is given by the regular expression 0.
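The base case can be sketched in a few lines of Python. The transition table below is an assumed encoding of the even-number-of-0s DFA (state $q_1$ loops on 1 and moves to $q_2$ on 0), and the function name `base_expression` is hypothetical. The output uses `+` for union, `eps` for $\epsilon$, and `phi` for the empty language; the `phi` case, which arises when $i \neq j$ and no symbol leads directly from $q_i$ to $q_j$, does not occur in this example.

```python
# Assumed transition table for the example DFA (even number of 0s).
DELTA = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 1, (2, "1"): 2}
SIGMA = "01"

def base_expression(i, j):
    """Return R_ij^0 as text: the symbols a with delta(q_i, a) = q_j,
    plus eps when i == j, or phi when there is nothing at all."""
    terms = [a for a in SIGMA if DELTA[(i, a)] == j]
    if i == j:
        terms.append("eps")
    return " + ".join(terms) if terms else "phi"

for i in (1, 2):
    for j in (1, 2):
        print(f"R_{i}{j}^0 = {base_expression(i, j)}")
```

For this machine the four base expressions come out as `1 + eps`, `0`, `0` and `1 + eps`.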
So $R_{12}^{0}$ is 0, which denotes the set of all strings which take the machine from $q_1$ to $q_2$ without going through any state whatsoever. That means we can only take a direct transition, and there is only one symbol which takes the machine from $q_1$ to $q_2$. In this way I can complete the rest. How many such regular expressions with superscript 0 will there be? Four, of course: $R_{11}^{0}$, $R_{12}^{0}$, $R_{21}^{0}$ and $R_{22}^{0}$. So at the base level these are the regular expressions that I build.
Now I use an inductive argument. Inductively assume that we have defined $R_{ij}^{k}$ for all $i, j$ and for all $k \le m$. As I was showing you a little while ago, you can start the induction with the base case where $m$ is 0. With this we can define $R_{ij}^{m+1}$ for all $i, j$. So you have the definition of $R_{ij}^{m}$ and now you would like to define $R_{ij}^{m+1}$.
So what is $R_{ij}^{m+1}$? Suppose I have a string in the language denoted by this regular expression. By that I mean that the string takes the machine from $q_i$ to $q_j$, and of course it can visit some states in between, but none of these states can have an index higher than $m+1$. So assume $x$ is a string which takes the machine from $q_i$ to $q_j$ without passing through any state whose number is higher than $m+1$. Now, such a string can of course take the machine from $q_i$ to $q_j$ without passing through $q_{m+1}$ at all.
So what are the strings which take the machine from $q_i$ to $q_j$ passing through no state whose number is $m+1$ or higher? By definition, that is $R_{ij}^{m}$. Now consider the more interesting case: here is a string $x$ which takes the machine from $q_i$ to $q_j$, and along the way the state $q_{m+1}$ occurs one or more times. If it occurs zero times then of course the string is in $R_{ij}^{m}$, and we are only talking of strings which take the machine from $q_i$ to $q_j$ without passing through any state higher than $m+1$.
In the interesting case the picture is like this. The machine starts at $q_i$; here is the first time it comes to $q_{m+1}$; then after some time it comes to $q_{m+1}$ again, and so on; then finally it comes to $q_{m+1}$ for the last time, and after that there is a part of the string which does not pass through $q_{m+1}$ at all and ends in $q_j$. In other words, in the sequence of states the machine goes through on $x$ in going from $q_i$ to $q_j$, we are marking out precisely the occurrences of $q_{m+1}$.
So what does such a string $x$ look like? The entire string is in the language denoted by $R_{ij}^{m+1}$. Take the first part, up to the first occurrence of $q_{m+1}$, and call it $y_1$. This string has to be in the language denoted by $R_{i,m+1}^{m}$, because $q_{m+1}$ does not occur in between and all the other states have numbers $m$ or less. Next take a piece between two consecutive occurrences of $q_{m+1}$ and call it $y_2$. What is $y_2$? It takes the machine from state $q_{m+1}$ back to state $q_{m+1}$ without passing through any state whose number is higher than $m$. Therefore $y_2$ has to be in the language denoted by $R_{m+1,m+1}^{m}$.
So it is $R_{m+1,m+1}^{m}$, and the superscript is $m$ because the piece does not pass through any state higher than $m$. There can be many such pieces, because the machine may go from $q_{m+1}$ to $q_{m+1}$ to $q_{m+1}$ and so on. Finally there is the last bit of the string; call it $y_l$. What does $y_l$ do? It takes the machine from $q_{m+1}$ to $q_j$, again without passing through any state whose number is higher than $m$. So $y_l$ is a member of the language denoted by the regular expression $R_{m+1,j}^{m}$.
Now let us see what $R_{ij}^{m+1}$ should be. Its language contains the strings which take the machine from $q_i$ to $q_j$ without passing through any state higher than $m$; that set of strings is denoted by $R_{ij}^{m}$. Then there is the second case, the strings which take the machine from $q_i$ to $q_j$ and in doing so do pass through $q_{m+1}$. We have seen how such strings can be written. First the machine goes from $q_i$ to $q_{m+1}$ without passing through any state higher than $m$: that is $R_{i,m+1}^{m}$. Then come the pieces which take the machine from $q_{m+1}$ back to $q_{m+1}$ without passing through any state whose number is higher than $m$: that is $R_{m+1,m+1}^{m}$. How many such pieces can there be? Zero or any number, so I put a star here. And finally, after the last occurrence of $q_{m+1}$, there has to be a string which takes the machine from $q_{m+1}$ to $q_j$, again without passing through any state higher than $m$: that is $R_{m+1,j}^{m}$. So we have two regular expressions, $R_{ij}^{m}$ and $R_{i,m+1}^{m} (R_{m+1,m+1}^{m})^{*} R_{m+1,j}^{m}$, and the union of the languages they denote is precisely the definition of $R_{ij}^{m+1}$.
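Written out as formulas, the base case and the inductive step described above would read as follows. The symbol $\phi$ for the empty language in the case where no symbol leads directly from $q_i$ to $q_j$ is an assumption about notation; the lecture's example does not need that case.

```latex
\begin{align*}
L\bigl(R_{ij}^{0}\bigr) &=
  \begin{cases}
    \{\, a \in \Sigma : \delta(q_i, a) = q_j \,\} & \text{if } i \neq j,\\
    \{\, a \in \Sigma : \delta(q_i, a) = q_j \,\} \cup \{\epsilon\} & \text{if } i = j,
  \end{cases}\\[4pt]
R_{ij}^{m+1} &= R_{ij}^{m} \;+\; R_{i,m+1}^{m}\,\bigl(R_{m+1,m+1}^{m}\bigr)^{*}\,R_{m+1,j}^{m},
  \qquad 0 \le m < n .
\end{align*}
```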
Now notice what we have done: using the definitions of the various $R_{ij}^{m}$, with superscript $m$, I have defined $R_{ij}^{m+1}$. Before taking the example, let me complete the proof, or at least say what the proof is. Inductively I can define all the $R_{ij}^{k}$, starting from superscript 0, then 1, then 2, then 3, and so on. When I have $R_{ij}^{n}$ for all $i, j$, I take those regular expressions whose strings take the machine from the initial state to one of the final states, and I put pluses in between them. That gives me the regular expression for the language accepted by the entire machine $M$.
So, just once more: using the above recurrence we define all $R_{ij}^{m}$ for $m$ equal to 1, then 2, up to $n$, starting with the base case $R_{ij}^{0}$. I have told you that the $R_{ij}^{0}$ are easy to define. Then we use the recurrence to define $R_{ij}^{1}$, then $R_{ij}^{2}$, up to $R_{ij}^{n}$. Once you have all the $R_{ij}^{n}$, the language accepted by the machine $M$, as we explained a little while back, is denoted by the regular expression $R_{1 j_1}^{n} + R_{1 j_2}^{n} + \cdots + R_{1 j_m}^{n}$, where $q_{j_1}, q_{j_2}, \ldots, q_{j_m}$ constitute the set of final states.
So what I have shown is that using this process, and what we have outlined is in fact an algorithm, you can indeed construct, given any DFA, a regular expression which denotes the language accepted by the machine.
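The whole procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: the DFA is the even-number-of-0s example with states numbered 1 and 2, the helper names (`union`, `concat`, `star`, `base`, `dfa_to_regex`, `accepts`) are hypothetical, and the output uses Python `re` syntax, where `|` plays the role of `+` and the empty group `(?:)` plays the role of $\epsilon$. The value `None` stands for the empty language. The final loop compares the resulting expression against the DFA on all strings up to length 6, which is a sanity check rather than a proof.

```python
import re
from itertools import product

# Assumed example DFA (even number of 0s); states are 1..N, start state is 1.
DELTA = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 1, (2, "1"): 2}
N, FINALS, SIGMA = 2, {1}, "01"

EMPTY = None   # stands for phi, the empty language
EPS = "(?:)"   # Python pattern matching only the empty string

def union(r, s):
    if r is EMPTY:
        return s
    if s is EMPTY:
        return r
    return f"(?:{r}|{s})"

def concat(r, s):
    if r is EMPTY or s is EMPTY:
        return EMPTY          # phi concatenated with anything is phi
    return f"(?:{r}{s})"

def star(r):
    if r is EMPTY or r == EPS:
        return EPS            # phi* and eps* both denote {eps}
    return f"(?:{r})*"

def base(i, j):
    """R_ij^0: direct transitions from q_i to q_j, plus eps when i == j."""
    r = EMPTY
    for a in SIGMA:
        if DELTA[(i, a)] == j:
            r = union(r, re.escape(a))
    return union(r, EPS) if i == j else r

def dfa_to_regex():
    states = range(1, N + 1)
    R = {(i, j): base(i, j) for i in states for j in states}
    for k in states:
        # R holds the expressions with superscript k-1; build superscript k.
        R = {(i, j): union(R[i, j],
                           concat(R[i, k], concat(star(R[k, k]), R[k, j])))
             for i in states for j in states}
    result = EMPTY
    for f in sorted(FINALS):
        result = union(result, R[1, f])
    return result             # would be None if no final state were reachable

def accepts(x):
    q = 1
    for a in x:
        q = DELTA[(q, a)]
    return q in FINALS

regex = dfa_to_regex()
for n in range(7):
    for t in product(SIGMA, repeat=n):
        x = "".join(t)
        assert (re.fullmatch(regex, x) is not None) == accepts(x)
print(regex)
```

Only the expressions with superscript $k-1$ are needed to build those with superscript $k$, so the sketch keeps a single table and overwrites it at each step.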
Now, two things I should tell you. The first is that this can give you horrendously long regular expressions, and you can see why. As you go from the $R_{ij}^{m}$ to $R_{ij}^{m+1}$, look at the size: you are writing one regular expression with superscript $m$, then another, then another on which you put a star, and then one more. So a regular expression with superscript $m+1$ can be about four times as long as a regular expression with superscript $m$. As you go from a lower superscript to a higher one, the size of the regular expression that you are building can quadruple. In the beginning each $R_{ij}^{0}$ is of some constant length; next time it becomes four times that constant, then four times the previous length, so 16 times the base case, and so on. Therefore the expression that you build can in general become exponentially long in the number of states, since the superscript goes up to the number of states. So that is point number one: regular expressions built through this process can be exponential in length in terms of the number of states of the machine.
The other thing is that these huge regular expressions can be brought down somewhat in size using some simple manipulations. For example, take the machine that we were talking of. $R_{11}^{0}$ was $1 + \epsilon$: the direct transition on 1, plus $\epsilon$. Now if I want to write out $R_{11}^{1}$ according to the recurrence, what do I need to do? I write $R_{11}^{0}$, then plus. Now $m+1$ is 1, so $R_{i,m+1}^{m}$ is again $R_{11}^{0}$; then $R_{m+1,m+1}^{m}$ is $R_{11}^{0}$, whole star; and then, since $m+1$ is 1 and $j$ is also 1, $R_{m+1,j}^{m}$ is $R_{11}^{0}$ once more.
So this is what we will write: $R_{11}^{1} = R_{11}^{0} + R_{11}^{0} (R_{11}^{0})^{*} R_{11}^{0}$. Since $R_{11}^{0}$ is $1 + \epsilon$, this is $(1 + \epsilon) + (1 + \epsilon)(1 + \epsilon)^{*}(1 + \epsilon)$. This is of course a long expression, but do you see that it could have been written much more simply as $1^{*}$? $R_{11}^{1}$, which can go through only $q_1$ and not the state $q_2$, is basically any number of loops on $q_1$, zero or more times. So this whole thing could be simplified, at least here using ad hoc arguments, or using some algebraic laws of regular expressions, to a very simple expression. This whole huge thing basically boils down to $1^{*}$.
How do we simplify regular expressions? That may be one question, and this is a case in point: using our recurrence blindly I got the long expression, but it can be massaged into this very simple one. For that one can use some laws which are very general, and I will talk about them now. Here I have listed out some standard algebraic laws for regular expressions. What do we mean by algebraic laws? For example, when you say $(x + y)^2 = x^2 + 2xy + y^2$, that is a certain algebraic identity coming out of arithmetic. In that example $x$ and $y$ stood for any numbers; in the same manner, here $L$ and $M$ stand for any regular expressions. So if you take the first law, for example, we are saying: take any regular expression and another regular expression. If we put a plus in between, then the regular expression that we get denotes the same language as the one we get by putting them the other way around.
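The simplification of $(1 + \epsilon) + (1 + \epsilon)(1 + \epsilon)^{*}(1 + \epsilon)$ to $1^{*}$ can be sanity-checked with a small Python sketch. It translates both expressions into Python `re` syntax (an assumption of this sketch: `|` stands for `+`, and an empty alternative stands for $\epsilon$) and compares them on every binary string up to a bounded length. The function name `same_language` is hypothetical, and a bounded comparison is evidence of equivalence, not a proof.

```python
import re
from itertools import product

# (1 + eps) + (1 + eps)(1 + eps)*(1 + eps), written in Python regex syntax.
LONG = r"(?:1|)|(?:1|)(?:1|)*(?:1|)"
SHORT = r"1*"

def same_language(r, s, alphabet="01", max_len=6):
    """Compare two patterns on all strings over the alphabet up to max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            x = "".join(t)
            if bool(re.fullmatch(r, x)) != bool(re.fullmatch(s, x)):
                return False
    return True

print(same_language(LONG, SHORT))  # True: the two agree on all strings up to length 6
```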
So you have to understand this equality in the sense that the language denoted by the left hand side regular expression is the same as the language denoted by the right hand side regular expression. Sometimes you may call $L$ and $M$ variables for regular expressions, in the sense that you can plug in any regular expression for $L$ and any regular expression for $M$, and you will get an equality: the left hand side regular expression denotes the same language as the right hand side regular expression.
The first law comes out of the fact that plus is basically union: it is the union of the two languages denoted, and as you know, the union operation is commutative. And the next one is associativity of union. Therefore we do not have to put brackets, in the same way that in arithmetic we do not have to put brackets to say whether in $3 + 5 + 5$ I first add 3 and 5 and then add 5 to the result, or do it the other way. By this we mean that the order of these operations is not important. These are binary operations, so we can only do two at a time, but whether you first evaluate this and then do that, or the other way, does not matter. That is why for regular expressions we may just write $R_1 + R_2 + R_3$; in fact these kinds of things will be very common, and you do not need to put brackets. The next law says the same thing for concatenation: the dot is also associative, so which two of the languages I concatenate first is unimportant.
Now, another point: in an algebraic law you can also have constants, and for regular expressions we have two constants, $\phi$ and $\epsilon$. You might ask, what about symbols? For symbols I would need to assume some alphabet, whereas all these laws are independent of the alphabet; whatever the alphabet, these laws will be true.
So this law is saying something very trivial: if you take the union of the empty set with any set, you get back that set. This one is saying that if you take the language consisting of just the empty string and concatenate it (not union, concatenation) with any language, you get back the same language; that is clear. And this is an interesting one: if you concatenate the empty language with any language, you get back the empty language. Remember, the empty language is the language which has no string at all, as opposed to the language with one string, namely the empty string. That is not too difficult to see if you just use the definition of concatenation.
This law is saying that concatenation distributes over union, and this one is the same thing on the other side. This one uses the fact that union is idempotent: $L$ union $L$ is of course $L$, so we write $L + L = L$. This is also something interesting: if you take $L^{*}$ and apply closure once more, and more and more, you do not get anything new, so $(L^{*})^{*}$ is $L^{*}$. And what is $\phi^{*}$? If you take the closure of the empty language, you get the language with just the empty string, because the closure of any language has to contain $\epsilon$, and here it cannot contain anything else; therefore $\phi^{*}$ is $\epsilon$. And $\epsilon^{*}$ is $\epsilon$. Using some of these algebraic laws you may be able to simplify regular expressions, and each law can itself be proved.
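For reference, the laws described above can be collected as follows. This list is reconstructed from the spoken description, since the slide itself is not part of the transcript, so the exact selection and order are an assumption; $L$, $M$ and $N$ stand for arbitrary regular expressions, and equality means that both sides denote the same language.

```latex
\begin{align*}
L + M &= M + L                     && \text{union is commutative}\\
(L + M) + N &= L + (M + N)         && \text{union is associative}\\
(LM)N &= L(MN)                     && \text{concatenation is associative}\\
\phi + L &= L + \phi = L           && \phi \text{ is the identity for union}\\
\epsilon L &= L\epsilon = L        && \epsilon \text{ is the identity for concatenation}\\
\phi L &= L\phi = \phi             && \phi \text{ annihilates under concatenation}\\
L(M + N) &= LM + LN                && \text{left distributivity}\\
(M + N)L &= ML + NL                && \text{right distributivity}\\
L + L &= L                         && \text{union is idempotent}\\
(L^{*})^{*} &= L^{*}               && \text{closure is idempotent}\\
\phi^{*} &= \epsilon, \quad \epsilon^{*} = \epsilon
\end{align*}
```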
As for proving these laws, you can argue from first principles; for example, in the case of the first law, by using the fact that we are talking of the union of two languages. That is one of the ways of doing it. And as we said, using these algebraic laws sometimes simplifies regular expressions. So this completes our discussion on regular expressions. We still have a couple of very important topics left on regular languages, which we will cover in the subsequent lectures.