We want to show that regular expressions and finite automata are equivalent. As we have already said, proving this requires two things. First, given any regular expression $r$, we should be able to construct a finite automaton $A$ such that the language accepted by $A$ is precisely the language denoted by $r$. Second, given any DFA $A$, we should be able to construct an equivalent regular expression $r$, meaning that the language accepted by $A$ is exactly the language denoted by $r$. We showed the first step in the last lecture: for any given regular expression we constructed an equivalent finite automaton. In today's lecture we show the second step.

That is, we want to prove this theorem: if $A$ is a DFA, then $L(A)$ is regular, meaning there exists a regular expression $r$ representing the language $L(A)$ accepted by $A$. We prove this by induction on the number of states of the DFA. Let $A = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is the set of states, $\Sigma$ is the alphabet, $\delta$ is the transition map, $q_0$ is the start state, and $F$ is the set of final states.

For the base case, suppose $A$ has only one state. Then there are two possibilities for the set of final states $F$. First, $F = \emptyset$, so there are no final states; in that case $L(A)$ is the empty language, and $\emptyset$ is a regular expression, so $L(A)$ is regular. Second, $F = \{q_0\}$, so the only state of the DFA is a final state; in that case $L(A) = \Sigma^*$, since the DFA accepts all strings over $\Sigma$, and $\Sigma^*$ is also regular. In both cases there is a regular expression representing the language accepted by the finite automaton $A$. Now let us
consider the inductive step. Assume the result is true for all DFAs with fewer than $n$ states: for every such DFA, the language it accepts is regular. Let $A = (Q, \Sigma, \delta, q_0, F)$ be a DFA with $n$ states. First note that the language $L$ accepted by $A$ can be written as $L = L_1^* L_2$, where $L_1$ is the set of strings that start and end in the initial state $q_0$, and $L_2$ is the set of strings that start in $q_0$ and end in some final state.

Picture the start state $q_0$. Some strings, processed from $q_0$, bring us back to $q_0$; the collection of all such strings is $L_1$. A round trip of this kind can be repeated any number of times, since after coming back to $q_0$ we can continue with another such string and come back to $q_0$ again; that is why we take $L_1^*$. The other kind of string starts at $q_0$ and eventually arrives at some final state, say $q_{f_1}$; similarly, other strings arrive at other final states $q_{f_2}, \ldots, q_{f_k}$. $L_2$ is the union of all those strings that lead from $q_0$ to some final state, where $q_0$ is not an intermediate state on that part of the path. So we take strings from $L_1$ any number of times and then concatenate a string from $L_2$, and that is precisely the language $L$ accepted by this DFA. We include $\varepsilon$
in $L_2$ if $q_0$ is a final state: if $q_0 \in F$, then $\varepsilon \in L_2$. Using the inductive hypothesis we will prove that both $L_1$ and $L_2$ are regular. Since $L$ is built from them using only regular operations, namely the star and concatenation in $L_1^* L_2$, $L$ must then also be regular.

We need some notation to define $L_1$ and $L_2$. Suppose $q \in Q$ and $x \in \Sigma^*$. We denote by $P(q, x)$ the set of states on the path of $x$ from $q$ that come after $q$. If $x = a_1 a_2 \cdots a_n$, then

$$P(q, x) = \bigcup_{i=1}^{n} \{\hat\delta(q, a_1 a_2 \cdots a_i)\},$$

so $P(q, x)$ is the set of states we arrive at while processing the string $x$ from state $q$.

Now we define $L_1 = \{x \in \Sigma^* : \hat\delta(q_0, x) = q_0\}$, the set of all strings which, processed from the initial state $q_0$, eventually lead back to $q_0$. In what follows we only need the strings of $L_1$ that return to $q_0$ without passing through $q_0$ in between, because every string of $L_1$ is a concatenation of such strings, so they generate the same $L_1^*$. Similarly, $L_2 = L_3$ if $q_0 \notin F$, and $L_2 = L_3 \cup \{\varepsilon\}$ if $q_0 \in F$, where

$$L_3 = \{x \in \Sigma^* : \hat\delta(q_0, x) \in F \text{ and } q_0 \notin P(q_0, x)\}.$$

That is, processing $x$ from the initial state $q_0$ eventually reaches one of the final states, and $q_0$ does not come on the path while processing $x$, which is the condition $q_0 \notin P(q_0, x)$ in the notation that we have just
defined. Clearly $L = L_1^* L_2$: we can take strings from $L_1$ any number of times, and if we then concatenate a string from $L_2$ we arrive at one of the final states of the DFA. Every string of this form is therefore accepted by the DFA, and these are precisely the strings accepted by the DFA.

We will first prove that $L_1$ is regular, that is, there is a regular expression for $L_1$, and then that $L_2$ is regular. If both are regular, then $L$ must also be regular, because we have constructed $L$ from them using only regular operations.

To prove that $L_1$ is regular, consider the set $A$ of pairs $(a, b) \in \Sigma \times \Sigma$ for which there is some $x \in \Sigma^*$ such that $\hat\delta(q_0, axb) = q_0$, $\delta(q_0, a) \neq q_0$, and $q_0 \notin P(q_a, x)$, where $q_a = \delta(q_0, a)$. The second condition says that taking the single symbol $a$ at $q_0$ does not come back to $q_0$; it goes to some other state, which we call $q_a$. The third condition says that $q_0$ does not come on the path again while processing $x$ from $q_a$.

Now, for each pair $(a, b) \in A$, we define the language

$$L_{(a,b)} = \{x \in \Sigma^* : \hat\delta(q_0, axb) = q_0 \text{ and } q_0 \notin P(q_a, x)\}, \qquad q_a = \delta(q_0, a).$$

We now show that $L_{(a,b)}$, this
language as defined, is the language accepted by the following DFA, which we call $A_{(a,b)}$:

$$A_{(a,b)} = (Q', \Sigma, \delta', q_a, F').$$

Here $Q' = Q \setminus \{q_0\}$, so we leave out the start state from the original set of states. The start state of the new DFA is $q_a = \delta(q_0, a)$, the state we arrive at by taking the input symbol $a$ at $q_0$. The set of final states is $F' = \{q \in Q' : \delta(q, b) = q_0\}$: every state from which the last symbol $b$ leads to $q_0$ is a final state, except of course $q_0$ itself, which is not in the set of states. For $\delta'$ we retain the same transition function, restricted to $Q' \times \Sigma \to Q'$. Clearly $A_{(a,b)}$ is a DFA.

Now suppose the string $x$ belongs to $L(A_{(a,b)})$. Then $\hat\delta'(q_a, x) \in F'$. Since $q_0$ is not a state of $A_{(a,b)}$, it is also not among the states we arrive at while processing $x$ from the start state $q_a$. Therefore

$$\hat\delta(q_0, axb) = \hat\delta(\delta(q_0, a), xb) = \hat\delta(q_a, xb) = \delta(\hat\delta(q_a, x), b).$$

In the first step we take the symbol $a$ at the start state $q_0$ and then process $xb$ from the resulting state using the extended transition function; in the second we use $\delta(q_0, a) = q_a$; in the last we first process the string $x$ and then take the symbol $b$ at the state reached. So this is nothing but $\delta(p, b)$, where $p$ is the state that we
arrive at after processing the string $x$ from state $q_a$. This $p$ must be a final state of $A_{(a,b)}$, because $x$ is accepted by that DFA, and by our definition of $F'$ we have $\delta(p, b) = q_0$. So $\hat\delta(q_0, axb) = q_0$, and by the definition of the language $L_{(a,b)}$ the string $x$ belongs to $L_{(a,b)}$. The converse is proved similarly: if a string $x$ belongs to $L_{(a,b)}$, then $x$ is accepted by the DFA $A_{(a,b)}$.

The number of states in $A_{(a,b)}$ is exactly $n - 1$, because $Q'$ contains all the states of $Q$ except the start state $q_0$. Therefore, by the induction hypothesis, the language $L_{(a,b)}$ is regular.

Now let $B = \{a \in \Sigma : \delta(q_0, a) = q_0\} \cup \{\varepsilon\}$. Then $L_1$, taken as the set of strings that return to $q_0$ without passing through $q_0$ in between, is

$$L_1 = B \cup \bigcup_{(a,b) \in A} a\, L_{(a,b)}\, b.$$

The set $B$ is finite and hence regular, we have shown each $L_{(a,b)}$ to be regular, and we have only concatenated with the symbols $a$ and $b$ and taken a finite union of those regular languages. Since $L_1$ is constructed using regular operations over regular languages, $L_1$ is regular.

Next we prove that the language $L_2$ is also regular. To prove this claim, consider the set $C = \{a \in \Sigma : \delta(q_0, a) \neq q_0\}$. That is, when we take the
symbol $a$ at the start state $q_0$, we do not go to $q_0$ again. For each symbol $a \in C$ we define the language

$$L_a = \{x \in \Sigma^* : \hat\delta(q_0, ax) \in F \text{ and } q_0 \notin P(q_a, x)\}, \qquad q_a = \delta(q_0, a).$$

That is, starting with the symbol $a$ and then processing the string $x$ eventually leads us to some final state, and $q_0$ does not appear on the path while processing $x$ from the state $q_a$.

For each symbol $a \in C$ we construct a DFA, call it $A_a$:

$$A_a = (Q', \Sigma, \delta', q_a, F'').$$

Here $Q' = Q \setminus \{q_0\}$, so $A_a$ contains one state fewer than the previous DFA. The start state of $A_a$ is $q_a = \delta(q_0, a)$, the state we arrive at by taking the symbol $a$ at $q_0$. The set of final states is $F'' = F \setminus \{q_0\}$, all the final states of the previous DFA except the start state $q_0$. And $\delta'$ is again the restriction of $\delta$ to $Q' \times \Sigma$.

It is easy to observe that $L(A_a)$, the language accepted by the DFA $A_a$, is exactly $L_a$. First note that $q_0$ does not appear in the computations involved in either $L_a$ or $L(A_a)$. Now, for any string $x \in \Sigma^*$, $x \in L(A_a)$ if and only if $\hat\delta'(q_a, x) \in F''$, because the start state of $A_a$ is $q_a$. Since $q_0$ does not appear, this holds if and only if $\hat\delta'(q_a, x) \in F$, as we have retained all the other final states of the previous DFA. Since we have retained the same transition function, restricted to the new set of states, we can replace $\delta'$ by $\delta$, so this holds if and only if $\hat\delta(q_a, x) \in F$. We can rewrite this condition by
taking $a$ at the beginning, starting from $q_0$: since $q_a = \delta(q_0, a)$, the condition reads $\hat\delta(\delta(q_0, a), x) \in F$, which holds if and only if $\hat\delta(q_0, ax) \in F$, where the extended transition function now processes the whole string $ax$. Since $q_0$ does not appear on the path, by our definition of the language $L_a$ this is nothing but $x \in L_a$. So we have shown that $x \in L(A_a)$ if and only if $x \in L_a$.

Again the number of states in $Q'$ is exactly $n - 1$, because we have omitted the start state $q_0$ in $A_a$. Therefore, by the inductive hypothesis, the language $L_a$ is regular. But clearly

$$L_3 = \bigcup_{a \in C} a\, L_a,$$

where we just concatenate $a$ with $L_a$ for every $a$ in the set $C$. Therefore $L_3$ is also regular, and so is $L_2$, which is either $L_3$ or $L_3 \cup \{\varepsilon\}$. This completes the proof of the theorem: given any DFA, the language accepted by the DFA is regular, so we can always construct a regular expression for it.

Now let us give an example. Consider the DFA with start state $q_0$ and the following transitions. From $q_0$, on $a$ it remains at $q_0$ and on $b$ it goes to $q_1$. From $q_1$, on $a$ it comes back to $q_0$ and on $b$ it goes to $q_2$, where $q_2$ is a final state. From $q_2$, on $a$ it comes back to $q_0$ and on $b$ it remains at $q_2$.

We note which strings bring the DFA from $q_0$ back to $q_0$; those are the strings in the language $L_1$ for this particular DFA. One is the string $a$, via the path $q_0 \to q_0$, because we can take the self-loop: starting at $q_0$, on $a$ we remain at $q_0$. Then, if we take the path from $q_0$ to $q_1$ on $b$ and back to $q_0$ on $a$, we get the string $ba$;
that is the path $q_0 \to q_1 \to q_0$. We can also go from $q_0$ to $q_1$ on $b$, from $q_1$ to $q_2$ on $b$, remain at $q_2$ on any number of $b$'s, and eventually come back to $q_0$ by taking an $a$. That is, for $n \geq 0$, the strings of the form $bb\,b^n a$ bring us back from $q_0$ to $q_0$ along the path $q_0, q_1, q_2$, staying at $q_2$ for as long as we want. These are the only possibilities, so

$$L_1 = \{a,\ ba,\ bb\,b^n a : n \geq 0\}.$$

Since $q_0$ is not a final state, $L_2$ is the set of strings which take the DFA from the start state $q_0$ to the final state $q_2$, the only final state, where $q_0$ is not on the path. Such a string must take $b$ to go to $q_1$, then $b$ to go to $q_2$, and then it can take $b$ any number of times in the self-loop. Therefore

$$L_2 = \{bb\,b^n : n \geq 0\}.$$

Now $L_1$ can be expressed by the regular expression $a + ba + bbb^*a$ and $L_2$ by $bbb^*$. Thus, by the construction in the theorem above, the language accepted by the DFA is $L_1^* L_2$, which is expressed by the regular expression

$$(a + ba + bbb^*a)^*\, bbb^*.$$

So this is the regular expression corresponding to the language $L$ accepted by the DFA.

The main point is that, given any DFA $A$, we should be able to construct an equivalent regular expression, and we can always construct an equivalent
regular expression, in the sense that the language denoted by the regular expression $r$ is exactly the language accepted by the DFA $A$. How do we construct an equivalent regular expression $r$ for a given DFA? There is an algebraic method proposed by Brzozowski, which we now discuss.

Let $A = (Q, \Sigma, \delta, q_0, F)$ be a DFA, where the set of states is $Q = \{q_0, q_1, \ldots, q_n\}$, so there are $n + 1$ states, and there are $k$ final states, $F = \{q_{f_1}, q_{f_2}, \ldots, q_{f_k}\}$. For every state $q_i \in Q$, let

$$R_i = \{x \in \Sigma^* : \hat\delta(q_0, x) = q_i\}.$$

That is, if we process the string $x$ from the initial state $q_0$, we eventually arrive at the state $q_i$; we collect all those strings $x$ to form the set $R_i$. Now note that the language of this DFA is the union of the sets $R_{f_1}, R_{f_2}, \ldots, R_{f_k}$ corresponding to the final states, because the strings in those sets are exactly the strings accepted by the DFA. Hence

$$L(A) = \bigcup_{i=1}^{k} R_{f_i}.$$

In order to construct a regular expression for $L(A)$, we introduce for each $R_i$ an unknown expression, written with a small letter as $r_i$. We observe that $r_{f_1} + r_{f_2} + \cdots + r_{f_k}$ is then a regular expression for the language $L(A)$ of the DFA. Let $\Sigma_{ij}$ denote the set of those
symbols of $\Sigma$ which take the DFA $A$ from the state $q_i$ to the state $q_j$; that is, $\Sigma_{ij} = \{a \in \Sigma : \delta(q_i, a) = q_j\}$. Since $\Sigma_{ij}$ is a finite set, it is denoted by a regular expression, namely the sum of its symbols (or $\emptyset$ when the set is empty). Let $s_{ij}$ be that expression for $\Sigma_{ij}$.

Now, for $1 \leq j \leq n$, the strings of $R_j$ are precisely those that take the DFA from $q_0$ to some state $q_i$ and then reach $q_j$ with one of the symbols of $\Sigma_{ij}$. So we can write

$$R_j = R_0 \Sigma_{0j} \cup R_1 \Sigma_{1j} \cup \cdots \cup R_i \Sigma_{ij} \cup \cdots \cup R_n \Sigma_{nj}.$$

Here $R_0$ is the set of strings that take us from $q_0$ to $q_0$, followed by one symbol of $\Sigma_{0j}$ which takes us from $q_0$ to $q_j$; $R_1 \Sigma_{1j}$ is the set of strings which take the DFA from $q_0$ to $q_1$ and then from $q_1$ to $q_j$ on a single symbol of $\Sigma_{1j}$; and so on. In the case of $R_0$ we have

$$R_0 = R_0 \Sigma_{00} \cup R_1 \Sigma_{10} \cup \cdots \cup R_i \Sigma_{i0} \cup \cdots \cup R_n \Sigma_{n0} \cup \{\varepsilon\},$$

where the empty string $\varepsilon$ is included because $\varepsilon$ takes the DFA from $q_0$ to itself without reading any symbol. Thus for each $j$ we have an equation for $r_j$ which depends on all the $r_i$'s, called the characteristic equation of $r_j$.
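As an illustration of how the characteristic equations are read off a transition table, here is a minimal Python sketch. The function names `characteristic_equations` and `show`, the dictionary encoding of the transition function, and the text `eps` standing for the empty string are assumptions made for this example, not notation from the lecture. It is applied to the three-state DFA of the earlier example (start state q0, final state q2).

```python
def characteristic_equations(states, alphabet, delta, start):
    """For each state q_j, collect the terms r_i * Sigma_ij of its equation.

    Returns {q_j: (terms, has_eps)}; terms is a list of (q_i, symbols) pairs
    and has_eps is True only for the start state.
    """
    equations = {}
    for qj in states:
        terms = []
        for qi in states:
            # Sigma_ij: the symbols that take the DFA from q_i to q_j.
            sigma_ij = [a for a in alphabet if delta[(qi, a)] == qj]
            if sigma_ij:
                terms.append((qi, sigma_ij))
        equations[qj] = (terms, qj == start)
    return equations


def show(equations):
    """Render each equation as text, e.g. 'r[q1] = r[q0] b'."""
    lines = []
    for qj, (terms, has_eps) in equations.items():
        parts = []
        for qi, symbols in terms:
            s_ij = symbols[0] if len(symbols) == 1 else "(" + " + ".join(symbols) + ")"
            parts.append(f"r[{qi}] {s_ij}")
        if has_eps:
            parts.append("eps")
        lines.append(f"r[{qj}] = " + (" + ".join(parts) if parts else "empty"))
    return lines


# The three-state DFA from the earlier example (hypothetical encoding).
delta = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q0", ("q1", "b"): "q2",
    ("q2", "a"): "q0", ("q2", "b"): "q2",
}
for line in show(characteristic_equations(["q0", "q1", "q2"], ["a", "b"], delta, "q0")):
    print(line)
```

For this DFA the sketch yields the three equations r[q0] = r[q0] a + r[q1] a + r[q2] a + eps, r[q1] = r[q0] b, and r[q2] = r[q1] b + r[q2] b.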
So, we can write the system of characteristic equations of $A$ as

$$
\begin{aligned}
r_0 &= r_0 s_{00} + r_1 s_{10} + \cdots + r_i s_{i0} + \cdots + r_n s_{n0} + \varepsilon \\
r_1 &= r_0 s_{01} + r_1 s_{11} + \cdots + r_i s_{i1} + \cdots + r_n s_{n1} \\
&\;\;\vdots \\
r_j &= r_0 s_{0j} + r_1 s_{1j} + \cdots + r_i s_{ij} + \cdots + r_n s_{nj} \\
&\;\;\vdots \\
r_n &= r_0 s_{0n} + r_1 s_{1n} + \cdots + r_i s_{in} + \cdots + r_n s_{nn}
\end{aligned}
$$

This is how we write the characteristic equations for $A$: for each state $q_0$ through $q_n$ we have the corresponding unknown $r_0, r_1, \ldots, r_n$. Now, this system can be solved for the $r_{f_i}$'s, the unknowns of the final states. Since the language of the DFA is denoted by $r_{f_1} + r_{f_2} + \cdots + r_{f_k}$, solving for the $r_{f_i}$'s gives us a regular expression for the language of the DFA $A$.

We can solve the system for the final states by straightforward substitution, except that the same unknown may appear on both the left-hand and the right-hand side of an equation. This situation is handled using Arden's principle, which says: if $s$ and $t$ are regular expressions and $r$ is an unknown in an equation of the form $r = t + rs$, where the unknown appears on both sides and $\varepsilon$ does not belong to the language of $s$, then the equation has a unique solution, given by $r = t s^*$. We can use Arden's principle whenever an unknown $r$ appears on both sides of an equation. So, by successive substitutions and applications of Arden's principle, we can evaluate the expressions for the final states in terms of symbols from $\Sigma$.
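Why this is the solution can be sketched as follows. This is a standard argument added here for illustration, not taken from the lecture, with $r$, $s$ and $t$ read as the languages they denote.

```latex
\begin{align*}
t + (t s^*)\, s &= t\,(\varepsilon + s^* s) = t s^*
  && \text{so } r = t s^* \text{ satisfies } r = t + r s, \\
r &= t + r s = t + t s + r s^2 = \cdots
   = t\,(\varepsilon + s + \cdots + s^k) + r s^{k+1}
  && \text{for every } k \ge 0 .
\end{align*}
```

The second line shows that any solution $r$ contains $t s^k$ for every $k$, hence contains $t s^*$. Conversely, since $\varepsilon$ is not in the language of $s$, every string in $r s^{k+1}$ has length at least $k + 1$, so a string of length $k$ in $r$ must lie in $t(\varepsilon + s + \cdots + s^k)$, which is inside $t s^*$. Together these give $r = t s^*$, which is why the condition on $\varepsilon$ is needed for uniqueness.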
Since the operations involved here are admissible for regular expressions, we eventually obtain a regular expression for each $r_{f_i}$.

We demonstrate this with an example. Consider the DFA with only two states, where $q_0$ is the start state and $q_1$ is the final state. From $q_0$, on $a$ it goes to the final state $q_1$ and on $b$ it remains at $q_0$. From $q_1$, on $b$ it remains at $q_1$ and on $a$ it comes back to the start state $q_0$.

The characteristic equation for the start state is

$$r_0 = r_0 b + r_1 a + \varepsilon,$$

because from $q_0$ on $b$ we remain at $q_0$, from $q_1$ on $a$ we come to $q_0$, and $\varepsilon$ is added because $q_0$ is the initial state. Similarly, for the state $q_1$ the equation is

$$r_1 = r_0 a + r_1 b.$$

Since $q_1$ is the only final state, $r_1$ represents the language of the DFA. So we have to solve these two equations for $r_1$ in terms of $a$ and $b$, which are the only symbols in the alphabet.

By Arden's principle applied to the first equation, $r_0 = r_0 b + (r_1 a + \varepsilon)$, we can write $r_0 = (\varepsilon + r_1 a) b^*$. Substituting this in the second equation gives

$$r_1 = (\varepsilon + r_1 a) b^* a + r_1 b = b^* a + r_1 (a b^* a + b).$$

Since $r_1$ appears on both the right-hand and left-hand side, we apply Arden's principle again and get

$$r_1 = b^* a\, (a b^* a + b)^*,$$

which is the desired regular expression representing the language of the given DFA.
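The result can be sanity-checked by brute force. The sketch below is an illustration added here, with assumed helper names `dfa_accepts` and `first_disagreement`; it compares the two-state DFA with the expression b\*a(ab\*a + b)\*, written in Python's `re` syntax where `|` plays the role of +, on all strings up to a fixed length. Such a check can expose a mistake but does not prove equivalence.

```python
import re
from itertools import product

# Two-state DFA of the example: q0 is the start state, q1 the final state.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q0", ("q1", "b"): "q1",
}


def dfa_accepts(word, delta=DELTA, start="q0", finals=("q1",)):
    """Run the DFA on word and report whether it ends in a final state."""
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in finals


def first_disagreement(pattern, max_len=8):
    """Return the first short string on which DFA and regex differ, else None."""
    regex = re.compile(pattern)
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if dfa_accepts(word) != (regex.fullmatch(word) is not None):
                return word
    return None


# r1 = b* a (a b* a + b)* in Python regex syntax.
print(first_disagreement(r"b*a(ab*a|b)*"))
```

A return value of `None` means no counterexample was found among the strings tested.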
Similarly, consider the next example, with three states. $q_1$ is the start state, and $q_1$ and $q_2$ are the final states. From $q_1$, on $a$ it goes to $q_2$ and on $b$ it remains at $q_1$. From $q_2$, on $a$ it goes to $q_3$ and on $b$ it goes back to the start state $q_1$. From $q_3$, on both $a$ and $b$ it remains at $q_3$.

For this DFA we can write the characteristic equations as follows. For the state $q_1$: on $b$ we remain at $q_1$, giving $r_1 b$; from $q_2$ we come to $q_1$ on $b$, giving $r_2 b$; from $q_3$ we cannot come to $q_1$; and since $q_1$ is the start state, $\varepsilon$ is included. For the state $q_2$: from $q_1$ we go to $q_2$ on $a$, giving $r_1 a$; there is no transition from $q_2$ or from $q_3$ into $q_2$, so this is the only term on the right-hand side. For the state $q_3$: from $q_2$ we come to $q_3$ on $a$, giving $r_2 a$, and from $q_3$ on either $a$ or $b$ we come to $q_3$ again, giving $r_3 (a + b)$. So

$$
\begin{aligned}
r_1 &= r_1 b + r_2 b + \varepsilon \\
r_2 &= r_1 a \\
r_3 &= r_2 a + r_3 (a + b)
\end{aligned}
$$

Since $q_1$ and $q_2$ are the final states, the expression $r_1 + r_2$ represents the language of the given DFA. So we solve these equations for $r_1$ and $r_2$ in terms of $a$ and $b$. Substituting $r_2 = r_1 a$ in the equation for $r_1$, we find

$$r_1 = r_1 b + r_1 a b + \varepsilon = r_1 (b + ab) + \varepsilon.$$

Applying Arden's principle, $r_1 = \varepsilon (b + ab)^*$, which is nothing but $(b + ab)^*$. Since $r_2 = r_1 a$, we have $r_2 = (b + ab)^* a$, and thus

$$r_1 + r_2 = (b + ab)^* + (b + ab)^* a = (b + ab)^* (\varepsilon + a).$$

Therefore $(b + ab)^* (\varepsilon + a)$ is the regular expression corresponding to the given DFA.
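This answer can also be checked mechanically. The following sketch is an added illustration with assumed names (`run`, `mismatches`); it compares the three-state DFA with the pattern `(b|ab)*a?`, the Python `re` form of (b + ab)\*(ε + a), on all strings up to a fixed length. A string is accepted here exactly when it never reaches the trap state q3, that is, when it does not contain `aa`, so the check also compares against that description.

```python
import re
from itertools import product

# Three-state DFA of the example: q1 is the start state, q1 and q2 are final.
DELTA = {
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q3", ("q2", "b"): "q1",
    ("q3", "a"): "q3", ("q3", "b"): "q3",
}
FINALS = {"q1", "q2"}
PATTERN = re.compile(r"(b|ab)*a?")  # (b + ab)* (eps + a)


def run(word, start="q1"):
    """Return the state reached after processing word from the start state."""
    state = start
    for ch in word:
        state = DELTA[(state, ch)]
    return state


def mismatches(max_len=10):
    """List the short strings on which the three descriptions disagree."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            by_dfa = run(word) in FINALS
            by_regex = PATTERN.fullmatch(word) is not None
            by_description = "aa" not in word
            if not (by_dfa == by_regex == by_description):
                bad.append(word)
    return bad


print(mismatches())
```

An empty list means the DFA, the expression and the informal description agree on every string tested.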
Therefore, by writing the characteristic equations of a given DFA and solving them with Arden's principle in terms of the symbols of the alphabet, we can always find an equivalent regular expression for the DFA. This is Brzozowski's algebraic method.
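The whole procedure can be sketched in a few lines of Python. This is one possible mechanization, not the lecturer's implementation: the names `union`, `concat`, `star`, `solve`, `dfa_to_regex` and `agrees` are hypothetical, expressions are kept as Python `re` pattern strings with `None` for the empty language, and unknowns are eliminated one at a time in a fixed order by applying Arden's principle and substituting. The output is correct only up to equivalence and is usually far less tidy than a hand derivation.

```python
import re
from itertools import product


def union(x, y):
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"


def concat(*parts):
    return None if any(p is None for p in parts) else "".join(parts)


def star(x):
    # The star of the empty language and of epsilon are both epsilon.
    return "" if not x else f"(?:{x})*"


def solve(states, delta, start, target):
    """Solve the characteristic equations for the unknown r_target."""
    s = {i: {j: None for j in states} for i in states}  # s[i][j] plays s_ij
    for (i, a), j in delta.items():
        s[i][j] = union(s[i][j], re.escape(a))
    t = {j: ("" if j == start else None) for j in states}  # constant terms
    remaining = list(states)
    for k in states:
        if k == target:
            continue
        remaining.remove(k)
        # Arden: r_k = (t_k + sum over i != k of r_i s_ik) s_kk*
        loop = star(s[k][k])
        for j in remaining:  # substitute r_k into the equation for r_j
            t[j] = union(t[j], concat(t[k], loop, s[k][j]))
            for i in remaining:
                s[i][j] = union(s[i][j], concat(s[i][k], loop, s[k][j]))
    return concat(t[target], star(s[target][target]))


def dfa_to_regex(states, delta, start, finals):
    """Sum of the solved unknowns of the final states (None if there are none)."""
    result = None
    for f in states:
        if f in finals:
            result = union(result, solve(states, delta, start, f))
    return result


def agrees(pattern, delta, start, finals, alphabet, max_len=8):
    """Brute-force check that pattern and DFA accept the same short strings."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            state = start
            for ch in word:
                state = delta[(state, ch)]
            in_regex = pattern is not None and re.fullmatch(pattern, word) is not None
            if (state in finals) != in_regex:
                return False
    return True


# The two DFAs solved by hand in the lecture.
two = {("q0", "a"): "q1", ("q0", "b"): "q0", ("q1", "a"): "q0", ("q1", "b"): "q1"}
three = {("q1", "a"): "q2", ("q1", "b"): "q1", ("q2", "a"): "q3",
         ("q2", "b"): "q1", ("q3", "a"): "q3", ("q3", "b"): "q3"}
print(dfa_to_regex(["q0", "q1"], two, "q0", {"q1"}))
print(dfa_to_regex(["q1", "q2", "q3"], three, "q1", {"q1", "q2"}))
```

Arden's principle applies at every step because each coefficient is built from at least one alphabet symbol, so it never contains the empty string. The `agrees` helper only compares short strings, so it is a sanity check rather than a proof of equivalence.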