In the previous lecture, we saw how to remove useless symbols and epsilon productions to simplify a grammar. In today's lecture, we will first see how to remove unit productions. The procedure is very similar to the removal of epsilon productions. Recall that unit productions are productions of the form A → B, where A and B are non-terminals. The claim is that for any CFG G = (N, Σ, P, S), there is an equivalent CFG G' that has no unit productions. We will prove this using an approach similar to the one we used for the removal of epsilon productions.
First, from G we construct a CFG G hat = (N, Σ, P hat, S); that is, we are going to include some more productions. P hat contains all the productions of P and is closed under the following rule: if A → B and B → β are in P hat, then we include A → β in P hat as well. P hat is finite, because every production we add has a non-terminal of N on the left and a right-hand side that already occurs in P, and there are only finitely many such combinations.
Clearly, L(G) is a subset of L(G hat), because P is a subset of P hat. We can also show that L(G hat) is a subset of L(G), so that G and G hat are equivalent. This is the case because every new production added to P hat comes from the rule above, and such a production can be simulated in two steps: A → β is added only when A → B and B → β are already there. So whenever a derivation in G hat uses a new production A → β, that step can be simulated using the two productions A → B and B → β, and, repeating the argument for productions that were themselves added, eventually using only productions of G.
That is, suppose that in G hat we derive α_1 β α_2 from α_1 A α_2 in one step. This step can be simulated in G as follows: from α_1 A α_2 we get α_1 B α_2 in one step, and in one more step we get α_1 β α_2. So it needs only one more step. Therefore G and G hat are equivalent.
Now, using reasoning similar to what we gave for the removal of epsilon productions, we can say that a minimal-length derivation in G hat does not use any unit production. Hence we can throw out all the unit productions from G hat to get a new grammar G', and G' will be equivalent to the original grammar G.
So what we want to show is that a minimal-length derivation in G hat does not use any unit production; only in that case can we throw out all unit productions from G hat to get the equivalent grammar G'. Suppose that a string w is derived in G hat, where w is a string of terminals only. Take a minimal-length derivation of w in G hat, and assume for contradiction that it uses some unit production. The derivation then looks like this: S derives α_1 A α_2 in zero or more steps; in one step A is replaced by B using the unit production A → B, giving α_1 B α_2; and then in zero or more steps we derive w. So at this step the unit production A → B is used, and we have said that this is a minimal-length derivation.
Now we will arrive at a contradiction. The point is that this occurrence of the non-terminal B must eventually be disposed of, using some production of the form B → β. Therefore the derivation looks like this. From S, in m steps of G hat, where m ≥ 0, we get α_1 A α_2. In one step A is replaced by B using the unit production, giving α_1 B α_2. In another n steps, α_1 and α_2 are rewritten to some strings γ and δ, giving γ B δ. In one more step B is replaced by β using B → β, giving γ β δ. Finally, in another k steps we get the string of terminals w. In total this requires m + n + k + 2 steps in G hat, and we have said that this is a minimal-length derivation.
But according to the closure rule, if A → B and B → β are in P hat, then A → β must also be in P hat. So we can derive w as follows. Starting with S, in m steps of G hat we get α_1 A α_2. Then, in one step, instead of replacing A by B we replace A by β using A → β, giving α_1 β α_2. Then, using the same n steps as before, we get γ β δ, and in another k steps we get w. This derivation takes exactly m + n + k + 1 steps, one step less than the previous derivation. So our assumption that the previous derivation is of minimal length is false. Therefore we can safely throw out all the unit productions and still have an equivalent grammar.
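The construction just described (close P under the rule, then drop the unit productions) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the encoding of a production as a `(head, body)` pair with `body` a tuple of symbols, and the small example grammar are assumptions made here, not part of the lecture.

```python
def remove_unit_productions(productions, nonterminals):
    """Close the production set under the rule
    'A -> B and B -> beta imply A -> beta', then drop all unit productions.
    A production is a (head, body) pair; body is a tuple of symbols."""
    def is_unit(body):
        return len(body) == 1 and body[0] in nonterminals

    closed = set(productions)          # this plays the role of P hat
    changed = True
    while changed:                     # repeat until no new production is added
        changed = False
        for head, body in list(closed):
            if not is_unit(body):
                continue
            for head2, body2 in list(closed):
                if head2 == body[0] and (head, body2) not in closed:
                    closed.add((head, body2))
                    changed = True
    # G' keeps everything in P hat except the unit productions
    return {(h, b) for h, b in closed if not is_unit(b)}


# Hypothetical example grammar: S -> A | a, A -> B, B -> b
example_nonterminals = {"S", "A", "B"}
example_productions = {
    ("S", ("A",)), ("S", ("a",)),
    ("A", ("B",)),
    ("B", ("b",)),
}
unit_free = remove_unit_productions(example_productions, example_nonterminals)
```

For this example the closure adds S → B, A → b and S → b, and after the unit productions are discarded the remaining productions are S → a, S → b, A → b and B → b.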
In this way, following this rule, from G we obtain G', which is equivalent to the original grammar G and does not have any unit production.
Now we come to the discussion of normal forms. We define a normal form by imposing some restrictions on the form of productions allowed in the grammar. For many applications it is helpful to assume that CFGs are in one or another special restricted form, or normal form. We will consider two of the most useful normal forms: one is Chomsky Normal Form, or simply CNF, and the other is Greibach Normal Form, or simply GNF. We will first discuss Chomsky Normal Form and come to Greibach Normal Form at a later point.
First let us define Chomsky Normal Form. We say that a CFG G = (N, Σ, P, S) is in Chomsky Normal Form, or CNF, if all productions are of the form A → BC or A → a, where A, B, C are non-terminals and a is a terminal symbol. Please note that we do not allow productions of the form A → ε, and therefore a grammar in CNF cannot generate the empty string.
Now we state this in the form of a theorem: for any CFG G = (N, Σ, P, S), there is a CFG G' = (N', Σ, P', S) in CNF such that L(G') is the same as L(G), apart from the empty string, which a grammar in CNF cannot generate. That is, for every CFG G we can find a CFG G' which is in CNF and which is equivalent to G in this sense. Let us see how to get such a G', that is, the proof of the theorem. Without loss of generality we can assume that the given grammar G does not have any useless symbols, unit productions or epsilon productions; even if it has, we can remove them using the methods that we have already discussed.
Now we use the following procedure to construct G' from G. The first step is this: for each terminal symbol a in Σ, we introduce a new non-terminal, say A_a, which is not there in the grammar G, and we include the production A_a → a in P'. Then we replace all occurrences of a on the right-hand side of all productions by A_a, except in productions of the form B → a. If B → a is there, we do not touch its right-hand side. But if we have a production like A → aB, it is replaced by A → A_a B, and we have already included A_a → a in the grammar.
After this step, every production is either of the form A → a, where A is a non-terminal and a is a terminal symbol, or of the form A → B_1 B_2 ... B_k for some k ≥ 2, where the B_i are non-terminals. This is quite clear from the construction. By this construction we are not changing the language of the grammar, so G and the new grammar are equivalent: any string generated in the old grammar can be generated in the new grammar, taking one more step for each replaced terminal, and whatever we can generate in the new grammar can also be generated in the old grammar.
Now consider a production of the form A → B_1 B_2 ... B_k with k greater than 2. We replace it by A → B_1 C and C → B_2 B_3 ... B_k, where C is a new non-terminal which was not there in the previous grammar and which derives the remainder of the string. In this case also we are not changing the language of the grammar. For this one production we have introduced two new productions: whatever is generated in the previous grammar can also be generated in the new grammar, taking only one more step, and similarly everything that can be generated in the new grammar can also be generated in the old grammar. In this way we have reduced the length of the right-hand side by one symbol.
We continue this process. For the production C → B_2 B_3 ... B_k we again introduce a new non-terminal D and write C → B_2 D and D → B_3 ... B_k, and again this does not change the language of the grammar. We keep on introducing new non-terminals until the length of the right-hand side of each such production becomes equal to 2, that is, until the right-hand side has only two non-terminals. Since the length of the right-hand side is always finite, this process terminates, and eventually every production is of the form A → BC or of the form A → a. At that point the grammar is in Chomsky normal form.
As an example, let us construct a grammar in Chomsky normal form for a given language. Say the grammar G is given by S → aSb | ε. This generates the language a^n b^n, n ≥ 0.
So L(G) is the set of strings consisting of some number of a's followed by the same number of b's. Now, this grammar contains an epsilon production. We can remove it by introducing the new production S → ab, so that the grammar becomes S → aSb | ab, using the method we have already discussed. This is not in CNF, because neither S → aSb nor S → ab has the form required for CNF.
So we introduce two non-terminals, say A and B, one for each of the terminals a and b, with the productions A → a and B → b. Every occurrence of a and b on the right-hand side of the S productions is then replaced by these non-terminals, and we get S → ASB, S → AB, A → a and B → b. The last three productions are in the form required for CNF, but S → ASB is not. For this production we introduce a new non-terminal, say C, and write S → AC and C → SB, and we retain all the other productions S → AB, A → a and B → b.
Now every production obeys the rule that we have given for CNF. This grammar is equivalent to the original one, except of course that it cannot generate the empty string; every other string that can be generated in G can be generated here as well. Therefore this is a grammar in CNF for the given language a^n b^n, with n ≥ 1. Now let us consider the other normal form, which is called Greibach normal form.
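Before leaving CNF, the two construction steps used above (a new non-terminal per terminal, then splitting long right-hand sides) can be sketched in Python. This is a minimal sketch under stated assumptions: the input is assumed to have no epsilon or unit productions, productions are encoded as `(head, body)` pairs with `body` a tuple, and the generated names `T_a`, `T_b`, `C1`, ... are hypothetical names assumed not to clash with existing non-terminals.

```python
def to_cnf(productions, terminals):
    """Convert a grammar without epsilon or unit productions to CNF.
    productions: list of (head, body) pairs, body a tuple of symbols."""
    # Step 1: in every body of length >= 2, replace each terminal a by a
    # new non-terminal T_a, and add the production T_a -> a.
    terminal_nt = {}
    step1 = []
    for head, body in productions:
        if len(body) == 1:
            step1.append((head, body))          # A -> a is left untouched
            continue
        new_body = []
        for sym in body:
            if sym in terminals:
                terminal_nt.setdefault(sym, "T_" + sym)
                new_body.append(terminal_nt[sym])
            else:
                new_body.append(sym)
        step1.append((head, tuple(new_body)))
    for terminal, nt in terminal_nt.items():
        step1.append((nt, (terminal,)))

    # Step 2: split A -> B1 B2 ... Bk (k > 2) into A -> B1 C, C -> B2 ... Bk,
    # repeating until every right-hand side has length 2.
    result = []
    counter = 0
    for head, body in step1:
        while len(body) > 2:
            counter += 1
            fresh = f"C{counter}"
            result.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        result.append((head, body))
    return result


# The lecture's example after removing the epsilon production: S -> aSb | ab
cnf = to_cnf([("S", ("a", "S", "b")), ("S", ("a", "b"))], {"a", "b"})
```

On this input the sketch yields S → T_a C1, C1 → S T_b, S → T_a T_b, T_a → a and T_b → b, which is the grammar obtained above with T_a, T_b and C1 in place of A, B and C.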
First let us define Greibach normal form, GNF. We say that a CFG G is in GNF if all productions in the grammar are of the form A → a B_1 B_2 ... B_k for some k ≥ 0, where A, B_1, B_2, ..., B_k are non-terminals and a is a terminal symbol. That is, the first symbol on the right-hand side is a terminal symbol, and all the remaining symbols, zero or more of them, are non-terminals. Since k may be 0, productions of the form A → a are also allowed. The only thing we do not allow is epsilon productions, as in CNF.
We will now show that every CFG G can be transformed to an equivalent CFG G' in GNF. To do that, we first introduce two lemmas which will be helpful in constructing a CFG in GNF for any CFG G.
The first says that, given any CFG G which contains some left-recursive productions, we can always construct an equivalent CFG G' in which those left-recursive productions are replaced by right-recursive productions, and the new grammar is equivalent to the original grammar. We call this Lemma 1 and prove it first. Let G = (N, Σ, P, S) be a CFG and let A → Aα_1 | Aα_2 | ... | Aα_k be the set of all left-recursive A productions; there are k of them. A production is called left recursive when the non-terminal on its left-hand side also appears as the leftmost symbol of its right-hand side.
So this is the set of all left-recursive A productions, where A is the left-hand side; there may be other left-recursive productions, such as B productions or C productions, which we are not considering here. Let A → β_1 | β_2 | ... | β_n be the remaining A productions. So there are two kinds of A productions: the left-recursive ones and the remaining ones.
Then there is a CFG G' = (N', Σ, P', S), where N' is N ∪ {B}; that is, we introduce a new non-terminal B which does not belong to N. P' contains all productions in P except the left-recursive A productions, which we remove, and it also contains the following additional productions: A → β_1 B | β_2 B | ... | β_n B, B → α_1 | α_2 | ... | α_k, and B → α_1 B | α_2 B | ... | α_k B. The claim is that L(G') = L(G), that is, G and G' are equivalent.
So we have removed all left-recursive A productions, introduced a new non-terminal B, and added these new productions to the grammar. Now we show that the two grammars are equivalent. We first show that L(G) is a subset of L(G'). The productions A → Aα_1 | Aα_2 | ... | Aα_k were there in G and are not there in G', but if a derivation in G uses such productions at any point, that part of the derivation can be simulated by a few steps in G'.
Whenever we use a production of this kind, in general A → Aα_i, the leading A has to be disposed of eventually by using a production of the form A → β_j. Therefore the derivation in G looks like this. S derives γ A δ in zero or more steps. In one step we replace A by Aα_{i_1}, giving γ A α_{i_1} δ. In another step we replace this A by Aα_{i_2}, giving γ A α_{i_2} α_{i_1} δ. Continuing like this, using a left-recursive A production at each step, we reach γ A α_{i_p} α_{i_{p-1}} ... α_{i_1} δ, and eventually the leftmost A has to be replaced by some β_j, giving γ β_j α_{i_p} α_{i_{p-1}} ... α_{i_1} δ.
The same string can be derived in G' as well. S derives γ A δ as before. Now, instead of A → Aα_i, which we do not have in G', we use one of the productions we have added, A → β_j B, and in one step we get γ β_j B δ. In the next step we use a production of the form B → α B, namely B → α_{i_p} B, and get γ β_j α_{i_p} B δ. In the next step we use B → α_{i_{p-1}} B, and we continue like this.
Eventually we have γ β_j α_{i_p} α_{i_{p-1}} ... α_{i_2} B δ. Then in one step we use B → α_{i_1} to dispose of this B, and we get γ β_j α_{i_p} ... α_{i_1} δ. So we have arrived at the same sentential form in this case also; the only difference is the order in which the α's are produced. Any other step of a derivation in G, one that does not use a left-recursive A production, is also a step in G', because those productions of P are all contained in P'.
To show that L(G') is a subset of L(G), we just follow the reverse of the process given above. So G' and G are equivalent: even though we use this transformation to remove the left-recursive productions and replace them by right-recursive productions, the two grammars are equivalent.
Then we introduce another lemma, which we call Lemma 2. Suppose G = (N, Σ, P, S) is a CFG, A → α_1 B α_2 is a production in P, and B → β_1 | β_2 | ... | β_k is the set of all B productions in P. Then from G we can construct the grammar G' = (N, Σ, P', S) such that P' contains all the productions already in P, together with all productions of the form A → α_1 β_1 α_2, where B is replaced by β_1, A → α_1 β_2 α_2, where B is replaced by β_2, and so on up to A → α_1 β_k α_2; and from this set we remove the particular production A → α_1 B α_2.
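The two constructions stated above can be sketched as small Python functions. This is an illustrative sketch only: the function names, the `(head, body)` encoding with `body` a tuple of symbols, the name `"A-2"` for the fresh non-terminal, and the small grammar used to exercise them are assumptions made here for illustration.

```python
def remove_left_recursion(productions, a, fresh):
    """Lemma 1 for non-terminal `a`: replace a -> a alpha_i by right-recursive
    productions using the new non-terminal `fresh` (assumed unused so far)."""
    alphas = [body[1:] for head, body in productions
              if head == a and body[:1] == (a,)]
    if not alphas:
        return list(productions)
    # keep every production of P except the left-recursive a productions
    out = [(h, b) for h, b in productions if not (h == a and b[:1] == (a,))]
    betas = [b for h, b in out if h == a]
    out += [(a, beta + (fresh,)) for beta in betas]          # a -> beta_j fresh
    out += [(fresh, alpha) for alpha in alphas]              # fresh -> alpha_i
    out += [(fresh, alpha + (fresh,)) for alpha in alphas]   # fresh -> alpha_i fresh
    return out


def substitute(productions, target, pos):
    """Lemma 2: in the production `target` = (A, alpha1 B alpha2), replace the
    occurrence of B at index `pos` by every right-hand side of B, and drop
    the original production."""
    head, body = target
    b_bodies = [bd for h, bd in productions if h == body[pos]]
    out = [p for p in productions if p != target]
    out += [(head, body[:pos] + beta + body[pos + 1:]) for beta in b_bodies]
    return out


# Hypothetical illustration: A1 -> A2 A2, A2 -> A1 A3 | a
g = [("A1", ("A2", "A2")), ("A2", ("A1", "A3")), ("A2", ("a",))]
g = substitute(g, ("A2", ("A1", "A3")), 0)    # gives A2 -> A2 A2 A3
g = remove_left_recursion(g, "A2", "A-2")     # removes the left recursion on A2
```

After the two calls the grammar is A1 → A2 A2, A2 → a | a A-2, A-2 → A2 A3 | A2 A3 A-2. Passing the position explicitly to `substitute` is a design choice of this sketch; the lemma itself places no restriction on where B occurs.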
To restate Lemma 2: we remove the production A → α_1 B α_2 and replace the occurrence of B in the middle by each of the right-hand sides of the B productions. So we get a new grammar which contains these productions, and we have to show that L(G') and L(G) are equal. It is quite easy to see: a step in G' that uses A → α_1 β_i α_2 can be simulated in G using one more step, first A → α_1 B α_2 and then B → β_i, and whatever we can generate in G can also be generated in G'. We leave the details to the reader; it is quite simple to show.
What we do now is use these two lemmas to transform any grammar which is not in Greibach normal form into Greibach normal form. We say that for any CFG G = (N, Σ, P, S) with ε not in L(G), there is an equivalent CFG G' = (N', Σ, P', S') which is in GNF; that is, all productions are of the form A → a B_1 B_2 ... B_k for some k ≥ 0. So given any such grammar G, we can always construct an equivalent grammar G' which is in GNF. Instead of giving the proof of this theorem, I will explain how we can construct the equivalent grammar in GNF with the help of an example.
In the construction, we first assume that the grammar G is in CNF; if it is not, we can always convert it by the process that we have already given. Therefore all productions are of the form A → BC or A → a. In the first step of the construction we rename all the variables, that is, all the non-terminals. Suppose there are m non-terminals in N; these are renamed to A_1, A_2, ..., A_m, so N now consists of these non-terminals. We have renamed all the non-terminals without changing the language of the grammar.
Then, in the second step, we process the productions in P so that they satisfy a property called the increasing non-terminal property. This property is defined as follows. We say that a production satisfies the increasing non-terminal property if it is of the form A_i → a α, where it starts with a terminal symbol, or of the form A_i → A_j α where j is greater than i, and where α is a string of non-terminals. To enforce the increasing non-terminal property, we start with the A_1 productions, which are of the form A_1 → a or A_1 → A_i A_j, since the grammar is in CNF. Here we can apply Lemma 1 and Lemma 2 to enforce the property.
We will illustrate this with the help of an example. Consider the grammar A → BB, B → AC | a, and C → AB | BA | a. Suppose this is the given grammar. In step 1 we rename the non-terminals: A is written as A_1, B as A_2 and C as A_3. So we get A_1 → A_2 A_2, A_2 → A_1 A_3 | a, and A_3 → A_1 A_2 | A_2 A_1 | a. By renaming we have obtained all these productions from the original productions.
Then in step 2 we first consider the A_1 production. This A_1 production already satisfies the increasing non-terminal property, because there is only one A_1 production, A_1 → A_2 A_2, and the subscript 2 on the right is greater than the subscript 1 on the left. Next we process the A_2 productions. To enforce the increasing non-terminal property here, we first apply Lemma 2 to A_2 → A_1 A_3.
When we apply Lemma 2 here, we replace A_1 by the right-hand side of the A_1 production, which is A_2 A_2; we know that the A_1 production already satisfies the increasing non-terminal property. So we get A_2 → A_2 A_2 A_3, and the other A_2 production is A_2 → a. Now we apply Lemma 1, because A_2 → A_2 A_2 A_3 is a left-recursive production. To remove the left recursion we introduce a new non-terminal, say A_{-2}, corresponding to the non-terminal A_2, and the corresponding productions are A_2 → a | a A_{-2} and A_{-2} → A_2 A_3 | A_2 A_3 A_{-2}.
All these productions satisfy the increasing non-terminal property, and the grammar so far is: A_1 → A_2 A_2, which already satisfies the property; A_2 → a | a A_{-2}; and A_{-2} → A_2 A_3 | A_2 A_3 A_{-2}. The A_{-2} productions satisfy the property because the leftmost non-terminal on the right-hand side is A_2 while the left-hand side is A_{-2}, and 2 is greater than −2.
We still have the A_3 productions A_3 → A_1 A_2 | A_2 A_1 | a. These do not satisfy the increasing non-terminal property, so we consider them now. We again apply Lemma 2 to the A_3 productions. In this case we need not apply Lemma 1, and so need not introduce a new non-terminal A_{-3}, because there is no left-recursive A_3 production: we only have A_3 → A_1 A_2 and A_3 → A_2 A_1. So we simply apply Lemma 2 so that the increasing non-terminal property is satisfied.
Since we have A_3 → A_1 A_2, this A_1 is replaced by the right-hand side of the A_1 production, which is A_2 A_2, and we get A_3 → A_2 A_2 A_2. The other productions are A_3 → A_2 A_1 and A_3 → a. Now we apply Lemma 2 again, because the leading non-terminal on the right is A_2 and the left-hand side is A_3; 2 is less than 3, whereas the increasing non-terminal property requires an index greater than 3. So this leading A_2 is replaced by the right-hand sides of the A_2 productions, a and a A_{-2}, and we get A_3 → a A_2 A_2 | a A_{-2} A_2 A_2 | a A_1 | a A_{-2} A_1 | a.
Now all the productions satisfy the increasing non-terminal property, and the resulting grammar can be written as: A_1 → A_2 A_2; A_2 → a | a A_{-2}; A_{-2} → A_2 A_3 | A_2 A_3 A_{-2}; and A_3 → a A_2 A_2 | a A_{-2} A_2 A_2 | a A_1 | a A_{-2} A_1 | a.
All the A_3 productions and A_2 productions are already in GNF, because each of them starts with a terminal symbol and the remaining string is a string of non-terminals. Only the A_1 production and the A_{-2} productions are not in GNF, because they start with a non-terminal. So we again apply Lemma 2, now to A_{-2}, to convert it to GNF. We have the production A_{-2} → A_2 A_3, and this leading A_2 can be replaced by the right-hand sides of the A_2 productions that we already have.
Those right-hand sides are a and a A_{-2}, so A_{-2} → A_2 A_3 is replaced by A_{-2} → a A_3 | a A_{-2} A_3, and these two are in GNF. If we repeatedly apply Lemma 2 to the A_1 and A_{-2} productions, eventually all the productions will be in GNF, and the resulting grammar is: A_1 → a A_2 | a A_{-2} A_2; A_{-2} → a A_3 | a A_{-2} A_3 | a A_3 A_{-2} | a A_{-2} A_3 A_{-2}; A_2 → a | a A_{-2}; and A_3 → a A_2 A_2 | a A_{-2} A_2 A_2 | a A_1 | a A_{-2} A_1 | a. This is the grammar obtained by applying Lemma 2 to the A_1 and A_{-2} productions, and we see that all productions are in GNF.
So this is how, by repeatedly applying Lemma 1 to remove left recursion and Lemma 2 to make the leftmost symbol of each right-hand side a terminal symbol, we eventually bring the grammar into GNF. At every step we apply either Lemma 1 or Lemma 2, and neither changes the language. Therefore the language of the resulting grammar in GNF is unchanged, and it is an equivalent grammar corresponding to the original grammar. That is how we transform any given grammar to GNF.
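As a sanity check on the worked example, the following Python sketch tests that the final grammar has the GNF shape and compares the strings of length at most 8 generated by the original grammar and by the final grammar. It is a sketch under assumptions: the helper names `is_gnf` and `language_up_to`, the `(head, body)` encoding, the string `"A-2"` for A_{-2}, and the length bound 8 are choices made here, and agreement up to a bound is evidence of equivalence, not a proof.

```python
def is_gnf(productions, terminals):
    """Every body must be one terminal followed by zero or more non-terminals."""
    return all(
        len(body) >= 1
        and body[0] in terminals
        and all(sym not in terminals for sym in body[1:])
        for _, body in productions
    )


def language_up_to(productions, terminals, start, n):
    """All terminal strings of length <= n derivable from `start`,
    computed as a fixed point over the sets of words of each non-terminal."""
    words = {head: set() for head, _ in productions}
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            partial = {""}
            for sym in body:
                choices = {sym} if sym in terminals else words.get(sym, set())
                partial = {u + v for u in partial for v in choices
                           if len(u + v) <= n}
            if not partial <= words[head]:
                words[head] |= partial
                changed = True
    return words[start]


# Original grammar of the example: A -> BB, B -> AC | a, C -> AB | BA | a
original = [("A", ("B", "B")), ("B", ("A", "C")), ("B", ("a",)),
            ("C", ("A", "B")), ("C", ("B", "A")), ("C", ("a",))]

# Final grammar of the example, with "A-2" standing for A_{-2}
gnf = [
    ("A1", ("a", "A2")), ("A1", ("a", "A-2", "A2")),
    ("A2", ("a",)), ("A2", ("a", "A-2")),
    ("A-2", ("a", "A3")), ("A-2", ("a", "A-2", "A3")),
    ("A-2", ("a", "A3", "A-2")), ("A-2", ("a", "A-2", "A3", "A-2")),
    ("A3", ("a", "A2", "A2")), ("A3", ("a", "A-2", "A2", "A2")),
    ("A3", ("a", "A1")), ("A3", ("a", "A-2", "A1")), ("A3", ("a",)),
]

same_language = (language_up_to(original, {"a"}, "A", 8)
                 == language_up_to(gnf, {"a"}, "A1", 8))
```

Both grammars generate exactly the even-length strings aa, aaaa, a^6 and a^8 within this bound, and `is_gnf(gnf, {"a"})` holds while `is_gnf(original, {"a"})` does not.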