So, in today's lecture, we will discuss simplification of context-free grammars and normal forms. Given any CFG G, we would like to find an equivalent CFG G'. So, G and G' are equivalent, but G' should be simpler than G, in the sense that it is less clumsy and easier to understand. Maybe it has fewer symbols, or some awkward productions that are there in G are removed from G', but still the two grammars are equivalent. Now, if a grammar is given in a simpler form, it helps us in proving many facts and results about languages. We will consider simplification of grammars by removing useless symbols, ε-productions and unit productions.
So, first we will consider useless symbols. We will define what is meant by a useless symbol and then see how we can remove useless symbols from a grammar, creating an equivalent grammar. Now, a grammar is designed to generate a language, and every non-terminal introduced in the grammar should contribute to the generation of strings in the language. It is meaningless to introduce a non-terminal that does not occur in any derivation that generates a terminal string of the language, because it makes the grammar unnecessarily large and clumsy. The same is true for a terminal as well. We would like to eliminate all symbols of this kind, normally called useless, meaning that they are not required at all. Elimination of these symbols makes the grammar simple and straightforward. We first formally define useless symbols and then give methods to eliminate them. We begin by saying when a symbol X, belonging to the set of non-terminals or the set of terminals, is useful.
Suppose X is a symbol belonging to N ∪ Σ. We say that X is useful in the grammar G = (N, Σ, P, S), where N is the set of non-terminals, Σ is the set of terminals, P is the set of productions and S is the start symbol, if there is some derivation of the form S ⇒* αXβ ⇒* w in G. Here w ∈ Σ* is a string of terminals, and α and β may be any strings of terminals and non-terminals, that is, α, β ∈ (N ∪ Σ)*. So, starting with S we eventually derive a string of terminals, and on the way we get a sentential form in which the symbol X occurs. In such a case we say that X is useful, and a symbol is useless if it is not useful. That means a symbol is useless if it is not used in the derivation of any string w in the language generated by the grammar. Now, a terminal symbol is useful if it occurs in a string of the language. Similarly, a non-terminal is useful if it occurs in a derivation that begins with the start symbol of the grammar and eventually generates a terminal string. So, suppose A ∈ N is a non-terminal. For A to be useful, it needs to satisfy the following two conditions.
First, starting with S, in zero or more steps in G, we should reach a sentential form αAβ, that is, S ⇒* αAβ, where α, β ∈ (N ∪ Σ)* may be any strings of terminals and non-terminals. In such a case we say that the symbol A is reachable: starting with the start symbol, we can eventually reach A. The second condition is that, starting with A, in zero or more steps in G, we should be able to derive a string of terminals, that is, A ⇒* w for some w ∈ Σ*. In such a case we say that the symbol A is generating, or live. So, if A is useful, then A is both reachable and generating. But the converse does not hold: even if both conditions are satisfied, we cannot say that A is useful; A may still be useless.
Now, as an example, take the grammar G with productions S → AB and A → a, where a is a terminal symbol and S, A and B are non-terminals. Here both conditions are satisfied for A: S ⇒ AB, so the first condition holds, and A ⇒ a using the second production, so the second condition also holds. But the non-terminal A is not useful, because we cannot derive any terminal string starting with the start symbol S. This is because of the second non-terminal, B, which appears along with A in the production S → AB: B has no productions, so it is not generating, and every sentential form that contains A also contains B.
So, in our procedure for eliminating useless symbols, we first apply condition 2, that is, we first eliminate all non-terminals that are not generating, and then we apply condition 1 to remove symbols that are not reachable. So, given a context-free grammar G, we first use an algorithm to remove all non-generating symbols from G and construct an equivalent grammar G' containing only generating symbols. In the second step, we eliminate all non-terminals of G' that are not reachable from the start symbol, and construct an equivalent grammar G'' that contains no useless symbols.
So, first let us discuss a procedure to find an equivalent grammar G' from G that does not contain any non-generating symbol. The idea is this. Suppose A → w is a production, where w ∈ Σ*; that means from A we can derive a string of terminals in one step, so A must be generating. So, we create a set, say GEN, in which we keep all non-terminals A such that A → w belongs to P for some w ∈ Σ*. Then, suppose we have already included some symbols in GEN, and we have a production A → α in P such that all the non-terminals occurring in α are already in GEN. In such a case we also include A in GEN. We continue this process until no more symbols can be added. So, we can write it in the form of an algorithm.
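The procedure just described can be sketched in Python. This is only a minimal sketch under some assumptions that are not from the lecture: a production is a (head, body) pair, a body is a tuple of symbols, and the function name generating_nonterminals is chosen here for illustration.

```python
def generating_nonterminals(nonterminals, productions):
    """Return GEN, the set of non-terminals that derive a terminal string.

    `productions` is a list of (head, body) pairs, where body is a tuple of
    symbols. Any symbol not in `nonterminals` is treated as a terminal.
    """
    gen = set()
    while True:
        old = set(gen)  # snapshot of GEN before this pass
        for head, body in productions:
            # Accept the rule when its body lies in (OLD ∪ Σ)*: every symbol
            # is a terminal or a non-terminal already known to be generating.
            # On the first pass OLD is empty, so only rules A -> w with
            # w in Σ* qualify, which is the initialisation step.
            if all(s not in nonterminals or s in old for s in body):
                gen.add(head)
        if gen == old:  # nothing was added, so the set is complete
            return gen


# Grammar from the lecture's example: S -> AB, A -> a.
N = {"S", "A", "B"}
P = [("S", ("A", "B")), ("A", ("a",))]
print(sorted(generating_nonterminals(N, P)))  # prints ['A']
```

For the example grammar only A ends up in GEN. The start symbol S is not generating, which agrees with the earlier observation that no terminal string can be derived from S.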
So, first we include in GEN all those symbols A such that A → w belongs to P, with the condition that w ∈ Σ* is a string of terminals. Then we repeat the following process. We create a set, say OLD, and assign to it the current GEN. Then, for every A ∈ N, we do the following: if A → α is in P and α ∈ (OLD ∪ Σ)*, that is, α is a string over terminals and symbols that are already included in GEN, then we add this new symbol, GEN := GEN ∪ {A}. We keep on adding like this until OLD and GEN are identical. In such a case we can conclude that no more symbols can be added. It is easy to see that this algorithm terminates, because there are finitely many non-terminals and productions in the grammar and every pass either adds a new non-terminal to GEN or stops. Upon termination, GEN contains all those non-terminals, and only those non-terminals, which are generating.
We can prove this easily by induction. Suppose A is a generating non-terminal, so A ⇒* w in G for some w ∈ Σ*. We apply induction on the number of steps needed to derive w to show that A is in GEN. The basis case is that A derives w in one step in G. Then A → w must belong to P, otherwise A cannot derive w in one step. Hence A is included in GEN in the first step of the algorithm.
Now, for the induction step, we assume that every non-terminal that derives a terminal string in at most n steps is in GEN, and let A derive w in n + 1 steps in G. Then we can write A ⇒ α ⇒* w, where the first step uses one production and the remaining n steps derive w from α. Since α derives w in n steps, every non-terminal in α derives a terminal string in at most n steps, so by the induction hypothesis all the non-terminals in α are already included in GEN. Again, since A derives α in one step, A → α must belong to P, otherwise A cannot derive α in one step. Therefore, according to the rule of our algorithm, A will be included in GEN in the second step of the algorithm. Conversely, a non-terminal is added to GEN only when it has a production whose right-hand side consists of terminals and non-terminals already known to be generating, so no non-generating non-terminal will be included in GEN. Thus the algorithm produces the desired set, that is, the set of all non-terminals which are generating and only those.
Non-terminals that are not in the set GEN we have constructed are useless, since they cannot contribute to the generation of strings in L(G). This observation leads us to construct a CFG G' which is equivalent to G and eliminates all variables of G that do not derive any string of terminals. We can state this in the form of a theorem. Let G = (N, Σ, P, S) be a CFG. Then there is an algorithm to construct a CFG G' = (N', Σ', P', S) such that (1) L(G') = L(G), and (2) every non-terminal A ∈ N' is generating, that is, every non-terminal in G' derives some terminal string in G'. We can now easily prove it. Let N' = GEN, the set that we have already constructed from G, and let P' be obtained by deleting certain rules from P.
So, from P we delete all those rules that contain non-generating non-terminals, that is, non-terminals that do not derive a terminal string. P' contains exactly those productions A → α such that A → α belongs to P, A ∈ GEN and α ∈ (GEN ∪ Σ)*. Also, Σ' is the set of all those terminal symbols a ∈ Σ such that a occurs on the right-hand side of some rule in P'. Now, P' is a subset of P, because from P we have only removed the rules that contain non-generating non-terminals. Therefore, every derivation in G' is also a derivation in G, and it is clear that L(G') ⊆ L(G).
Now, we want to show the other direction, that L(G) ⊆ L(G'); that will prove that the resulting grammar G' is equivalent to G. To show this, let S ⇒* w in G, so that w is a string of L(G). We want to show that S ⇒* w in G' as well; only then can we say that L(G) ⊆ L(G'). Suppose this is not the case. Then some rule used in the derivation is not in P', so a non-generating non-terminal must occur in an intermediate step of the derivation. But this is not possible, because a derivation from a sentential form containing a non-generating symbol cannot produce a terminal string. Hence, all the rules used in the derivation S ⇒* w in G must also be in P'. Therefore S ⇒* w in G' as well.
So, therefore, L(G) ⊆ L(G'), and because of both inclusions we can say that G is equivalent to G'. That means from G we can now construct an equivalent grammar G' which does not contain any non-generating symbols.
Now, we will construct the set of all non-terminals which are reachable, and then we will see how to construct an equivalent grammar containing only reachable non-terminals. Given a grammar G, by constructing the equivalent grammar G' as already described, we have removed all non-generating symbols. So, our aim now is to remove all variables that are not reachable. We use the following process. We start with the first reachable non-terminal, which is the start symbol. Then, if there is a production S → α where α contains some non-terminals, those non-terminals are also reachable. So, initially we construct a set called REACH which contains only the start symbol S. Then, for every production of the form S → α, all the non-terminals in α are included in REACH, and likewise for the productions of every non-terminal that has been added. We keep repeating this process until no more symbols can be added.
So, we can write it in the form of an algorithm. Given the grammar G, first we initialize REACH to {S}, the set containing only the start symbol of the grammar. Then we set OLD to be the empty set ∅; this set is used for checking the termination criterion of the algorithm. Then we repeat the following process. We create a set NEW consisting of all those symbols in REACH which are not in OLD, so NEW = REACH - OLD, and OLD is now set to REACH.
Now, for every non-terminal A that belongs to NEW, and for every production of the form A → α in P, we add all the non-terminals in α to REACH. Whatever non-terminals we have in α must be added to REACH because A is already there in REACH. We continue this process until no new non-terminal can be added, that is, until REACH is equal to OLD. So, by this algorithm, we keep on adding new non-terminals which are reachable from the start symbol.
Let us now prove that REACH contains all those and only those variables which are reachable from the start symbol of the grammar G. We first show that all reachable variables are added to REACH. Suppose S ⇒ αAβ in n steps in G. Then A is reachable; that is quite clear, because we can reach A from the start symbol S using this derivation. What we want to show is that A is added to REACH on or before iteration n of the algorithm, so that all reachable non-terminals are eventually put in REACH. The basis case is a derivation of zero steps: the start symbol S is the only non-terminal reachable in this way, and it is added to REACH in step one of the algorithm, where REACH is initialized to {S}. So, that is the basis step. Then we use the following induction hypothesis: every non-terminal reachable by a derivation of n steps or fewer is added to REACH on or before iteration n of the algorithm.
So, in the induction step, assume that S ⇒ αAβ in n steps in G, and that in one more step we get αAβ ⇒ αγBδβ. The length of this derivation is exactly n + 1: the first part requires n steps, and in one more step we get this sentential form. In the last step we applied the production A → γBδ: A is replaced by the string γBδ, which means A → γBδ must be a production in the grammar. By the induction hypothesis, A has been included in REACH on or before iteration n, because reaching A takes n steps. Now, the variable B will be added in the next iteration according to the rule of our algorithm. Therefore, every variable which is reachable will be included in the set REACH.
Now, we show that all variables included in REACH are reachable; that means the algorithm does not add to REACH any variable which is not reachable. The basis case is that in the first step we include S, and S is always reachable, since it is the start symbol of the grammar. In the induction step, suppose a variable B is added in the (n + 1)-th iteration. Then, according to the algorithm, there must be a production A → αBβ in P such that A already belongs to REACH after n iterations; since A is already in REACH, we include B in REACH.
Now, by the induction hypothesis, since A is already in REACH, there must be a derivation S ⇒* γAδ in zero or more steps in G. In the last step we have added B because of the production A → αBβ. So, in γAδ we can replace A by αBβ using this production, and we have the derivation S ⇒* γAδ ⇒ γαBβδ in the grammar. According to this derivation B is reachable, and hence every B that is added must be reachable. Therefore, whatever we add to the set REACH according to this algorithm must be reachable. So, REACH contains all the reachable variables, and all the variables in it are reachable.
Once we have this algorithm, we can state the following. Let G = (N, Σ, P, S) be a CFG. Then there is an algorithm to construct a CFG G'' such that L(G'') = L(G), that is, G and G'' are equivalent, and G'' has no useless symbols. What we do is this. From G we first construct an equivalent CFG G' = (N', Σ', P', S) containing no non-generating symbols, using the previous theorem, where we have already seen how to construct it. Then, using the algorithm we have just described, we construct the set REACH for G'. Now, we construct the CFG G'' = (N'', Σ'', P'', S) according to the following rules.
So, N'', the set of non-terminals of G'', consists of all those non-terminals of G' which are included in the set REACH, that is, N'' = REACH, because these are exactly the reachable ones. P'' contains all those productions of the form A → α belonging to P', the productions of G', such that A is a reachable non-terminal, A ∈ REACH, and α is a string of symbols from REACH and Σ', that is, α ∈ (REACH ∪ Σ')*. And Σ'' consists of all those symbols a ∈ Σ' such that a occurs on the right-hand side of some rule in P''. If a terminal symbol does not appear on the right-hand side of any rule in the P'' that we have constructed, then that symbol is useless.
So, we use these three rules to construct the grammar G''. We want to establish that L(G'') = L(G'), and hence, since L(G') = L(G), that L(G'') is equal to L(G), the language of the original grammar. Now, P'' is a subset of P'. So, every derivation in G'' is also a derivation in G': whatever we derive in G'' can also be derived in G', because P'' is a subset of P'. Therefore, L(G'') ⊆ L(G').
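The REACH computation and the three rules for N'', Σ'' and P'' can be sketched in Python as follows. This is a sketch under assumptions, not the lecture's own code: productions are (head, body) pairs with tuple bodies, the function names are illustrative, and the small input grammar is a hypothetical G' made up for this example.

```python
def reachable_nonterminals(nonterminals, productions, start):
    """Return REACH, the non-terminals reachable from `start`."""
    reach = {start}
    old = set()
    while reach != old:
        new = reach - old   # non-terminals added since the last pass
        old = set(reach)
        for head, body in productions:
            if head in new:
                # Every non-terminal on the right-hand side is reachable.
                reach.update(s for s in body if s in nonterminals)
    return reach


def remove_unreachable(nonterminals, productions, start):
    """Return (N'', Sigma'', P'') for a grammar with no non-generating symbols."""
    reach = reachable_nonterminals(nonterminals, productions, start)
    # Rule for P'': keep A -> alpha with A in REACH and alpha over REACH ∪ Σ'.
    # (The second test is implied by the first; it is kept to mirror the rule.)
    kept = [
        (head, body)
        for head, body in productions
        if head in reach
        and all(s in reach or s not in nonterminals for s in body)
    ]
    # Rule for Sigma'': terminals that still occur on some right-hand side.
    sigma = {s for _, body in kept for s in body if s not in nonterminals}
    return reach, sigma, kept


# Hypothetical G' (not from the lecture): S -> a, A -> b.
# It could arise from S -> AB | a, A -> b after dropping the non-generating B.
N1 = {"S", "A"}
P1 = [("S", ("a",)), ("A", ("b",))]
n2, sigma2, p2 = remove_unreachable(N1, P1, "S")
print(sorted(n2), sorted(sigma2), p2)  # prints ['S'] ['a'] [('S', ('a',))]
```

The example also suggests why non-generating symbols are removed first: in S → AB | a, A → b the non-terminal A is reachable, and it becomes unreachable only once the rule S → AB has been deleted because of B.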
Now, we need to show the converse. Suppose S ⇒* w in the grammar G'. Every variable in this derivation is reachable, hence it is in REACH and so in N'', and each rule applied is in P''. So, therefore, S ⇒* w in the grammar G'' as well. So, in the same way as in the previous proof, we can show that L(G') ⊆ L(G''). Therefore, L(G'') and L(G') are identical, and since we have already shown that L(G') and L(G) are the same, L(G'') and L(G) are also the same. So, this is how we can get rid of all the useless symbols by using the two algorithms that I have described: first we eliminate the non-generating symbols, next we construct the set of reachable non-terminals, and then we eliminate all the productions containing non-generating symbols or unreachable symbols.
So, we have seen how to remove useless symbols by using these two algorithms. Next we will see how to get rid of ε-productions and unit productions. First let us define them. A production of the form A → ε, where A is a non-terminal, is said to be an ε-production. Similarly, a production of the type A → B, where both A and B are non-terminals, is said to be a unit production. Now, we would like to simplify a grammar by removing these productions, because sometimes it may be difficult to determine whether applying a production of these types in a derivation makes any progress toward deriving a terminal string.
For example, suppose we have the productions A → B, B → C and C → A; they are all unit productions. If we apply them in sequence, we may enter a loop: A ⇒ B ⇒ C ⇒ A, and again A ⇒ B ⇒ C ⇒ A. Similarly, starting with some non-terminal we can keep on deriving a long sentential form and then make it empty by applying productions like B → ε. On the other hand, consider a derivation in a CFG G without ε-productions and without unit productions. Then we can be very sure that there is demonstrable progress at every step, in the sense that either a terminal symbol appears in the sentential form or the sentential form gets strictly longer. So, in that sense it is always better to eliminate unit productions and ε-productions. Of course, if we get rid of ε-productions, the grammar can no longer generate the string ε.
Now, we consider this theorem: for any CFG G = (N, Σ, P, S), there is a CFG G' with no ε-productions such that L(G') = L(G) - {ε}. So, G' generates all those strings that are in L(G) except for ε, and all those strings that are in L(G') are also in L(G). So, let us prove this. Given the grammar G, we construct a grammar Ĝ = (N, Σ, P̂, S); that means we modify only the set of productions of the original grammar G. P̂ is constructed from P by using the following rule.
The rule is this: P̂ contains all the productions of P, and if A → αBβ and B → ε are in P̂, then we include A → αβ in P̂. That means, since B → ε, we can remove B, because B can be replaced by ε, and on the right-hand side we are left with only αβ. So, A → αβ is a new production that we include in P̂, and we keep applying the rule until nothing more can be added. It is easy to see that if P is finite then so is P̂, because every right-hand side that we add is obtained by deleting symbols from the right-hand side of a production in P.
Now, clearly L(G) ⊆ L(Ĝ), because P̂ contains some more productions besides the productions that are in P. Since P is a subset of P̂, every derivation in G is a derivation in Ĝ as well. Again, L(Ĝ) can be shown to be a subset of L(G); that means whatever we can derive in Ĝ can be derived in G as well. This is because every new production added to P̂ by the above rule can be simulated in two steps by the two productions that caused it to be added. Suppose in Ĝ we have derived, in one step, α₁Aα₂ ⇒ α₁αβα₂ using the new production A → αβ. Since A → αBβ and B → ε are both there, which is why we included A → αβ in P̂, we can start from α₁Aα₂, replace A by αBβ in one step to get α₁αBβα₂, and in the next step use B → ε to get α₁αβα₂. Repeating this argument removes every use of a new production. So, whatever we derive in Ĝ can also be derived in G, but each such replacement makes the derivation one step longer.
So, therefore, L(G) = L(Ĝ). Now, what we can show is the following: for any string w ∈ Σ* with w ≠ ε, any derivation S ⇒* w in Ĝ of minimal length does not use any ε-production. So, if we can derive w starting with S in Ĝ, a minimal-length derivation of w does not require any ε-production. Therefore, we can always throw out all the ε-productions without losing any non-empty string of the language.
Suppose, for contradiction, that there is a minimal-length derivation S ⇒* w in Ĝ that uses an ε-production, say B → ε, at some point of the derivation. That means we have a derivation of the form S ⇒* α₁Bα₂ ⇒ α₁α₂ ⇒* w in Ĝ, where the middle step uses B → ε. Now we will arrive at a contradiction. Since w ≠ ε, α₁ and α₂ cannot both be ε, because from the empty string we could only get w = ε. So this B is not the start symbol standing alone, and B must have appeared earlier in the derivation, when a production of the form A → αBβ was applied.
That is, the above derivation has the form S ⇒ γAδ in some m steps, then in one step γAδ ⇒ γαBβδ, where A → αBβ was applied, then in n steps it takes the form α₁Bα₂, then in one step we get α₁α₂ by applying the rule B → ε, and then in k steps it derives w, all in Ĝ. So, clearly this derivation has length m + n + k + 2, for some m, n, k ≥ 0: m steps, n steps and k steps, and 1 + 1 = 2 steps for the two productions. Now, according to the construction of Ĝ, since A → αBβ is in P̂ and B → ε is also in P̂, the production A → αβ must also be in P̂. So, what we can do is this. Starting with S, in m steps in Ĝ we get γAδ. At this point, since A → αβ is a production, we can apply it in one step and get γαβδ. Then, using the same n steps, we arrive at α₁α₂, and using the same k steps we eventually get the string w. But this derivation clearly has m + n + k + 1 steps, so it is shorter than the previous one, which we assumed to be of minimal length. This is a contradiction. That means we do not actually need the productions of the form A → ε that are there in the grammar Ĝ.
So, therefore, we can throw out all those productions, and we get a new grammar which does not contain any ε-production and which generates all the strings of the earlier language except for the empty string ε.
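The whole construction can be sketched in Python: close the productions under the rule to obtain P̂ (P hat), then throw out the ε-productions. This is a minimal sketch under assumptions that are not from the lecture: productions are (head, body) pairs, the empty tuple stands for ε, and the function name and the sample grammar are illustrative only.

```python
def remove_epsilon_productions(productions):
    """Close P under the lecture's rule to get P-hat, then drop A -> epsilon.

    `productions` is an iterable of (head, body) pairs; body is a tuple of
    symbols and the empty tuple () stands for epsilon.
    """
    p_hat = set(productions)
    changed = True
    while changed:
        changed = False
        # Heads B for which B -> epsilon is currently in P-hat.
        eps_heads = {head for head, body in p_hat if body == ()}
        for head, body in list(p_hat):
            for i, sym in enumerate(body):
                if sym in eps_heads:
                    # A -> alpha B beta and B -> epsilon give A -> alpha beta.
                    shorter = (head, body[:i] + body[i + 1:])
                    if shorter not in p_hat:
                        p_hat.add(shorter)
                        changed = True
    # Finally throw out every epsilon-production.
    return {(head, body) for head, body in p_hat if body != ()}


# Hypothetical grammar: S -> AB, A -> aA | epsilon, B -> b.
P_eps = [("S", ("A", "B")), ("A", ("a", "A")), ("A", ()), ("B", ("b",))]
for head, body in sorted(remove_epsilon_productions(P_eps)):
    print(head, "->", " ".join(body))
```

For this sample grammar the result keeps S → AB, A → aA and B → b, and adds S → B and A → a in place of the deleted A → ε. The loop terminates because every added right-hand side is shorter than the one it came from, so only finitely many productions can ever be added.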