In this lecture we will see how to simplify a context-free grammar. Let G = (V, Σ, P, S) be a context-free grammar with set of non-terminals V, set of terminals Σ, set of productions P and start symbol S. From G we would like to obtain another grammar G' = (V', Σ', P', S) that is a simplified form of G; we will explain shortly what kinds of simplifications we want to make. However, both grammars must produce the same language: we require that the language generated by the new, simplified grammar is the same as the language generated by the old grammar, L(G') = L(G). Two grammars that generate the same language are called equivalent, so G' and G are equivalent. The reason we want these simplifications is that they help us cast every context-free grammar into a certain standard form, called a normal form, and using that normal form it becomes easy to prove certain properties of context-free grammars and languages. So, what kinds of simplifications do we want? The first is the removal of useless symbols. Remember that the symbols of a grammar are the elements of the non-terminal set and the terminal set, and we call a symbol useless if it does not take part in the derivation of any terminal string in the language. Let us write it down: a symbol is useless if it does not take part in the derivation of any string in L(G). To make this idea clear, suppose we have a grammar G generating the language L(G).
That language is, of course, a set of strings over Σ. If some non-terminal, or even some terminal, takes no part in any derivation whatsoever of a string of L(G), we call that symbol useless. If we think about it, a symbol can be useless in two ways. First, a non-terminal A can be useless because it does not generate any terminal string at all, that is, there is no w ∈ Σ* such that A ⇒* w. Why is such a non-terminal useless? Suppose we start a derivation from the start symbol as usual and somewhere along the way we get a sentential form αAβ, where α and β are strings over V ∪ Σ. To eventually get a string of the language, this A must be rewritten into some terminal string w ∈ Σ*; that string could even be empty, but the point is that A must eventually be rewritten. So there has to be some way of generating a terminal string from A. Can a terminal a ∈ Σ be useless in this sense? Of course not, because a by itself is a string over Σ. That is why we said that a non-terminal A is useless in this first sense if there is no string w ∈ Σ* with A ⇒* w.
The second way a non-terminal or a terminal can be useless is that it can never be reached from the start symbol. Let us elaborate. Suppose that for no α, β ∈ (V ∪ Σ)* do we have S ⇒* αAβ; then we say that A cannot be reached from S, and such a symbol is again useless. We have written A as a non-terminal here, but the same holds for a terminal: if there is no partial derivation starting from S that reaches a sentential form containing that terminal, the terminal is useless. So this second kind of uselessness applies to non-terminals as well as terminals. These symbols have names. A non-terminal A for which there is no w ∈ Σ* with A ⇒* w is called non-generating. A symbol X such that we never have S ⇒* αXβ, for any α and β, is called unreachable. Notice, as I said, that being non-generating makes sense only for non-terminals, whereas unreachability can happen to both terminals and non-terminals. Let us now see how to identify unreachable symbols and then non-generating symbols; we will then simply remove them from the grammar, and we will say a little more about that.
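The two notions just named can be stated compactly in the lecture's notation G = (V, Σ, P, S), with ⇒* meaning derivation in zero or more steps. Every useful symbol is therefore reachable and, if it is a non-terminal, generating.

```latex
\begin{align*}
A \in V \text{ is generating} &\iff \exists\, w \in \Sigma^* :\; A \Rightarrow^* w,\\
X \in V \cup \Sigma \text{ is reachable} &\iff \exists\, \alpha, \beta \in (V \cup \Sigma)^* :\; S \Rightarrow^* \alpha X \beta,\\
X \in V \cup \Sigma \text{ is useful} &\iff \exists\, \alpha, \beta \in (V \cup \Sigma)^*,\ w \in \Sigma^* :\; S \Rightarrow^* \alpha X \beta \Rightarrow^* w.
\end{align*}
```

A symbol is useless exactly when it is not useful.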
To identify the unreachable symbols, we build inductively a set, call it ℛ (script R), the set of reachable symbols, that is, the symbols we can reach from the start symbol S. Once we have ℛ, the unreachable symbols are simply whatever is left in V ∪ Σ after taking out ℛ. As I said, we define ℛ by an inductive process. Whenever we define a set or any object by induction, we first give a base case and then say how to extend the set built so far. The base case is easy: which symbol of V ∪ Σ is definitely reachable from S? The symbol S itself, because S can be reached from S in zero steps. So initially we set ℛ = {S}; there is absolutely no doubt that the start symbol is reachable from itself, trivially. Now the induction step. Imagine we have a production, say A → αBβ, and we have already found A to be reachable; then surely every symbol on the right-hand side, in particular B, is also reachable. So the induction step is: if A ∈ ℛ, the set built so far, and A → α is a production in P, then every symbol B occurring in α is also reachable. Therefore, we update ℛ with each such symbol that is not already a member.
That is, we set ℛ := ℛ ∪ {B}. Again, I have written B as if it were a non-terminal, but the same holds if it is a terminal. What you really do is look at the right-hand side of each production whose left-hand side is already in ℛ and, for every symbol there, check whether it is already in the set built so far; if it is not, add it. Going over all the productions this way gives a set ℛ. But every time ℛ grows, you must look at the productions again to see which new members enter ℛ because of that growth: when you find one more reachable symbol, you must look at all the productions whose left-hand side is that symbol, so that you can consider their right-hand sides. This matters only when the new symbol is a non-terminal; a new terminal adds nothing further, because a terminal never occurs on the left-hand side of a production. So, do you see what is happening? We start with the base case and keep growing ℛ by adding newer and newer members, each time checking whether ℛ is still growing. If at some point a complete pass over all the productions of the grammar adds nothing to ℛ, we have reached the final value of ℛ. It cannot grow any larger, because the only way ℛ can grow is that some new member comes in and the productions whose left-hand side is that new member give rise to more reachable symbols.
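The fixed-point computation just described can be sketched in Python as follows. This is a minimal sketch, not the lecture's own code. The representation is an assumption: each production A → X1…Xk is stored as a pair (A, (X1, …, Xk)) of strings, and the function name `reachable_symbols` and the small grammar are hypothetical.

```python
from typing import List, Set, Tuple

# Assumed representation: a production A -> X1 ... Xk is the pair (A, (X1, ..., Xk)).
Production = Tuple[str, Tuple[str, ...]]


def reachable_symbols(start: str, productions: List[Production]) -> Set[str]:
    """Return the set R of symbols reachable from the start symbol."""
    reach: Set[str] = {start}          # base case: S reaches itself in 0 steps
    changed = True
    while changed:                     # repeat passes until R stops growing
        changed = False
        for lhs, rhs in productions:
            if lhs not in reach:       # only productions of reachable symbols count
                continue
            for sym in rhs:            # every right-hand-side symbol is reachable
                if sym not in reach:
                    reach.add(sym)
                    changed = True
    return reach


# Hypothetical grammar: S -> aA, A -> b, B -> c
V = {"S", "A", "B"}
SIGMA = {"a", "b", "c"}
P = [("S", ("a", "A")), ("A", ("b",)), ("B", ("c",))]

R = reachable_symbols("S", P)
unreachable = (V | SIGMA) - R          # unreachable symbols = (V ∪ Σ) − R
print(sorted(unreachable))             # ['B', 'c']
```

The outer `while` loop is exactly the "repeat until a full pass adds nothing" rule; since ℛ only grows and is bounded by V ∪ Σ, the loop terminates.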
It is not too difficult to see that ℛ grows monotonically as we add more and more symbols, and that the process must stop: ℛ cannot grow indefinitely, because at most every symbol of V ∪ Σ is reachable. So at some point, at the latest when ℛ contains all of V ∪ Σ, the process of growing ℛ stops. With ℛ the set of reachable symbols, the set of unreachable symbols is simply (V ∪ Σ) − ℛ. So we know how to identify the unreachable symbols. Next we need to identify the non-generating non-terminals, that is, the non-terminals that do not derive any terminal string. As before, we first identify the set of generating non-terminals. By definition, a non-terminal A is generating if we can derive a terminal string from it, that is, A ⇒* w for some w ∈ Σ*: starting from A you can reach a string consisting only of terminals. Let me call the set of generating non-terminals 𝒢 (script G), and again we define 𝒢 inductively. What is the base case? Clearly, if there is a production of the form A → w with w ∈ Σ*, then A is a generating non-terminal. So the base case of the inductive definition of 𝒢 is: place in 𝒢 every A such that the production A → w, with w ∈ Σ*, is an element of P.
So we start with some subset of V. How do we grow 𝒢? Again by looking at productions. Suppose there is a production B → α where every non-terminal in α is already in 𝒢, and B is not yet in 𝒢; then we should add B. Why? Suppose, for instance, the production is B → AaC, where a is a terminal, we can derive some terminal string w₁ from A and some terminal string w₂ from C. Then from B we can derive w₁aw₂, so B is also generating. The rule, then, is: if B → α is a production and every non-terminal in α is already in 𝒢, meaning we have already found every non-terminal in α to be generating, add B to 𝒢 if it is not there already. Whenever we have the current set 𝒢, we go through the productions looking for ways to enlarge it using this strategy. 𝒢 keeps growing until at some point a pass over all the productions adds nothing; that is the final set of generating non-terminals. The set of non-generating non-terminals is then simply V − 𝒢. So both algorithms are fairly simple: one identifies the non-generating non-terminals, and the other identifies the unreachable symbols, which can be either non-terminals or terminals. After identifying the unreachable and non-generating symbols, we simply remove them from the grammar. By removing a symbol from the grammar we mean not only that it leaves the corresponding non-terminal or terminal set, but also that we get rid of every production in which that symbol takes part.
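The inductive definition of 𝒢 can be sketched the same way. This is a minimal sketch under the same assumed representation (productions as (A, (X1, …, Xk)) pairs); the function name `generating_nonterminals` and the example grammar are hypothetical. One design choice differs slightly from the lecture's presentation: the base case A → w and the induction step are checked by a single condition, because a right-hand side with no non-terminals trivially has "every non-terminal already in 𝒢".

```python
from typing import List, Set, Tuple

# Assumed representation: a production A -> X1 ... Xk is the pair (A, (X1, ..., Xk)).
Production = Tuple[str, Tuple[str, ...]]


def generating_nonterminals(nonterminals: Set[str],
                            productions: List[Production]) -> Set[str]:
    """Return the set of non-terminals A with A =>* w for some terminal string w."""
    gen: Set[str] = set()
    changed = True
    while changed:                     # repeat passes until the set stops growing
        changed = False
        for lhs, rhs in productions:
            if lhs in gen:
                continue
            # A -> alpha qualifies when every non-terminal in alpha is already
            # known to be generating. If alpha has no non-terminals (A -> w with
            # w in Sigma*, including the empty string), this is the base case.
            if all(sym not in nonterminals or sym in gen for sym in rhs):
                gen.add(lhs)
                changed = True
    return gen


# Hypothetical grammar: S -> AC | B, A -> a, B -> bB, C -> Ab
V = {"S", "A", "B", "C"}
P = [("S", ("A", "C")), ("S", ("B",)), ("A", ("a",)),
     ("B", ("b", "B")), ("C", ("A", "b"))]

gen_set = generating_nonterminals(V, P)
print(sorted(V - gen_set))             # ['B']  (B -> bB never terminates)
```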
Each of the two removals can be done separately without any problem; combining them raises an issue that we will address shortly. Take first the removal of unreachable symbols from G = (V, Σ, P, S). Recall that the set of unreachable symbols may contain some non-terminals and some terminals. We get a new grammar G' = (V', Σ', P', S) by removing from V all the unreachable non-terminals and from Σ all the unreachable terminals, which gives V' and Σ', and by removing from P all productions in which any unreachable symbol occurs. Let me write it down clearly: P' = P − {A → α ∈ P : some unreachable symbol occurs in A → α}. The unreachable symbol may be A itself, in which case we must of course remove the production; or one of the symbols of α on the right-hand side may be unreachable, and again there is no point keeping the production. (In fact, if A is reachable then every symbol of α is reachable too, so checking A is enough.) The set inside the outer curly brackets is the set of all productions in which some unreachable symbol occurs, and we remove all of them from P to get P'. S can never go out, because S is always reachable from itself. This grammar G' contains no unreachable symbols, and by removing them we have also got rid of some productions that would never be used in deriving a terminal string.
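Building G' from G as just described can be sketched as follows. This is a minimal illustration, not the lecture's code: grammars are assumed to be tuples (V, Σ, P, S) of Python sets, a list and a string, and `remove_unreachable` and the example grammar are hypothetical names.

```python
from typing import List, Set, Tuple

Production = Tuple[str, Tuple[str, ...]]
Grammar = Tuple[Set[str], Set[str], List[Production], str]   # (V, Sigma, P, S)


def reachable_symbols(start: str, productions: List[Production]) -> Set[str]:
    """Fixed-point computation of R, the symbols reachable from start."""
    reach = {start}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs in reach:
                for sym in rhs:
                    if sym not in reach:
                        reach.add(sym)
                        changed = True
    return reach


def remove_unreachable(g: Grammar) -> Grammar:
    """Build G' = (V', Sigma', P', S) with all unreachable symbols removed."""
    V, SIGMA, P, S = g
    R = reachable_symbols(S, P)
    V2 = V & R                          # drop unreachable non-terminals
    SIGMA2 = SIGMA & R                  # drop unreachable terminals
    # Drop every production in which an unreachable symbol occurs. Since R is
    # closed under productions, checking the left-hand side alone would suffice.
    P2 = [(lhs, rhs) for lhs, rhs in P
          if lhs in R and all(sym in R for sym in rhs)]
    return V2, SIGMA2, P2, S            # S is always reachable from itself


# Hypothetical grammar: S -> aA, A -> b, B -> c
G = ({"S", "A", "B"}, {"a", "b", "c"},
     [("S", ("a", "A")), ("A", ("b",)), ("B", ("c",))], "S")
V2, SIGMA2, P2, S2 = remove_unreachable(G)
print(P2)                               # [('S', ('a', 'A')), ('A', ('b',))]
```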
So G' is a simplification of G obtained by removing unreachable symbols. In the same manner, starting again from G = (V, Σ, P, S), after identifying the non-generating non-terminals we can obtain a simplified grammar G' in which the non-generating non-terminals have been removed from V. Σ does not change, because we are only talking about non-generating non-terminals, whereas P will change if we have found some non-terminal to be non-generating, since every production in which it occurs is removed. We will assume that S is always generating; in other words, the original grammar G is such that L(G) is non-empty. Then S derives some terminal string, and therefore S remains in the simplified grammar as well. These two simplifications can be done separately, and it is easy to see that each is correct: the grammar G' obtained by removing unreachable symbols generates the same language as G, and likewise the grammar G' obtained by removing non-generating non-terminals generates the same language as G, while in both cases the grammar is simplified. Now, our goal was to remove all useless symbols, and we said at the beginning that a symbol can be useless either because it is non-generating or because it is unreachable. We know how to do these two removals separately. In which order should we do them? Given a grammar G = (V, Σ, P, S), it seems we have a choice: first remove unreachable symbols and then remove non-generating symbols.
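Removing non-generating non-terminals can be sketched in the same style. This is a minimal sketch under the same assumed (V, Σ, P, S) tuple representation; `remove_non_generating` and the example grammar are hypothetical. Following the lecture's assumption that L(G) is non-empty, the sketch raises an error if S turns out to be non-generating.

```python
from typing import List, Set, Tuple

Production = Tuple[str, Tuple[str, ...]]
Grammar = Tuple[Set[str], Set[str], List[Production], str]   # (V, Sigma, P, S)


def generating_nonterminals(V: Set[str], P: List[Production]) -> Set[str]:
    """Fixed-point computation of the generating non-terminals."""
    gen: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in P:
            if lhs not in gen and all(s not in V or s in gen for s in rhs):
                gen.add(lhs)
                changed = True
    return gen


def remove_non_generating(g: Grammar) -> Grammar:
    """Build G' with all non-generating non-terminals removed; Sigma is unchanged."""
    V, SIGMA, P, S = g
    gen = generating_nonterminals(V, P)
    if S not in gen:
        raise ValueError("S is non-generating, so L(G) is empty")
    # Keep a production only if its left-hand side and every non-terminal
    # on its right-hand side are generating.
    P2 = [(lhs, rhs) for lhs, rhs in P
          if lhs in gen and all(s not in V or s in gen for s in rhs)]
    return V & gen, SIGMA, P2, S


# Hypothetical grammar: S -> AC | B, A -> a, B -> bB, C -> Ab
G = ({"S", "A", "B", "C"}, {"a", "b"},
     [("S", ("A", "C")), ("S", ("B",)), ("A", ("a",)),
      ("B", ("b", "B")), ("C", ("A", "b"))], "S")
V2, SIGMA2, P2, S2 = remove_non_generating(G)
print(P2)   # [('S', ('A', 'C')), ('A', ('a',)), ('C', ('A', 'b'))]
```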
This is choice 1; choice 2 is the other way round: first remove the non-generating symbols and then remove the unreachable symbols. To be clear: in choice 1, removing the unreachable symbols from G gives a grammar G', and then removing all the non-generating symbols from G' gives a new grammar G''; in choice 2, we first remove all the non-generating symbols and then all the unreachable symbols. What I would like to point out is that choice 1 is wrong but choice 2 is correct. Why would choice 1 be wrong? Here is a very simple situation. Suppose we have S → AB, so both A and B are reachable, and later we find that B is non-generating. Then we remove this production, and in the process the link from S to A may also disappear, even though A itself is generating. In fact, there is a simple example given in some textbooks. Let the grammar be S → AB | a, A → a. With choice 1, we find that all the symbols A, B, S and the terminal a are reachable from S: the set ℛ first contains S; looking at S → AB, we add A and B; and looking at either S → a or A → a, we see that the terminal a is also reachable. So all the symbols are reachable, and removing unreachable symbols does not simplify the grammar at all. Now we look at non-generating symbols. We identify B as non-generating, so the production S → AB goes out, and that is all we can do: we remove every production containing B, as well as the non-terminal B itself, and we are left with S → a and A → a.
But do you see what has happened? This is the simplified grammar we get, but is it fully simplified? Because of the removal of B, we have introduced a new unreachable symbol, namely A, so A → a should also go. In fact, the grammar obtained after removing all the useless symbols has only one production, S → a. So the point I am making is that if we first remove unreachable symbols and then remove non-generating symbols, we may end up, as in this case, with these two productions, and a grammar with these two productions is not fully simplified, because it retains the unreachable symbol A. On the other hand, what happens with choice 2 on the same grammar S → AB | a, A → a? In choice 2 we first figure out all the non-generating symbols and remove them from the grammar: B is non-generating, so we remove the production S → AB, and the symbol B goes away from the set of non-terminals. Both remaining non-terminals, S and A, are generating. Now we check for unreachable symbols, and indeed A is unreachable: starting from S we only have S → a, so S and the terminal a are the only two reachable symbols. Capital A is not reachable, so A → a also goes, and we get the right simplification. Can we prove, or at least justify informally, that choice 2 is right and choice 1 is wrong? Choice 1 is wrong for the simple reason that, after the unreachable symbols have been removed, the later removal of non-generating symbols can make other symbols unreachable. Let us make this point clear.
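The example just worked through can be replayed mechanically to see the two orders disagree. This is a minimal sketch, not the lecture's code: it assumes the same (V, Σ, P, S) tuple representation, and all function names are hypothetical. The grammar is the lecture's S → AB | a, A → a, where B has no productions.

```python
from typing import List, Set, Tuple

Production = Tuple[str, Tuple[str, ...]]
Grammar = Tuple[Set[str], Set[str], List[Production], str]   # (V, Sigma, P, S)


def reachable(S: str, P: List[Production]) -> Set[str]:
    reach, changed = {S}, True
    while changed:                      # grow R until a pass adds nothing
        changed = False
        for lhs, rhs in P:
            if lhs in reach and not set(rhs) <= reach:
                reach |= set(rhs)
                changed = True
    return reach


def generating(V: Set[str], P: List[Production]) -> Set[str]:
    gen: Set[str] = set()
    changed = True
    while changed:                      # grow the generating set to a fixed point
        changed = False
        for lhs, rhs in P:
            if lhs not in gen and all(x not in V or x in gen for x in rhs):
                gen.add(lhs)
                changed = True
    return gen


def remove_unreachable(g: Grammar) -> Grammar:
    V, SIGMA, P, S = g
    R = reachable(S, P)
    # Checking the left-hand side suffices: R is closed under productions.
    return V & R, SIGMA & R, [(a, r) for a, r in P if a in R], S


def remove_non_generating(g: Grammar) -> Grammar:
    V, SIGMA, P, S = g
    gen = generating(V, P)
    keep = [(a, r) for a, r in P
            if a in gen and all(x not in V or x in gen for x in r)]
    return V & gen, SIGMA, keep, S


# The lecture's example: S -> AB | a, A -> a (B has no productions).
G = ({"S", "A", "B"}, {"a"},
     [("S", ("A", "B")), ("S", ("a",)), ("A", ("a",))], "S")

choice1 = remove_non_generating(remove_unreachable(G))
choice2 = remove_unreachable(remove_non_generating(G))
print(choice1[2])   # [('S', ('a',)), ('A', ('a',))]  -- A is still there
print(choice2[2])   # [('S', ('a',))]
```

Choice 1 keeps the now-unreachable A → a, while choice 2 produces the fully simplified grammar with the single production S → a.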
Why can choice 1 go wrong? Choice 1 first removes unreachable symbols and then removes non-generating symbols. What can happen, and we have seen an example, is that after removing the symbols found originally to be unreachable, when we start removing non-generating symbols, some symbols that were reachable before become unreachable. Let me write down the problem: in the second step, the removal of non-generating symbols, we may introduce new unreachable symbols. That is the problem with choice 1, and we have already seen an example of it. Interestingly, why is choice 2 correct? Let us understand this at least informally. Choice 2 is: first remove non-generating symbols, then remove unreachable symbols. Can what happened with choice 1 happen with choice 2? It would happen only if some A was found reachable but, in the process of removing other, unreachable symbols, A became non-generating; only in that case would choice 2 also be unsafe. So the bad situation for choice 2 is this: A is found reachable; A was generating before, which is why it survived into the second phase; but in the process of removing some unreachable symbols, A became non-generating. I claim this can never happen, and the idea is very simple. A is reachable, and A is generating, so there is a sequence of steps A ⇒* w deriving some terminal string w. A could become non-generating only if we removed some symbol B used in such a derivation, but remember which phase we are in.
We are in the phase where we check whether a symbol is unreachable and remove it if so. But if A has been found reachable and, as the derivation A ⇒* w shows, B can be reached from A, then surely B is also reachable, because reachability is transitive: we can go from S to A and from A to B, hence from S to B. So B will not be thrown out for being unreachable; if A is reachable, B is reachable as well. We will never throw out such a B, and therefore throwing out unreachable symbols in the second phase can never make a symbol that was generating become non-generating. So this bad situation can never happen. Therefore choice 2 is correct: first identify the non-generating symbols and remove them from the grammar, then, in the simplified grammar, identify the unreachable symbols and remove those. In this process we cannot go wrong, and the final grammar will have no useless symbols.
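The informal argument can be summarised as a short proof sketch. Here G₁ = (V₁, Σ, P₁, S) is a name introduced only for this sketch, denoting the grammar after removing the non-generating non-terminals from G, and G₂ = (V₂, Σ₂, P₂, S) denotes the grammar after then removing the unreachable symbols from G₁. The sketch uses the fact that every non-terminal left in G₁ still derives a terminal string in G₁, because all non-terminals used in a derivation A ⇒* w in G are themselves generating, so such a derivation survives the first phase.

```latex
\begin{align*}
&\text{Let } A \in V_2. \text{ Then } A \text{ is reachable in } G_1, \text{ and since } A \in V_1,\ A \Rightarrow^*_{G_1} w \text{ for some } w \in \Sigma^*.\\
&\text{Every symbol } X \text{ occurring in this derivation satisfies } A \Rightarrow^*_{G_1} \gamma X \delta,\\
&\text{so } S \Rightarrow^*_{G_1} \alpha A \beta \Rightarrow^*_{G_1} \alpha \gamma X \delta \beta, \text{ i.e. } X \text{ is reachable in } G_1.\\
&\text{Hence every production used in } A \Rightarrow^*_{G_1} w \text{ has a reachable left-hand side and survives in } P_2,\\
&\text{so } A \Rightarrow^*_{G_2} w: \text{ removing unreachable symbols never makes a surviving non-terminal non-generating.}\\
&\text{By the same transitivity, every } X \text{ left in } G_2 \text{ satisfies } S \Rightarrow^*_{G_2} \alpha X \beta \Rightarrow^*_{G_2} w', \text{ so } G_2 \text{ has no useless symbols.}
\end{align*}
```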