So, for simplifying context free grammars, we carry out three processes. One is removal of useless symbols, the second is removal of epsilon productions, and today we will look at elimination of unit productions. The first two processes we have already covered in the earlier lectures. Unit productions are productions of the kind A goes to B. In other words, the left hand side, as usual, is a non-terminal and the right hand side consists of exactly one non-terminal. We would like to eliminate all such productions from the grammar without affecting the language generated by the grammar. In other words, as before, we are given one context free grammar G. From it we would like to get another context free grammar G1 such that the two grammars generate the same language and G1 does not have unit productions. Recall that such pairs of grammars are called equivalent grammars because they generate the same language. The way we eliminate unit productions is very similar to the way we eliminated epsilon productions. Remember, epsilon productions were productions where the right hand side was just the empty string epsilon. Now, to remove all unit productions, step one is to identify all pairs of non-terminals A, B such that A derives B. Recall that we have used this notation for 0 or more steps: if you can go from A to B in 0 or more steps of derivation, then we say that A derives B. Now how do we find all such pairs? Of course, trivially, every pair A, A satisfies this, because you can go from A to A using no production at all. But this is the trivial case.
The more interesting case is when you can go from A to B in exactly one step of derivation. When is that possible? Clearly that is possible when A goes to B is a production. So, as we did for epsilon productions, we will start with a base and from there generate the rest of the pairs A, B such that A derives B. The base set consists of all pairs A, B where A goes to B is a production. You can now see inductively what we are going to do: if we have, let us say, the unit productions A goes to B, B goes to C1, C1 goes to C2 and so on up to Cn goes to Cn+1, then clearly A also derives the non-terminal Cn+1. Therefore, A, Cn+1 is going to be such a pair. Actually, we can do this in a more straightforward manner which is intuitively a little clearer, but let me first complete what I was saying. So, let me write it down: the base consists of all pairs A, B such that A goes to B is a production. The induction is as follows. Suppose, starting with the base, we have already created a number of such pairs. If the pair A, B is already included in my set and B goes to C is a production in P, then the pair A, C is added to the set being constructed, provided A, C is not already there. In other words, having created some of these pairs, I look at all the unit productions again; whenever the second component B of a pair I already have is the left hand side of a unit production B goes to C, I pair the first component with C, and this is the new pair I add, provided it is not already there.
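The base-and-induction construction just described can be sketched in Python. This is only an illustrative sketch: the dictionary representation of the grammar (each non-terminal mapped to a list of right hand sides, each a tuple of symbols) and the name unit_pairs_fixed_point are assumptions made for the example. The trivial pairs A, A are left out here.

```python
def unit_pairs_fixed_point(productions):
    """Return all pairs (A, B) such that A derives B in one or more unit steps."""
    nonterminals = set(productions)  # assumption: every non-terminal is a key
    # Base: one pair (A, B) for every unit production A -> B.
    unit_edges = {
        (head, body[0])
        for head, bodies in productions.items()
        for body in bodies
        if len(body) == 1 and body[0] in nonterminals
    }
    pairs = set(unit_edges)
    # Induction: if (A, B) is in the set and B -> C is a unit production,
    # add (A, C). Repeat until the set cannot be made any larger.
    changed = True
    while changed:
        changed = False
        for (a, b) in list(pairs):
            for (b2, c) in unit_edges:
                if b2 == b and (a, c) not in pairs:
                    pairs.add((a, c))
                    changed = True
    return pairs


# Hypothetical example grammar: S -> A | a S, A -> B | b, B -> C, C -> c
grammar = {
    "S": [("A",), ("a", "S")],
    "A": [("B",), ("b",)],
    "B": [("C",)],
    "C": [("c",)],
}
pairs = unit_pairs_fixed_point(grammar)
```

For this small grammar the base contributes the three pairs coming directly from the unit productions, and the induction adds the pairs S, B and A, C and S, C.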
So, this way we carry on till we find that the set we are constructing cannot be made any larger. This is the usual inductive construction that we did for eliminating epsilon productions. I remarked that we can see this process of identifying all pairs such that A derives B a little more simply and clearly if we think of a graph. Consider a digraph, that is a directed graph, in which the vertices are the non-terminals of the grammar, and there is a directed edge from A to B if A goes to B is one of the productions in the grammar. Now, in that graph, there is a path from C1 to C2 if and only if C1 derives C2 using only unit productions; it is not difficult to see why. So, start with a digraph whose vertices are the non-terminals and whose edges are the unit productions of the grammar. Then, to check whether A derives B or not, all you need to see is whether there is a path from A to B. Remember, in this case a path is a sequence of directed edges. So, this is how we identify all pairs of non-terminals A, B such that A derives B. After carrying out step 1, we carry out step 2. In the first part of step 2 we remove all unit productions, and in the second part we add some productions so that the removal of the unit productions does not affect the language being generated.
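The graph view can also be sketched in Python: build the digraph of unit productions and ask whether there is a directed path. Again this is only a sketch; the grammar representation (a dictionary from non-terminals to lists of tuples) and the function name derives_by_units are assumptions for illustration. A path of length 0 is allowed, so every non-terminal derives itself.

```python
def derives_by_units(productions, source, target):
    """Return True if there is a directed path source -> ... -> target
    in the digraph whose edges are the unit productions."""
    nonterminals = set(productions)  # assumption: every non-terminal is a key
    # Build adjacency sets: an edge A -> B for every unit production A -> B.
    edges = {head: set() for head in nonterminals}
    for head, bodies in productions.items():
        for body in bodies:
            if len(body) == 1 and body[0] in nonterminals:
                edges[head].add(body[0])
    # Depth-first search from source.
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Hypothetical example grammar: S -> A | a S, A -> B | b, B -> C, C -> c
grammar = {
    "S": [("A",), ("a", "S")],
    "A": [("B",), ("b",)],
    "B": [("C",)],
    "C": [("c",)],
}
```

With this grammar, derives_by_units(grammar, "S", "C") holds because of the path S, A, B, C, while there is no path back from C to S.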
So, we add new productions, and which are these new productions? Let me explain the idea first. Suppose in the old grammar A goes to B is a production. Then it is possible that in a derivation tree you replaced A with B and then applied to B one production which is not a unit production; let us say B goes to D1 D2 D3. In effect, what is happening is that A is deriving D1 D2 D3. Now, if the unit production A goes to B is not there, then what I want is a production by which D1 D2 D3 can be obtained from A directly, that is, one that replaces A by D1 D2 D3. This is the basic idea, and with it the language being generated will not be affected. So, how do we say this? Let the old grammar be G and the new grammar, which is what we are defining, be G1. Suppose A derives B in the old grammar G, and B goes to alpha is a non-unit production. Then we add to G1, the grammar being constructed, the new production A goes to alpha. In our example, alpha is D1 D2 D3, B goes to alpha is a production, and A goes to B is a production, so A derives B. So, we are saying: add the production A goes to D1 D2 D3. Then you would be able to generate the same string that the old derivation tree was generating. It is not difficult to see why this ensures that the new grammar G1 generates the same language; at the same time, G1 will not have any unit productions, because we have already removed them.
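Both parts of step 2 can be put together in a short Python sketch. It is an illustration under assumptions, not a definitive implementation: the grammar is assumed to be a dictionary from non-terminals to lists of right hand sides (tuples of symbols), and the name eliminate_unit_productions is made up for the example. For each A it collects every B that A derives through unit productions, including A itself, and gives A all the non-unit productions of those B.

```python
def eliminate_unit_productions(productions):
    """Return an equivalent grammar without unit productions."""
    nonterminals = set(productions)  # assumption: every non-terminal is a key

    def is_unit(body):
        # A unit production has exactly one non-terminal on the right.
        return len(body) == 1 and body[0] in nonterminals

    # Step 1: reach[A] = all B such that A derives B in 0 or more unit steps.
    reach = {}
    for start in nonterminals:
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for body in productions[node]:
                if is_unit(body) and body[0] not in seen:
                    seen.add(body[0])
                    stack.append(body[0])
        reach[start] = seen

    # Step 2: drop the unit productions; for every B reachable from A and
    # every non-unit production B -> alpha, give A the production A -> alpha.
    new_productions = {}
    for a in productions:
        bodies = []
        for b in sorted(reach[a]):
            for body in productions[b]:
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        new_productions[a] = bodies
    return new_productions


# Hypothetical example grammar: S -> A | a S, A -> B | b, B -> c
grammar = {
    "S": [("A",), ("a", "S")],
    "A": [("B",), ("b",)],
    "B": [("c",)],
}
result = eliminate_unit_productions(grammar)
```

In this example S ends up with the right hand sides a S, b and c, A ends up with b and c, and B keeps its single production.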
Just one point here. Recall that we said B goes to alpha is not a unit production. Now take a production such as B goes to a, where a is a terminal: is this a unit production? It is not. Although it is unit in the sense that the right hand side consists of only one symbol, that symbol is a terminal, so it is not a unit production. For a unit production, the right hand side has to be exactly one non-terminal. So, this takes care of the process of eliminating unit productions from a grammar G to obtain a new grammar G1. The fact that the two grammars are equivalent, that is, that they generate the same language, can also be proved. All these proofs are by induction, and this one is very similar to the proof we gave when we constructed a new grammar from an old grammar by removing all epsilon productions. Of course, there we had to say that the new grammar generates all the strings the old grammar did except epsilon, the empty string. So, now you see that we have learnt three procedures for simplifying a context free grammar: removal of useless symbols, removal of epsilon productions and removal of unit productions. Let me give a title to this: procedures for simplifying a grammar. Now a question arises: in which order should I carry these out? The ordering is important, because what should not happen is that the effect of carrying out one procedure is destroyed by a subsequent procedure.
In other words, we should choose an order in which the effect of whatever I did earlier is not lost by any subsequent procedure. I say again the safe ordering, because if you recall, removal of useless symbols itself had two separate sub-procedures, and there too the question arose of which of the two should be carried out first. There we said that the safe ordering follows from the same principle: the effect of the previous procedure should not be destroyed by the subsequent one. So, here too, by that principle, the safe ordering is: first epsilon production removal, then unit production removal, and then finally useless symbol removal. Once we have a simplified form of a grammar, what we can do is turn the grammar into something canonical. By that I mean that all productions of the grammar will be syntactically of the same few kinds. Such grammars are called normal form grammars, and the one that we will study is called Chomsky normal form. There we will start with a grammar on which all these procedures have been carried out. This is one kind of normal form for context free grammars, and let me define it. Definition: a context free grammar G is said to be in Chomsky normal form if every production of G is one of the following two types. Type one productions are of the form A goes to a, that is, a non-terminal going to a single terminal. Type two productions are of the form A goes to B C. In other words, in type two productions, as usual, the left hand side is a non-terminal, and the right hand side consists of exactly two non-terminals. Now, although I am writing A goes to B C, one of B and C, or both of them, could be the non-terminal A itself. So, B and C could be any non-terminals, including A itself.
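The definition can be turned into a small check in Python. This is a sketch under assumptions: a grammar is represented as a dictionary from non-terminals to lists of right hand sides (tuples of symbols), any symbol that is not a key of the dictionary is treated as a terminal, and the name is_chomsky_normal_form is made up for the example.

```python
def is_chomsky_normal_form(productions):
    """Check that every production is A -> a or A -> B C."""
    nonterminals = set(productions)  # assumption: every non-terminal is a key
    for head, bodies in productions.items():
        for body in bodies:
            # Type one: the right hand side is a single terminal.
            type_one = len(body) == 1 and body[0] not in nonterminals
            # Type two: the right hand side is exactly two non-terminals.
            type_two = len(body) == 2 and all(s in nonterminals for s in body)
            if not (type_one or type_two):
                return False
    return True


# Hypothetical examples.
cnf_grammar = {
    "S": [("A", "B"), ("S", "S")],  # B and C may repeat, or be the head itself
    "A": [("a",)],
    "B": [("b",)],
}
not_cnf_grammar = {
    "S": [("A", "b")],  # mixes a non-terminal with a terminal
    "A": [("a",)],
}
```

The first grammar passes the check; the second fails because S goes to A b has a terminal in a right hand side of length two.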
So, these are the only two types of productions which a Chomsky normal form grammar will have, and the main point is the following result: any context free grammar G such that L(G) does not contain epsilon can be converted into a Chomsky normal form grammar. I should add that some people define a grammar to be in Chomsky normal form provided it satisfies one more condition, namely that G does not have useless symbols. The result applies also for this extended definition. There is one more condition that we should have, and you will see that it is a kind of technicality: the language, besides not containing epsilon, should also be non-empty. So, what is the problem if L(G) is empty? If you require that the grammar G has no useless symbols, then you have a problem, is it not? Any grammar has to have the start symbol S, and if the grammar produces no string at all, then S itself is useless, because S does not generate any terminal string. So, that is a technicality, and therefore we are putting that also as a condition. Therefore, what we would like to show is that every grammar G such that L(G) does not contain epsilon and L(G) is non-empty can be converted into a Chomsky normal form grammar. Now, because we have learnt the various procedures for simplifying context free grammars, I can right away take G to be in that simplified form. In other words, let us assume that G has no useless symbols and no unit or epsilon productions.
So, I can right away assume that my G satisfies this condition, because if the original G that you gave me did not satisfy it, I can turn it into a new grammar that does. So, let me assume that G has no useless symbols and no unit or epsilon productions. Therefore, you see, every production is of one of two kinds: either the right hand side consists of a single terminal, A goes to a, or it is A goes to alpha where alpha has two or more symbols, because we do not have unit or epsilon productions. By symbols here I mean terminals as well as non-terminals. Now, productions of the first kind are no problem at all, because we allow such productions in Chomsky normal form. Again, when alpha has two or more symbols, if the production is of the kind A goes to B C, that is not a problem either, because this form is also allowed by our normal form. So, the challenge in converting the grammar G into Chomsky normal form is to do something to all the other productions A goes to alpha, where alpha has two or more symbols, such that ultimately we have only productions of the kind A goes to B C in addition to, of course, productions of the kind A goes to a. Then the grammar will be in Chomsky normal form. So, step one in this conversion is that we ensure that the right hand side of every production consists only of non-terminals. What we mean is this: suppose we have a production of the kind A goes to B a C. Here B and C are of course non-terminals, but a is a symbol which is a terminal. I would like to ensure that the right hand side of every such production consists only of non-terminals. So, consider this example.
So, let us see how we can do this just for this production. We can ensure the condition by introducing some extra new non-terminals. Let us say A1 is a new non-terminal; by that I mean A1 is a symbol for a non-terminal which has not been used in the grammar so far. Now, instead of the production A goes to B a C, I will write A goes to B A1 C, and I will add the production A1 goes to a. So, you see, in effect A can still derive the original right hand side B a C. So, basically, whenever I find a right hand side where a terminal symbol occurs, I introduce a new non-terminal and write it in place of that terminal. I do that for all right hand sides, because there can be other right hand sides also where the terminal a occurs; in all of them I write the same new non-terminal A1 in place of a. The addition of the production A1 goes to a ensures that the semantics does not change: in effect, A1 always generates a. So, this is step one. Now, step two is the step in which I will ensure that the right hand side consists of precisely two non-terminals. Before that, I should modify what I have written for step one, because of course we do not mind productions of the form A goes to a single terminal. So, I should write: ensure that the right hand side of every production consists only of non-terminals, unless the right hand side consists of a single terminal. I am saying this for the sake of completeness and being totally correct, because right in the beginning I said we have no problem with productions of this kind and we leave them as such.
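Step one can be sketched in Python as follows. This is only an illustration under assumptions: the grammar is a dictionary from non-terminals to lists of right hand sides (tuples of symbols), symbols that are not keys are treated as terminals, and the function name replace_terminals and the naming scheme T_a for the new non-terminals are made up for the example.

```python
def replace_terminals(productions):
    """In every right hand side of length 2 or more, replace each terminal
    by a new non-terminal that goes to that terminal."""
    nonterminals = set(productions)  # assumption: every non-terminal is a key
    new_productions = {head: [] for head in productions}
    proxy = {}  # terminal -> the new non-terminal standing in for it

    def proxy_for(terminal):
        if terminal not in proxy:
            name = f"T_{terminal}"  # hypothetical naming scheme
            while name in new_productions:  # make sure the name is unused
                name += "'"
            proxy[terminal] = name
            new_productions[name] = [(terminal,)]  # the added production
        return proxy[terminal]

    for head, bodies in productions.items():
        for body in bodies:
            if len(body) == 1:
                # A -> a is already allowed; leave it as such.
                new_productions[head].append(body)
            else:
                new_productions[head].append(
                    tuple(s if s in nonterminals else proxy_for(s) for s in body)
                )
    return new_productions


# Hypothetical example: A -> B a C | a, B -> b, C -> c
grammar = {
    "A": [("B", "a", "C"), ("a",)],
    "B": [("b",)],
    "C": [("c",)],
}
result = replace_terminals(grammar)
```

Here A goes to B a C becomes A goes to B T_a C together with the new production T_a goes to a, while A goes to a is left untouched. The same new non-terminal is reused wherever the terminal a occurs.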
Then the productions that we are left with have right hand sides consisting only of non-terminals, and more than one of them. Step two is making sure that every such right hand side has exactly two non-terminals; again, for the sake of correctness, I should state this for all productions whose right hand side does not consist of a single terminal. So, after step one, leave out the productions of the kind A goes to a. Then what are the productions that we have? Either the right hand side is of size two, which is all right because Chomsky normal form allows such productions, or the right hand side has more than two non-terminals, let us say A goes to C1 C2 and so on up to Ck, where k is greater than 2. The challenge is to eliminate such productions and to have, equivalently, productions where the right hand side consists of exactly two non-terminals. The idea again is fairly simple, and you can do so by again introducing new non-terminals. Let me illustrate by taking an example where k is 4. So, let us say I had A goes to C1 C2 C3 C4. This I would like to eliminate. In its place I will write A goes to D1 C4, and this D1 should generate C1 C2 C3. So, I would write D1 goes to D2 C3 and D2 goes to C1 C2. Of course, you could alternatively have written A goes to C1 D1, D1 goes to C2 D2 and D2 goes to C3 C4; either way you could do this. Now, it is fairly clear that this idea extends to any k greater than 2. So, in other words, we remove one such production and add a number of productions, where all these D's are new non-terminals.
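Step two can be sketched in Python along the lines of the second alternative, where the first symbol is split off each time. This is a sketch under assumptions: the grammar is a dictionary from non-terminals to lists of right hand sides (tuples of symbols), and the function name binarize and the names D1, D2 and so on for the new non-terminals are chosen only for illustration.

```python
def binarize(productions):
    """Replace every production A -> C1 C2 ... Ck with k > 2 by a chain
    A -> C1 D1, D1 -> C2 D2, ..., ending in a right hand side of length 2."""
    new_productions = {head: [] for head in productions}
    counter = 0

    def fresh():
        # Return a non-terminal name that is not yet used in the grammar.
        nonlocal counter
        while True:
            counter += 1
            name = f"D{counter}"  # hypothetical naming scheme
            if name not in new_productions:
                return name

    for head, bodies in productions.items():
        for body in bodies:
            current, rest = head, body
            while len(rest) > 2:
                helper = fresh()
                new_productions[helper] = []
                # Split off the first symbol; the helper generates the rest.
                new_productions[current].append((rest[0], helper))
                current, rest = helper, rest[1:]
            new_productions[current].append(rest)
    return new_productions


# Hypothetical example with k = 4: A -> C1 C2 C3 C4
grammar = {
    "A": [("C1", "C2", "C3", "C4")],
    "C1": [("a",)],
    "C2": [("b",)],
    "C3": [("c",)],
    "C4": [("d",)],
}
result = binarize(grammar)
```

For the example with k equal to 4 this gives A goes to C1 D1, D1 goes to C2 D2 and D2 goes to C3 C4. Right hand sides of length one or two pass through unchanged.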
So, again, the idea is to introduce some new non-terminals and make sure that the syntactic condition on the right hand side is satisfied. I should mention in passing that this idea is very similar to something you have possibly seen in data structures; let me just mention it, I will not go into the details. A tree with nodes having more than two children can be represented by a binary tree. Remember, the idea there is that one link goes to the eldest child and the rest of the children come along the other link. Something similar is what we are doing here. We have outlined a procedure, and it will not be difficult to prove formally that if one follows that procedure, one will be able to obtain a Chomsky normal form grammar from any grammar satisfying the condition that the grammar generates a non-empty language and that language does not contain epsilon. That can be proved formally using, as before, induction; we will not do so. Let me point out the reason we would like to consider grammars in Chomsky normal form: we will see later on that some results become fairly easy to prove. You recall that you are familiar with the pumping lemma for regular languages; we will prove a pumping lemma for context free languages, and there our starting point will be grammars in Chomsky normal form. Also, we will use Chomsky normal form grammars to provide you with an efficient algorithm to check whether a string belongs to the language generated by a grammar or not. There again we will assume, without loss of generality, that the grammar is in Chomsky normal form.
In other words, what I am trying to say is that from now on, without loss of generality, we can always assume that a given context free grammar is in Chomsky normal form. So, let me prove to you one simple fact about Chomsky normal form grammars which we will use in the next lecture to prove our pumping lemma. The fact is this. Suppose G is a Chomsky normal form grammar, and consider a derivation tree of G generating the terminal string w such that no path in the derivation tree has length greater than m. So, let me draw a picture: this is S at the root, this is w at the leaves, and no path here exceeds m in length. Then the length of w is less than or equal to 2 to the power (m minus 1). So, you see, what we are saying in effect is that if you have a bound on the longest path in the derivation tree of a string in a Chomsky normal form grammar, then you can provide an upper bound on the length of the string itself. The proof of the fact that the length of w is bounded by 2 to the power (m minus 1) is by induction on m. The base case is when m is equal to 1. That means no path in the tree has length greater than 1, so the only possible tree is of the kind where S directly derives a terminal. This is a path of length 1, and trees of this kind generate only strings of length 1. So, in case m is equal to 1, surely the length of w is 1, which is of course 2 to the power (m minus 1): m is 1, so m minus 1 is 0, and 2 to the power 0 is 1. The base case therefore is proved. Now let me write the induction step, which is also straightforward.
Now, consider a tree where no path is of length more than m plus 1, and suppose such a tree is not the base case. Clearly such a tree will be of this kind: initially S derives two non-terminals A1 A2, then some further derivation happens below each of them, and the total string generated is w. Now, if what I am saying is generalized a little bit, then the induction is very simple to prove. I said derivation tree. A derivation tree normally means one starting with S, but we can extend the notion of derivation tree to mean a derivation starting from any non-terminal. In that case, the base again will be not just trees where S goes to a terminal; trees where any non-terminal A goes to a terminal will also be allowed. So, for the induction, in general, instead of S I would write some A at the root, and then I inductively apply the fact to the two subtrees rooted at A1 and A2. The longest path in either of these is of length m or less, because in the original tree no path is longer than m plus 1. Therefore, by induction, the part of w generated by the left subtree has length at most 2 to the power (m minus 1), and the part generated by the right subtree also has length at most 2 to the power (m minus 1). So, together, the length of w is bounded by 2 into 2 to the power (m minus 1), which is equal to 2 to the power m; and m is of course one less than the bound m plus 1 on the longest path. That proves this fact, and this is one result that we are going to use in the next class to prove our pumping lemma.
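For reference, the induction can be written out compactly in symbols. The names w_1 and w_2 for the parts of w generated below A_1 and A_2 are introduced here only for this write-up; the path length is counted as in the base case above, where A directly deriving a terminal is a path of length 1.

```latex
\textbf{Claim.} Let $G$ be in Chomsky normal form and let $T$ be a derivation
tree rooted at a non-terminal $A$ with yield $w$. If no path in $T$ has length
greater than $m$, then $|w| \le 2^{m-1}$.

\textbf{Base ($m = 1$).} The tree is $A \to a$, so $|w| = 1 = 2^{0} = 2^{m-1}$.

\textbf{Step.} Suppose no path in $T$ has length greater than $m+1$ and the
root production is $A \to A_1 A_2$. Write $w = w_1 w_2$, where $w_i$ is the
yield of the subtree rooted at $A_i$. No path in either subtree has length
greater than $m$, so by the induction hypothesis $|w_i| \le 2^{m-1}$, and
\[
  |w| \;=\; |w_1| + |w_2| \;\le\; 2^{m-1} + 2^{m-1} \;=\; 2 \cdot 2^{m-1}
  \;=\; 2^{m} \;=\; 2^{(m+1)-1}.
\]
```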