Today we will look at an important result, the pumping lemma for context-free languages, which we will use to show that certain languages are not context free. Before I state the lemma, prove it, and of course use it, I would like to make one point connected with what we did last time, namely transforming a grammar into Chomsky normal form. There we saw how to take a production whose right-hand side is a long string, remove it, and add in its place productions whose right-hand sides have length exactly 2. Recall that this was a very simple idea: any production whose right-hand side has 3 or more symbols can be replaced by equivalent productions whose right-hand sides consist of only 2 symbols. This idea has one more use. Recall the procedure for removing epsilon productions, and ask: what is its time complexity? Unfortunately, the way we discussed it, this algorithm can be exponential in the size of the grammar. Very quickly, suppose I had a production A → B1 B2 ... Bm, where B1, ..., Bm are non-terminals and each Bi is nullable. How many productions do I get after removing the nullable symbols? Each Bi can independently be kept or dropped, and we never add the empty right-hand side, so we must add precisely 2^m − 1 productions in its place.
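A quick way to see this blow-up is to expand a single production directly. The sketch below uses a hypothetical helper, `expand_nullable`, that performs only the "keep or drop each nullable symbol" step for one production; it is an illustration under that assumption, not a complete epsilon-removal algorithm.

```python
from itertools import product

def expand_nullable(lhs, rhs, nullable):
    """All productions lhs -> body obtained from lhs -> rhs by keeping or
    dropping each nullable symbol independently; the empty body is skipped."""
    # For each symbol: nullable ones may be kept (sym,) or dropped (), others must be kept.
    options = [[(s,), ()] if s in nullable else [(s,)] for s in rhs]
    bodies = set()
    for choice in product(*options):
        body = tuple(sym for part in choice for sym in part)
        if body:                      # never add lhs -> epsilon
            bodies.add(body)
    return {(lhs, body) for body in bodies}

# A -> B1 B2 ... B10 with every Bi nullable.
rhs = [f"B{i}" for i in range(1, 11)]
print(len(expand_nullable("A", rhs, set(rhs))))  # 1023 = 2**10 - 1
```

Doubling m doubles the exponent, so the output size is exponential in the length of the single input production.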
As you can see, we are adding exponentially many productions in m, and m can be of the order of the size of the grammar given to you as input. So just replacing this one production by the requisite productions could itself take exponential time. The way to make the epsilon-removal algorithm run in polynomial time is this: first replace such a production by productions whose right-hand sides have length 2, which we know how to do from the Chomsky normal form construction, and only then remove the nullable symbols one by one to get the new productions. A production with m symbols on its right-hand side is replaced by only m − 1 binary productions, which is linearly many, and each binary production gives rise to at most 3 productions when nullable symbols are dropped. So it is not difficult to see that the whole procedure then runs in polynomial time. Now let us get back to the pumping lemma for context-free languages. You will remember the corresponding pumping lemma for regular languages, and the format of this lemma is very similar.
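The binarize-first strategy can be sketched in a few lines of Python. The helpers `binarize`, `nullable_closure`, and `drop_nullable`, and the fresh non-terminal names `A_1`, `A_2`, ..., are hypothetical choices for this sketch; it only counts productions for the single long production discussed above and is not a full epsilon-removal implementation.

```python
def binarize(lhs, rhs):
    """Replace lhs -> X1 X2 ... Xk (k >= 3) by k - 1 binary productions,
    chaining through fresh non-terminals named lhs_1, lhs_2, ... (hypothetical names)."""
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    prods, current = [], lhs
    for i, sym in enumerate(rhs[:-2], start=1):
        fresh = f"{lhs}_{i}"
        prods.append((current, (sym, fresh)))
        current = fresh
    prods.append((current, (rhs[-2], rhs[-1])))
    return prods

def nullable_closure(prods, base_nullable):
    """Non-terminals that can derive the empty string, given some that already can."""
    nullable, changed = set(base_nullable), True
    while changed:
        changed = False
        for lhs, rhs in prods:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

def drop_nullable(prods, nullable):
    """Keep every production and, for each binary one, add the variants with one
    nullable symbol dropped: at most 3 productions per binary production."""
    out = set()
    for lhs, rhs in prods:
        out.add((lhs, rhs))
        if len(rhs) == 2:
            first, second = rhs
            if first in nullable:
                out.add((lhs, (second,)))
            if second in nullable:
                out.add((lhs, (first,)))
    return out

rhs = [f"B{i}" for i in range(1, 11)]          # A -> B1 ... B10, all nullable
binary = binarize("A", rhs)
result = drop_nullable(binary, nullable_closure(binary, set(rhs)))
print(len(binary), len(result))                 # 9 27, versus 2**10 - 1 = 1023
```

The count grows as 3(m − 1) instead of 2^m − 1, which is the linear behaviour the argument relies on.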
The lemma goes like this. Let L be a context-free language. Then there is a constant n, depending only on L, such that for every string z in L with |z| ≥ n, there exist strings u, v, w, x, y satisfying the following conditions. First, z is the concatenation of these five strings: z = uvwxy. Second, |vx| ≥ 1, which is another way of saying that at least one of the two strings v and x is non-empty. Third, |vwx| ≤ n: the middle part vwx has length bounded by n. Fourth, for all i ≥ 0, the string u v^i w x^i y is also in L. Before we use or prove the lemma, let us read what it says. Let L be any context-free language; then you can find a constant n that depends only on L, which, if you recall, is exactly how the pumping lemma for regular languages also started. Then it says that for every z in the language that is long enough, meaning its length is at least this constant n, one can find u, v, w, x, y such that z is the concatenation of these five strings, vx is non-empty (so at least one of v or x is non-empty), and the middle three pieces vwx together have length at most n.
So z itself can be very long, but the part vwx has length at most n, and this is where the pumping occurs. You can pump down by taking i = 0, or pump up by taking i ≥ 2. The string u v^i w x^i y means u, followed by i copies of v, followed by w, followed by i copies of x, followed by y, and the lemma claims that this string is also in the language. The long and short of it is that any sufficiently long string in the language can be pumped, what is pumped is non-empty, and the portion where the pumping occurs is not too long; remember that n is a constant for the language L. One point that often strikes people is this: how many new strings in L are we getting from z? For all i ≥ 0 the string u v^i w x^i y is in the language, and since vx is non-empty these strings all have different lengths, so clearly there are infinitely many of them. So we are asserting that infinitely many strings are in L, but that does not mean every context-free language is infinite. Let me put it this way. We know that every finite language is regular, and we also know that every regular language is context free. If you recall, long ago when we started discussing context-free grammars, we showed how to take a DFA for a regular language and obtain from it a context-free grammar generating the same language, thereby proving that every regular language is context free. Therefore every finite language is also context free.
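To make the notation u v^i w x^i y concrete, here is a minimal Python sketch. The helper names `pump` and `in_anbn` are hypothetical, and the language {a^j b^j : j ≥ 0} together with the particular split of z = aaabbb are illustrative choices, not taken from the lecture.

```python
def pump(u, v, w, x, y, i):
    """u v^i w x^i y: u, then i copies of v, then w, then i copies of x, then y."""
    return u + v * i + w + x * i + y

def in_anbn(s):
    """Membership test for {a^j b^j : j >= 0}."""
    j = len(s) // 2
    return s == "a" * j + "b" * j

# z = aaabbb split as u = "aa", v = "a", w = "", x = "b", y = "bb"
for i in range(4):
    s = pump("aa", "a", "", "b", "bb", i)
    print(i, s, in_anbn(s))
```

Here i = 0 pumps down to aabb and every i ≥ 2 pumps up, and each resulting string stays in this language, which is what the lemma asserts for a suitable split.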
Yet the lemma says that if L is context free, then infinitely many strings are in that same language. You should check for yourself why this does not rule out finite context-free languages; I leave that as an exercise. Before we prove the lemma, let me quickly give one example of its use to show that some language is not context free. The language we will consider is L = {a^j b^j c^j : j ≥ 0}, and, assuming the lemma is true, let us prove that L is not a context-free language (CFL). Such proofs go by contradiction. So suppose L is a CFL. Then the lemma says there is a constant, depending only on L, with the stated properties; let K be the constant for L posited by the lemma. The lemma applies to any string in L of length at least K, so consider the string z = a^K b^K c^K. Trivially this string is in L and has length 3K ≥ K, so all the conclusions of the lemma apply to it. The lemma then says there exist u, v, w, x, y with z = uvwxy satisfying those conditions. Where in the string do u, v, w, x, y occur? Can I assume that u is this part, v is that part, and so on? I cannot. Ultimately I want to prove that L is not a CFL by producing a string that is not of the form a^j b^j c^j, and for that I must be prepared to handle every possible breakup of z into u, v, w, x, y. We will come back to this point a little later.
We will elaborate on that a little later. Right now, to apply the lemma, all I can say is that there exist u, v, w, x, y whose concatenation is this string, and I know two things about them: vx is non-empty, and the part vwx has length bounded by the pumping lemma constant, which for this language is K. So where can vwx lie? Remember that vwx is a contiguous substring, so it is a window of length at most K. That window can lie entirely within the a's; it can straddle some a's followed by some b's; it can lie entirely within the b's; it can straddle some b's followed by some c's; or it can lie entirely within the c's. Because the window has length at most K and the block of b's has length K, it can never contain both an a and a c. These are the cases. Case 1: suppose vwx lies entirely within the a's, and take i = 0 to get the string uwy. Since v and x cannot both be empty and vwx lies entirely within the a's, removing v and x from a^K b^K c^K removes at least one a and no b's or c's. So uwy has fewer a's than b's or c's. It is therefore not of the form a^j b^j c^j, and hence not in L.
So case 1 cannot happen. Let us look at another case; remember there are five. Case 2: vwx consists of some a's followed by some b's. Keep the picture in mind: K a's, followed by K b's, followed by K c's, where K is the pumping lemma constant, and vwx has length at most K. Again v and x cannot both be empty; v might contain a's and x might contain b's, and so on. But certainly, because vwx lies over the a's and b's, it contains no c's. So when we remove v and x to get uwy as before, we may reduce the number of a's, the number of b's, or both, but the number of c's in uwy remains K. Since vx is non-empty, at least one a or one b has been removed, so the string obtained by pumping down has fewer a's or fewer b's than c's. So case 2 also cannot occur, because it would give a string that is not of the form a^j b^j c^j and therefore not in the language. That is impossible, because we assumed L is a CFL, so every string obtained by pumping up or down must be in the same language L. In the same way we consider the other three cases. Case 3 is that vwx lies entirely within the b's.
Case 4 is that vwx straddles some b's and some c's, and case 5 is that vwx lies entirely within the c's. All three remaining cases are very similar to case 1 or case 2. So what is the conclusion? Whatever breakup into u, v, w, x, y we consider, in each case we obtain, by pumping, a string that is not of the form a^j b^j c^j. That contradicts the lemma, so our initial assumption, that L is a CFL, must be false. So we have proved that L is not a CFL, because assuming it is a CFL contradicts the lemma, which is of course true. We have now seen one application of the lemma, proving that a certain language is not context free, and all applications of the lemma will be of this kind. To be clear about how this proof went, we should understand the structure of the statement a little more carefully. It has several quantifiers, both existential and universal. Given that L is a context-free language, it first says there exists an n, which we call the pumping lemma constant. After this existential quantifier comes a universal quantifier: for all z in L with |z| ≥ n. Then there is an existential quantifier again: there exist u, v, w, x, y. After that comes another universal quantifier, for all i ≥ 0, followed by the rest of the statement.
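Written out in first-order notation, with z for the long string so that it is not confused with the piece x, the lemma for a context-free language L reads as follows. The conditions that do not involve i are placed before the last quantifier, which is logically equivalent to listing them after it.

$$
\exists n \ge 1\;\; \forall z \in L \text{ with } |z| \ge n\;\; \exists u,v,w,x,y:\;\; z = uvwxy \;\wedge\; |vx| \ge 1 \;\wedge\; |vwx| \le n \;\wedge\; \forall i \ge 0:\; u\,v^{i}\,w\,x^{i}\,y \in L
$$

In the game view, the adversary moves at the two existential quantifiers (choosing n and the split), and the prover moves at the two universal quantifiers (choosing z and i).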
So the statement says that z = uvwxy, |vx| ≥ 1, |vwx| ≤ n, and u v^i w x^i y is in the language. This is how we restate the lemma in the language of first-order logic. The reason for writing it this way is to be clear about how it is applied. Go back to the example, where we wanted to show that the language {a^j b^j c^j} is not a CFL. As we mentioned for the pumping lemma for regular languages, it is useful to think of such a proof as a game between you, the prover, and an adversary. In that game, as in this one, we alternate: the adversary does something, then I do something, and finally something is evaluated and either I win or the adversary wins. Always remember that the prover, the person who wishes to show that some language is not context free, has complete freedom on the universal quantifiers; his moves come exactly at the universal quantifiers. In terms of what we did: we did not assume what the pumping lemma constant is. Whatever the constant is, we let the adversary choose it, and if you recall, the adversary gave the constant K. Then comes my turn, as the prover, and I can choose any z. To win the game I chose z = a^K b^K c^K; this string has to be in the language, and its length has to be at least the pumping lemma constant.
So I, as the prover, made a legal move here, and now comes the adversary's move: the adversary can break z into the five parts in any manner, and that is not in your hands. Say he breaks z into u, v, w, x, y in some way. Then I, as the prover, choose an i and obtain a string that is not in the language. Since, following this game, I always obtain a string that is not in the language, we conclude that L is not context free. The point I want to emphasize is this: look at the quantifier structure of the lemma, and remember that you, as the prover, have freedom only on the universal quantifiers. I mentioned this when we talked about the pumping lemma for regular languages. The one mistake beginners make in applying this lemma is to say "let v be this, let w be this, let x be this, let y be this", then pump and get a string outside the language. That will not do. You cannot assume a particular breakup, because this move is not in your hands; you must be prepared for whatever breakup the adversary chooses. Before we prove the lemma, one remark on notation. When I say that vx is not empty, the precise way to write it is |vx| ≠ 0, that is, |vx| ≥ 1: if v and x were both empty, the length of vx would be 0, and that is ruled out. So the main point is that at least one of v and x must be non-empty. Now let us prove the lemma. The one fact we are going to use is what we did in the last lecture.
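For small constants the game can be played exhaustively. The sketch below, with hypothetical helpers `splits` and `prover_wins`, enumerates every adversary split of z with |vx| ≥ 1 and |vwx| ≤ n, and asks whether the prover always has some i outside the language. As an assumption of the sketch, the prover only tries i = 0 and i = 2, and only a few values of K are checked; the general argument is the case analysis above. For contrast it also runs on {a^j b^j : j ≥ 0}, a context-free language, where some split survives, so fixing one convenient split would not prove anything there.

```python
def in_anbncn(s):
    """Membership test for {a^j b^j c^j : j >= 0}."""
    j = len(s) // 3
    return s == "a" * j + "b" * j + "c" * j

def in_anbn(s):
    """Membership test for {a^j b^j : j >= 0} (context free)."""
    j = len(s) // 2
    return s == "a" * j + "b" * j

def splits(z, n):
    """All adversary moves: z = u v w x y with |vx| >= 1 and |vwx| <= n."""
    for start in range(len(z) + 1):                       # vwx begins here
        for end in range(start, min(start + n, len(z)) + 1):  # vwx ends here
            for p in range(start, end + 1):               # v = z[start:p]
                for q in range(p, end + 1):               # w = z[p:q], x = z[q:end]
                    v, w, x = z[start:p], z[p:q], z[q:end]
                    if v or x:
                        yield z[:start], v, w, x, z[end:]

def prover_wins(z, n, in_lang, exponents=(0, 2)):
    """True if, for every split, some i in `exponents` gives u v^i w x^i y outside
    the language; False if some split stays inside for every exponent tried."""
    for u, v, w, x, y in splits(z, n):
        if all(in_lang(u + v * i + w + x * i + y) for i in exponents):
            return False
    return True

for K in range(1, 5):
    print(K, prover_wins("a" * K + "b" * K + "c" * K, K, in_anbncn))
```

For {a^j b^j} with K ≥ 2 the split u = a^(K−1), v = a, w = "", x = b, y = b^(K−1) survives, so `prover_wins` returns False there.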
Let me write this fact here. Suppose G is a grammar in Chomsky normal form, and let τ be a derivation tree in G in which no path is longer than k. Here a path always means a path from the root to a leaf, and its length is the number of edges on it. Then the string generated by τ has length at most 2^(k−1). In other words, if the longest path of a derivation tree has a certain bound, then from that bound you can bound the length of the string that the derivation produces. How do we use this? For the proof of the pumping lemma, let L be a context-free language, as in the statement of the lemma, and let G be a Chomsky normal form grammar for L. There is a small issue here: a Chomsky normal form grammar exists only if L does not contain the empty string, and L might also be the empty language. These cases can be disposed of, and I will not go into them; I leave it to you to argue that they are not a problem. For instance, if L contains the empty string, you can consider the language L′ obtained from L by removing the empty string. So without loss of generality we consider a Chomsky normal form grammar G for L, and let G have m non-terminals. Now set n = 2^m, and consider a derivation tree in G of a string z in L of length 2^m or more.
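The bound in the fact follows a simple recursion, which the Python sketch below computes; the function name `max_yield` is hypothetical. A root with production A → a gives yield 1, and a root with A → BC has two subtrees whose paths have at most k − 1 edges, so the yield at most doubles per level.

```python
def max_yield(k):
    """Upper bound on the yield of a CNF derivation tree whose root-to-leaf
    paths all have at most k edges (k >= 1)."""
    if k < 1:
        raise ValueError("a CNF derivation tree has paths of length >= 1")
    if k == 1:
        return 1                      # only A -> a fits: one terminal leaf
    # Either A -> a (yield 1) or A -> B C, where the subtrees for B and C
    # have paths of at most k - 1 edges each.
    return max(1, 2 * max_yield(k - 1))

for k in range(1, 6):
    print(k, max_yield(k), 2 ** (k - 1))
```

With m non-terminals and every path of at most m edges, the yield is at most 2^(m−1) < 2^m, which is why the choice n = 2^m forces a path of length at least m + 1.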
What is the length of the longest path in this tree? It must be at least m + 1, and that is where the fact comes in. Suppose every path in the derivation tree for z had length m or less. By the fact, with k = m, the tree could not have generated a string longer than 2^(m−1), but our string has length at least 2^m. Therefore the derivation tree for z must have a path of length m + 1 or more. So consider a longest path in the tree; its length is at least m + 1. The symbol at its leaf is a terminal, all the other symbols on it are non-terminals, and the one at the root is of course S. Since the path has at least m + 1 edges, at least m + 1 non-terminals occur on it, but there are only m non-terminals in G. So I start at the bottom of the path, just above the leaf, walk upward, and stop when some non-terminal repeats for the first time. Some non-terminal must repeat, by the pigeonhole principle: among the bottom m + 1 non-terminal nodes of the path there are only m distinct labels, so at least two of them must be identical. So by the time you have gone up m + 1 edges from the leaf, you will encounter at least one repeat.
Let us say the non-terminal that repeats first is A. Going up from the leaf, A first occurs at a lower node and then again at a higher node. The path from the higher occurrence of A down to the leaf has length at most m + 1, and it cannot be more than that, because within the bottom m + 1 non-terminals some non-terminal must already repeat. Remember, we are considering a longest path, starting at its leaf and going up until a non-terminal repeats for the first time; that is the picture. Now let the subtree rooted at the lower A generate the string w, so A derives w. The subtree rooted at the higher A generates w together with something before it, which we call v, and something right after it, which we call x; so this A derives vwx. What does S derive? Call the part of the whole string to the left of vwx u, and the part to the right y. Then S derives the entire string uvwxy, and in particular S derives uAy: S produces u, then the higher A, which derives vwx, and then y. Also, since the higher A derives vwx, of which the part w is derived from the lower A, we get that A derives vAx.
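To see this construction run on a concrete tree, here is a Python sketch. Everything in it is an assumption made for illustration: the CNF grammar S → AB | AT, T → SB, A → a, B → b for {a^k b^k : k ≥ 1} (m = 4 non-terminals, so n = 2^4 = 16), trees represented as nested `(label, children)` tuples with terminal leaves as strings, and the helper names `build`, `yield_of`, `longest_path`, and `decompose`. It walks up a longest path from the leaf until a label repeats, then reads off u, v, w, x, y from the two occurrences; subtrees are located by object identity (`is`). In this grammar the repeated non-terminal turns out to be S rather than a symbol named A.

```python
# Hypothetical CNF grammar for {a^k b^k : k >= 1}:
#   S -> A B | A T,   T -> S B,   A -> a,   B -> b      (m = 4 non-terminals)

def build(k):
    """Derivation tree of a^k b^k in the example grammar (k >= 1)."""
    if k == 1:
        return ("S", [("A", ["a"]), ("B", ["b"])])
    return ("S", [("A", ["a"]), ("T", [build(k - 1), ("B", ["b"])])])

def yield_of(node, hole=None):
    """Concatenated leaves; the subtree `hole` (matched by identity) becomes '#'."""
    if node is hole:
        return "#"
    if isinstance(node, str):
        return node
    return "".join(yield_of(child, hole) for child in node[1])

def longest_path(node):
    """Nodes on a longest root-to-leaf path (the terminal leaf comes last)."""
    if isinstance(node, str):
        return [node]
    return [node] + max((longest_path(c) for c in node[1]), key=len)

def decompose(tree):
    """Walk up the longest path from just above the leaf until a label repeats,
    then read off u, v, w, x, y as in the proof."""
    path = longest_path(tree)[:-1]           # non-terminal nodes only
    seen = {}
    for node in reversed(path):
        label = node[0]
        if label in seen:
            upper, lower = node, seen[label]
            break
        seen[label] = node
    else:
        raise ValueError("no repeated non-terminal on the longest path")
    u, y = yield_of(tree, hole=upper).split("#")    # upper subtree yields vwx
    v, x = yield_of(upper, hole=lower).split("#")   # lower subtree yields w
    w = yield_of(lower)
    return u, v, w, x, y

print(decompose(build(8)))
```

For the tree of a^8 b^8 (length 16 = 2^m), the first repeat is S, giving u = a^6, v = a, w = ab, x = b, y = b^6; here |vwx| = 4 ≤ 16, and pumping gives a^(7+i) b^(7+i), which stays in the language.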
So A derives vAx, and now you can see why the pumping happens. From S we get uAy. This A can be replaced by vAx, giving uvAxy, and since we again have the non-terminal A, we can apply A ⇒* vAx repeatedly. Doing it once more gives uvvAxxy, and we need not stop there: rewriting A as vAx again gives three v's and three x's, and so on. When we are tired of applying A ⇒* vAx, we finally derive w from A, so from S we obtain u v^i w x^i y. Pumping down means that, from uAy, instead of rewriting A as vAx we simply derive w from A, which gives uwy. This shows that for every i ≥ 0, S derives u v^i w x^i y: for i = 0 we get uwy, for i = 1 we get uvwxy, for i = 2 we get uvvwxxy, and so on. That is really the proof of the pumping lemma for CFLs. It is also not too difficult to see why the two remaining conditions must hold. The first is that vx is not the empty string, that is, at least one of v and x is non-empty.
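The pumping derivation can be mimicked on plain strings with a short Python sketch. It treats the sentential form as a string in which the single letter 'A' stands for the repeated non-terminal, and it assumes 'A' does not occur in u, v, w, x, y; the function name `derive` is hypothetical.

```python
def derive(u, v, w, x, y, i):
    """Mimic S =>* uAy, then A =>* vAx applied i times, then A =>* w.
    Assumes the marker 'A' does not occur in u, v, w, x, y."""
    form = u + "A" + y                          # S =>* u A y
    for _ in range(i):
        form = form.replace("A", v + "A" + x)   # A =>* v A x (one A in the form)
    return form.replace("A", w)                 # A =>* w

for i in range(4):
    print(i, derive("aa", "a", "ab", "b", "bb", i))
```

Each loop iteration adds one v to the left of A and one x to its right, so after i rounds and the final A ⇒* w the result is exactly u v^i w x^i y.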
The other is that |vwx| ≤ n, the pumping lemma constant. I will explain both in the next class, but if you look carefully at this picture you should be able to find the reasons for these two conditions yourself. We stop here.