In this lecture we will discuss properties of context-free languages. First we observe some straightforward closure properties, and then we establish further properties of context-free languages, which you can view in parallel with the corresponding properties of regular languages. Point one: it is clear from the definition of context-free grammars that the class of context-free languages is closed with respect to union, and the proof is very simple. Let L1 and L2 be two CFLs. That means, for i = 1, 2, we have a CFG Gi = (Ni, Ti, Pi, Si) such that the language generated by Gi is Li. We may assume that N1 and N2 are disjoint, renaming nonterminals if necessary. Now set G = (N, T, P, S), where N = N1 ∪ N2 ∪ {S} with S a new nonterminal symbol, T = T1 ∪ T2 is the union of the terminal symbols, and P = P1 ∪ P2 together with the new rules S → S1 | S2. If you set up the grammar like this, you can clearly see that the productions in P1 are in the form required for a context-free grammar, and those in P2 also. The two new rules I have introduced, S → S1 and S → S2, also satisfy the conditions for a context-free grammar, and thus G is a CFG. The language generated by G is simply L1 ∪ L2. If you take any string generated by this grammar, you have to start from the start symbol S and take one of the branches S1 or S2; if it goes to S1 you can generate the strings of L1, and if it goes to S2 you can generate the strings of L2.
So any string generated by G is in L1 ∪ L2. Conversely, any string in L1 ∪ L2 is either in L1 or in L2. If it is in L1, you can generate it from S1 and thus from S as well, so it is in L(G); similarly for L2. So this construction shows that context-free languages are closed with respect to union. Now, what about intersection? The question is whether the class of CFLs is closed with respect to intersection, and likewise whether it is closed with respect to complementation and the other set-theoretic operations. We will address these points a little later. Those are set-theoretic operations; now look at concatenation. The result is that the class of CFLs is closed with respect to concatenation. As earlier, let L1 and L2 be two CFLs and consider the respective context-free grammars G1 and G2. This time I set G = (N1 ∪ N2 ∪ {S}, T1 ∪ T2, P, S), where S is a new symbol and P = P1 ∪ P2 ∪ {S → S1 S2}. There is a single new production rule, and it satisfies the criteria for a context-free grammar, so G is a CFG. You can observe that the language generated by G is L1 L2. How come? If I take any word w in L(G), I have to start from S and produce w, but at the first step from S I have only one rule.
That means the first step must use S → S1 S2, and after finitely many further steps I produce w. The portion of w generated from the nonterminal S1 is a terminal string in L1. Similarly, the portion of w generated from S2 is in L2. So w can be written as x y with x in L1 and y in L2, and hence w is in L1 L2. Now the converse. If you take any string w in L1 L2, then w is of the form x1 x2 with x1 in L1 = L(G1) and x2 in L2 = L(G2). Since x1 is in L1 you have a derivation of x1 from S1 in G1, and since x2 is in L2 you have a derivation of x2 from S2 in G2. A derivation in G1 is also a derivation in G, because all the production rules of P1 are already there in G, and the same holds for G2. So start with the production rule S → S1 S2, then use the first derivation to produce x1 from S1 in finitely many steps, and then use the second derivation to produce x2 from S2 in finitely many steps. This is a derivation of x1 x2 in G, so w = x1 x2 is in L(G). One side of the argument shows that L(G) is contained in L1 L2, and the other side shows that L1 L2 is contained in L(G). So the grammar we have constructed generates L1 L2, and the class of context-free languages is closed with respect to concatenation.
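The two constructions above can be sketched in Python. This is only an illustration under stated assumptions: the tuple encoding of a grammar, the helper names `union_grammar`, `concat_grammar` and `generate`, and the two sample grammars are hypothetical and not from the lecture. The bounded generator also assumes the grammars have no ε-productions, so that a sentential form never shrinks.

```python
from collections import deque

# Assumed encoding: a grammar is (nonterminals, terminals, productions, start),
# where productions maps each nonterminal to a list of right-hand sides (tuples).

def union_grammar(g1, g2, new_start="S"):
    """G with the new rules S -> S1 | S2, as in the union construction."""
    n1, t1, p1, s1 = g1
    n2, t2, p2, s2 = g2
    assert not (n1 & n2) and new_start not in (n1 | n2)  # disjoint names, fresh S
    prods = {**p1, **p2, new_start: [(s1,), (s2,)]}
    return (n1 | n2 | {new_start}, t1 | t2, prods, new_start)

def concat_grammar(g1, g2, new_start="S"):
    """G with the single new rule S -> S1 S2, as in the concatenation construction."""
    n1, t1, p1, s1 = g1
    n2, t2, p2, s2 = g2
    assert not (n1 & n2) and new_start not in (n1 | n2)
    prods = {**p1, **p2, new_start: [(s1, s2)]}
    return (n1 | n2 | {new_start}, t1 | t2, prods, new_start)

def generate(g, max_len):
    """All words of length <= max_len, found by breadth-first leftmost derivations.
    Assumes no epsilon-productions, so forms longer than max_len can be pruned."""
    nonterminals, _, prods, start = g
    seen, words, queue = {(start,)}, set(), deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if idx is None:                      # only terminals left: a word
            words.add("".join(form))
            continue
        for rhs in prods[form[idx]]:         # rewrite the leftmost nonterminal
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

# Sample grammars (illustrative): L1 = {a^n b^n | n >= 1}, L2 = {c^m | m >= 1}.
g1 = ({"S1"}, {"a", "b"}, {"S1": [("a", "S1", "b"), ("a", "b")]}, "S1")
g2 = ({"S2"}, {"c"}, {"S2": [("c", "S2"), ("c",)]}, "S2")
```

Comparing `generate(union_grammar(g1, g2), 6)` with the union of the two generated sets, and `generate(concat_grammar(g1, g2), 6)` with all products x y of length at most 6, checks the two containments on short words only; it does not replace the proof.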
Before going to the set-theoretic questions, whether the class of CFLs is closed with respect to intersection and complementation, and to many other properties, let me give you the pumping lemma for CFLs, which is parallel to the one you have worked with for regular languages. Let me first state the lemma; I do not have to give much introduction, because you have already worked with the pumping lemma for regular languages, and the philosophy is similar. The pumping lemma is stated for any infinite context-free language L. Of course, it holds for finite languages as well, but I will work with an infinite language here. For any context-free language L there exists a constant n such that every w in L with |w| ≥ n can be written as w = u v x y z such that |v y| ≥ 1, where v and y are the second and fourth components, |v x y| ≤ n, where v x y is the middle portion made of three components, and, if you pump the strings v and y simultaneously i times, the result stays in L; that is, for all i ≥ 0, u v^i x y^i z is in L. Compare this with regular languages: there every sufficiently long string can be split into three parts such that the middle part is non-empty, and if you pump that middle part any number of times the resulting strings stay within the language. That is what we observed in the pumping lemma for regular languages.
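To see what the statement asks for on a language that is context-free, here is a small sketch. The language {a^n b^n | n ≥ 0}, the particular split, and the helper names are assumptions chosen for illustration; they do not appear in the lecture. For w = a^m b^m with m ≥ 1 the split takes v to be the last a and y the first b, so |v y| = 2 ≥ 1 and |v x y| = 2.

```python
def in_anbn(s):
    """Membership test for the illustrative language {a^n b^n | n >= 0}."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

def pump(u, v, x, y, z, i):
    """The string u v^i x y^i z."""
    return u + v * i + x + y * i + z

def split_anbn(w):
    """One valid split of w = a^m b^m (m >= 1): v is the last a, y is the first b."""
    m = len(w) // 2
    return w[:m - 1], "a", "", "b", w[m + 1:]

# Pumping v and y together keeps the numbers of a's and b's equal.
for m in range(1, 6):
    parts = split_anbn("a" * m + "b" * m)
    assert all(in_anbn(pump(*parts, i)) for i in range(5))
```

Pumping only v or only y would break the balance, which is why the lemma pumps the two pieces simultaneously.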
Here a string is split into five parts, u v x y z, and among these five parts at least one of v and y is non-empty: when I say the length of v y is greater than or equal to 1, it means at least one of them has to be a non-empty string. The middle portion v x y, these three strings together, can always be chosen so that its length is less than or equal to n, whatever number n exists for the language. And if you pump v and y simultaneously any number of times, the resulting strings will be in L. So the philosophy is similar, and the applications are also similar. Now I will give you an idea of how we obtain this result. If L is a context-free language, you have a grammar for it, say G = (N, T, P, S). We have to identify the number n; I will tell you shortly what that number should be. It must be such that if you take any string w in L(G) whose length is greater than or equal to n, I am able to write w in five parts u v x y z with at least one of v and y non-empty, and if you pump these two portions simultaneously any number of times, the resulting string is in the language. Since w is in L(G), you have a derivation for w. What I will show is that, for a suitable n, this derivation can always be taken to be of the following form: S derives u A z for some nonterminal A (the terminal strings u and z may actually be produced later; that does not matter), then this A derives v A y, so that we reach u v A y z, and then this A derives the terminal string x, giving u v x y z.
If I can show that w has a derivation of this form, then I am through. Why? Once I have a situation like this, the nonterminal A produces the string v A y in finitely many steps. Applying the same steps to this A once again produces v v A y y, and once more gives v v v A y y y. You also see that A produces x in finitely many steps; this too is part of the derivation. Thus, repeating the first derivation i times, you get v^i A y^i, and then using the second one I get v^i x y^i, because A can be terminated with the terminal string x. So I always have derivations of this sort in the grammar. Now look at what is happening in this derivation: a nonterminal symbol A which has occurred once appears once again in the derivation of a string whose length is greater than or equal to n. That is the fundamental idea: once a nonterminal symbol reoccurs, I am able to produce derivations of v^i x y^i within that portion, and then I can produce the strings required in the statement. If this is clear, we can see how to proceed. Suppose I draw the derivation tree for this, with the start symbol S at the root. I get A at some node, and then A once again below it. Whatever happens within the purview of this A is independent of the rest, because the grammar is context-free.
So every nonterminal symbol is rewritten by its own production rules, which have nothing to do with the other symbols adjacent to it. The lower A produces x; the subtree of the upper A produces, in addition, the portions v and y among the leaves of the derivation tree; and the remaining leaves give u and z. If the derivation tree has this A reoccurring, then the subtrees can be connected in the way I have described, and thus I will be able to produce the strings of the form u v^i x y^i z in the language generated by G. So I am looking for the repetition of a nonterminal symbol. Now put a bound on it: consider a number such that, if you take any string whose length is bigger than that number, then in its derivation tree some nonterminal symbol must be repeated along a branch. This is the fundamental idea that I have just discussed. So essentially we have to set the number such that, for any sufficiently long string, the derivation tree has a branch with a repeated nonterminal symbol. If you choose the number with this in mind, then for any string in the language whose length is at least that number, you take its derivation tree and you can show that the string is split into u v x y z. Of course, we also have to make sure that the splitting is such that at least one of v and y is non-empty, and we had put the extra criterion that the length of v x y should be less than or equal to that number. These are extra conditions that we also require.
But as a bare minimum you can understand that w will be split into five parts in which at least one of v and y is non-empty, and u v^i x y^i z is in L for all i; that is what we have to observe. To get the extra properties, instead of considering an arbitrary context-free grammar I will consider a context-free grammar in CNF, that is, Chomsky normal form. In Chomsky normal form every production rule is of the form A → B C or A → a. Of course, a grammar in Chomsky normal form cannot generate the empty string; if the language contains the empty string, everything except the empty string can be generated, as you know. These are the types of production rules you have, and for a grammar in Chomsky normal form the following property is easy to see. Let me write it as a lemma. Let G = (N, T, P, S) be a CFG in Chomsky normal form. If a parse tree of a word w generated by G is of height h, then the length of w is less than or equal to 2^(h-1). You can prove this by induction; I will just illustrate it with an example. Since the grammar is in CNF, a node labelled with a nonterminal either has two children labelled with nonterminals or a single child labelled with a terminal. So from the root I can have two nonterminals, say A1 and A2, and below them again two nonterminals each, or a terminal, and so on. Consider the complete binary tree, where every possible node is present, and suppose I am looking at height 3, so that the nonterminals at the second level, say B1 and B2, each terminate.
The height of this tree is 3, counting 1, 2, 3 along a path, and the maximum length I can get is 4, because I have filled all the nodes of the complete binary tree. At the first level below the root you have 2 nodes, at the next level 2^2 nodes, and if you go one more level there will be 2^3 nodes. If all the nodes at all these levels are present, so that the 2^3 nodes are there and each of them terminates, say in a1, a2, and so on up to a8, then the length of the yield is 8, and that is the maximum for a tree of height 4. If some branch terminates at an earlier level, you get a smaller number. So for height 3 the maximum is 2^2, and in general any parse tree of height h can produce a string of length at most 2^(h-1). You can prove this by induction on these binary trees; I will use this result. Now let me consider Chomsky normal form in the present context. For the proof of the pumping lemma, let G = (N, T, P, S) be a context-free grammar in Chomsky normal form generating L \ {ε}. Suppose the set of nonterminals N has size k, and set the number n to be 2^k. You will see why in a moment: the result we have just discussed says that if a parse tree is of height h, the length of the string is at most 2^(h-1).
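The bound in the lemma can be checked for small heights with a short sketch. It assumes the height convention used in the example above (a tree for a single rule A → a has height 1), and the function names are hypothetical. It computes which yield lengths are attainable by CNF parse trees of a given height, using the fact that a root rule A → B C has one subtree of height exactly h - 1 and another of height at most h - 1.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def yield_lengths(h):
    """Set of yield lengths attainable by CNF parse trees of height exactly h (h >= 1)."""
    if h == 1:
        return frozenset({1})            # the root uses a rule A -> a
    tall = yield_lengths(h - 1)          # one child must have height exactly h - 1
    any_shorter = frozenset().union(*(yield_lengths(j) for j in range(1, h)))
    return frozenset(a + b for a in tall for b in any_shorter)

def min_height(length):
    """Smallest height h whose bound 2^(h-1) admits a yield of the given length."""
    h = 1
    while 2 ** (h - 1) < length:
        h += 1
    return h

for h in range(1, 9):
    assert max(yield_lengths(h)) == 2 ** (h - 1)   # complete binary tree
    assert min(yield_lengths(h)) == h              # one long branch
```

In particular `min_height(2 ** k)` is k + 1, which is the reason for choosing n = 2^k: a word of length at least 2^k cannot have a parse tree of height k or less.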
So I take n = 2^k. Now consider a string w in L of length greater than or equal to n. Since the length of w is at least 2^k, which is greater than 2^(k-1), any parse tree T of w in G must have a path of length at least k + 1; a tree of height k or less could only give a word of length at most 2^(k-1). Let me draw a picture. Consider a parse tree of w; it has a path of length at least k + 1, and let me call that path p, taking p to be a longest path in the tree. The last node on p is a terminal node, and all the nodes before it are nonterminal nodes. How many vertices are there on a path of length k + 1? There are k + 2 vertices, so p has at least k + 2 vertices. The last one is a terminal node, so there are at least k + 1 nodes on p whose labels are nonterminal symbols. But how many nonterminal symbols do we have? There are only k. Therefore, by the pigeonhole principle, some nonterminal symbol appears at least twice on p.
Now what I ask you to do is this: start from the terminal node at the bottom of p, go upward, and check the first k + 2 nodes. Other than the terminal node, that is k + 1 nonterminal nodes, so among them you certainly find two nodes with the same label. Say the common label is A. The one of these two nodes which is closer to the terminal node I call v_t, and the one which is closer to the root I call v_r; the labels of v_r and v_t are both A. So: from the leaf node on p go upward and check the first k + 2 vertices; we find two vertices v_t and v_r whose labels are the same nonterminal symbol, say A. Note that the portion of p from v_r to the leaf node is of length at most k + 1, because we have visited at most k + 2 nodes. So consider the subtree T_r with root v_r; it represents the derivation of a subword of w, which I call w′, where w is the entire word, and I claim this subtree is of height at most k + 1.
Why is the height of this subtree at most k + 1? Because the path p we have considered is a longest path in the entire tree. Therefore the portion of p below v_r is at least as long as any other path within this subtree; if some other path in the subtree were longer, p could not be a longest path in the tree. Hence the height of the subtree is at most k + 1. What do we get from this? From the lemma I have just discussed, the length of the yield is at most 2^(height - 1), so the length of w′ is at most 2^k, that is, |w′| ≤ n. Now consider the subtree rooted at v_t, which represents the derivation of a subword x. Since w′ is the yield of the subtree rooted at v_r, and x is the yield of the subtree rooted at v_t inside it, w′ can be written as v x y. Now here is the point of using CNF. The node v_r is labelled A, and since it has a nonterminal below it, the rule applied there is of the form A → B C. So there are two branches from this node, and the subtree rooted at v_t lies entirely within the subtree rooted at B or entirely within the subtree rooted at C, because the tree below B and the tree below C have nothing in common.
So the subtree rooted at v_t is within one of these two subtrees; v_t cannot be common to both. Suppose it is within the subtree of B. Then C also has to terminate, and it yields some string with at least one symbol. So when I write w′ = v x y, the remaining portion other than x has at least one symbol, and thus v y is non-empty. This is the argument: since the first production used in the derivation of w′ at v_r must be of the form A → B C for some nonterminal symbols B and C, the subtree T_t rooted at v_t must be completely within either the subtree generated by B or the subtree generated by C, and therefore at least one of v and y is non-empty, that is, |v y| ≥ 1. So what do I have? In finitely many steps A produces v A y, and in finitely many steps A produces x, where |v x y| = |w′| ≤ n, as we have observed, and with this argument we have also observed that |v y| ≥ 1. Hence, because of these two derivations, for all i ≥ 0 the nonterminal A produces v^i x y^i in finitely many steps. This is essentially what I mentioned earlier: within the given derivation there is a sub-derivation in which a nonterminal symbol is repeated, and it produces strings of the form v^i x y^i. Once I have this, I am through. To conclude, note that the string w can be written as u v x y z for some u and z, which are the portions of the yield outside the subtree we have discussed.
So the derivation is of the form S ⇒* u A z, and if you place the derivations from A inside it, you see that all the strings u v^i x y^i z, for all i, are in the language L. Thus this holds for every string w of length greater than or equal to n, where n is the constant we have set; in this particular context I have set n = 2^k because we have considered Chomsky normal form. So this is the pumping lemma for context-free languages, and you can compare it with the pumping lemma for regular languages to understand it better. The philosophy is similar: there, when the proof goes through finite automata, we work with the states, and here we work with the nonterminal symbols. Once again, the point is to produce a derivation of the form I explained; once I have such a derivation, I can always produce the strings u v^i x y^i z in L for all i. For that purpose we considered a longest path in the parse tree, along which a nonterminal symbol is repeated, and we set the constant accordingly. Now let me talk about applications of this pumping lemma. In the case of regular languages, you used the pumping lemma to show that certain languages are not regular. Similarly, for context-free languages we use this pumping lemma to show that certain languages are not context-free. Let me give an example: the language L = {a^n b^n c^n | n ≥ 0} is not a context-free language. How do I observe this? Suppose L is a context-free language.
Then, as per the pumping lemma for CFLs, there is a constant; let me call it n0. As we did for regular languages, the same argument applies here: if you choose any string w whose length is greater than or equal to n0, the pumping lemma says that, if L is a context-free language, we should be able to split w into five parts, in which at least one of v and y is non-empty, such that if you pump those two strings simultaneously to any power, the resulting string is in the language. I will give you a string which fails that condition, so that we can say our assumption is wrong. Let me choose the string smartly. The strings here are of the form a^n b^n c^n, and I choose in particular w = a^n0 b^n0 c^n0. The length of this string is 3 n0, which is greater than or equal to n0. Suppose w is split into five portions u v x y z satisfying the conditions that |v x y| ≤ n0 and at least one of v and y is non-empty. Once I take the restriction |v x y| ≤ n0, what does the portion v x y look like? Since it is of length at most n0, it lies within the a's, or within the b's, or within the c's, or, if it is on a border, it has some a's and some b's, or some b's and some c's. But it cannot contain a's, b's and c's all together, because its length is at most n0.
As in the case of regular languages, having chosen this string, I will show that u v^i x y^i z is not in L for some i, which contradicts the pumping lemma, and the conclusion will be that the language is not context-free. As I just mentioned, we have the following five cases for v x y, for some m, m1 and m2: (1) v x y is of the form a^m; (2) it is of the form b^m; (3) it is of the form c^m; (4) if it is common to the a's and the b's, it is of the form a^m1 b^m2; (5) if it is common to the b's and the c's, it is of the form b^m1 c^m2. If I argue for the a's, the argument for cases 2 and 3 is similar, and if I argue for one of cases 4 and 5, the other is similar. So we discuss cases 1 and 4; the other cases follow in a similar manner. Consider the case where v x y is of the form a^m. Then w = u v x y z is of the form a^k1 a^m a^k2 b^n0 c^n0, because there can be some a's on either side of a^m. Here k1, k2 ≥ 0 and k1 + m + k2 = n0; if m = n0 then there are no a's on either side, but anyway this is the general form when v x y is of the form a^m.
So u v x y z is in this form. Since at least one of v and y is non-empty, v y contains at least one a. When I pump v and y, the number of a's increases, but the b's and c's, which are in the portion z, remain the same, because I am pumping only the words v and y. Since |v y| ≥ 1, if I keep pumping, the number of a's keeps increasing, and therefore the number of a's becomes different from the number of b's and c's; hence the resulting string is not in L. Similarly, suppose v x y is of the form a^m1 b^m2. Then w has some a's, then a^m1 b^m2, and the rest of the b's, which I may write as b^k2, followed by c^n0. When I pump v and y, I do not know exactly what they contain: v may have some a's and y may have some b's, or v and y may have only a's, and so on. In any case, if I raise the power to some i > 1, the number of a's or the number of b's increases, whereas the c's, which are in the portion z, do not increase. Therefore the number of a's or b's is different from the number of c's in the string, and it cannot be in L. So I can produce strings for certain powers which are not in L, but the pumping lemma says that for all i the string has to be in L. Since, as observed, there is some i such that u v^i x y^i z is not in L, this contradicts the pumping lemma, and thus the language is not a context-free language.
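The case analysis can be sanity-checked by brute force for small values of the constant. The sketch below enumerates every split w = u v x y z of w = a^n0 b^n0 c^n0 with |v x y| ≤ n0 and |v y| ≥ 1 and confirms that pumping with i = 0 or i = 2 leaves the language. The function names and the choice of exponents are assumptions for illustration; checking finitely many values of n0 illustrates the argument and is not a proof.

```python
def in_L(s):
    """Membership test for L = {a^n b^n c^n | n >= 0}."""
    n = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n

def decompositions(w, n0):
    """All (u, v, x, y, z) with w = u v x y z, |v x y| <= n0 and |v y| >= 1."""
    length = len(w)
    for start in range(length + 1):
        for end in range(start, min(length, start + n0) + 1):
            for p in range(start, end + 1):
                for q in range(p, end + 1):
                    v, x, y = w[start:p], w[p:q], w[q:end]
                    if v or y:
                        yield w[:start], v, x, y, w[end:]

def pumping_fails_everywhere(n0, exponents=(0, 2)):
    """True if every admissible split of a^n0 b^n0 c^n0 leaves L for some exponent."""
    w = "a" * n0 + "b" * n0 + "c" * n0
    return all(
        any(not in_L(u + v * i + x + y * i + z) for i in exponents)
        for u, v, x, y, z in decompositions(w, n0)
    )

assert all(pumping_fails_everywhere(n0) for n0 in range(1, 6))
```

The enumeration covers all five cases at once, including splits where v or y itself straddles a border between two letters.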
Similarly, for several other languages, for example {w w | w ∈ {a, b}*}, you can observe using the pumping lemma that they are not context-free. The method is the same: if you suppose the language is context-free, there will be a constant; you then choose some string and observe that, however you divide it in the form u v x y z, pumping v and y simultaneously gives certain strings which go outside the language. You can take this as an exercise and ascertain that this language is not context-free. I will now display some more languages that are not context-free; you can verify them in the same way.