Today we will give algorithms for two decision problems for context-free languages. The first one is this: given a context-free grammar G, determine whether the language generated by G is infinite or finite. For this problem, let us call it problem one, we will give a sketch of the algorithm. The input is the grammar G. Step one: from G, find its Chomsky normal form grammar G1. We know from the construction of the Chomsky normal form that G1 generates the language L(G) except possibly for one string, the empty string, because, recall, a Chomsky normal form grammar cannot generate the empty string. So clearly G generates an infinite set if and only if G1 generates an infinite set. The equivalent question we will therefore resolve is: is L(G1) infinite, where G1 is in Chomsky normal form?
The way we do it is actually quite straightforward. We create a directed graph on the non-terminals of the grammar G1. Let me say clearly what I mean. This script G is the graph that we are building from the Chomsky normal form grammar G1, and it has as vertices the non-terminals of G1. In other words, if G1 is (V_N, Sigma, P, S), the directed graph has as its vertex set V_N, the set of non-terminals. Notice that a non-terminal now has two roles: one as a vertex of the graph script G and, of course, one as a non-terminal of the grammar G1. From a non-terminal A there is an edge to the non-terminal B in this graph if and only if there is a production A -> B C or A -> C B for some non-terminal C. So there is an edge from A to B exactly when there is a production in which A occurs on the left-hand side and B occurs on the right-hand side.
Now the claim is: L(G1) is infinite if and only if the graph script G has a cycle. To prove this claim we need to show that if script G has a cycle then L(G1) is infinite, and vice versa. Let us do the first direction; here is the proof idea. Suppose one cycle in this graph goes from A_1 to A_2 to A_3 and so on to A_k and back to A_1. What does this mean? The first edge means that from A_1 you can derive some string with A_2 in it, say A_1 derives alpha_1 A_2 beta_1. Similarly A_2 derives alpha_2 A_3 beta_2, and so on, up to A_(k-1) derives alpha_(k-1) A_k beta_(k-1). And because of the edge back, A_k in turn derives alpha_k A_1 beta_k. The crucial point, because G1 is a Chomsky normal form grammar, is that for each i it is not possible that both alpha_i and beta_i are empty strings. This is quite easy to verify. Take the first step, A_1 derives alpha_1 A_2 beta_1, and suppose both alpha_1 and beta_1 were empty. That would mean A_1 derives A_2 in one step, which is not possible in a Chomsky normal form grammar, because such a grammar has neither unit productions nor epsilon-productions. The edge comes from a production A_1 -> A_2 C or A_1 -> C A_2, so one of alpha_1 and beta_1 is the non-terminal C.
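Setting the proof aside for a moment, the graph construction and the cycle test in the claim can be sketched in a few lines. This is only an illustration: the tuple encoding of productions and the names `build_graph`, `has_cycle` and `is_infinite` are assumptions made here, not part of the lecture, and the sketch assumes, as the lecture does, that G1 is in Chomsky normal form with no useless symbols.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: a binary CNF production A -> B C is the tuple (A, B, C).
# Productions A -> a add no edges, so they are not needed here.
BinaryRule = Tuple[str, str, str]


def build_graph(nonterminals: Set[str],
                binary_rules: List[BinaryRule]) -> Dict[str, Set[str]]:
    """Edge A -> B whenever B occurs on the right-hand side of a production of A."""
    graph: Dict[str, Set[str]] = {a: set() for a in nonterminals}
    for head, left, right in binary_rules:
        graph[head].add(left)
        graph[head].add(right)
    return graph


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Depth-first search; reaching a vertex that is still on the stack closes a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {v: WHITE for v in graph}

    def visit(v: str) -> bool:
        colour[v] = GREY
        for w in graph[v]:
            if colour[w] == GREY:
                return True
            if colour[w] == WHITE and visit(w):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in graph)


def is_infinite(nonterminals: Set[str], binary_rules: List[BinaryRule]) -> bool:
    """Assumes a CNF grammar without useless symbols, as in the lecture."""
    return has_cycle(build_graph(nonterminals, binary_rules))


# S -> A B, A -> a, B -> b generates only "ab": no cycle.
print(is_infinite({"S", "A", "B"}, [("S", "A", "B")]))
# S -> A S | a, A -> a generates a, aa, aaa, ...: S has an edge to itself.
print(is_infinite({"S", "A"}, [("S", "A", "S")]))
```

The recursive search is fine for grammars of classroom size; for a very large grammar an iterative search would avoid Python's recursion limit.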
So what we get from all this is the following. In the first derivation I substitute for A_2, then for A_3, and so on, and finally for A_k, and so we can say that A_1 derives alpha_1 alpha_2 ... alpha_k A_1 beta_k ... beta_2 beta_1. As before, the two strings on either side of A_1 cannot both be empty. That means A_1 derives a string containing A_1 plus something non-empty. Now these two side strings, alpha_1 ... alpha_k and beta_k ... beta_1, are strings of grammar symbols, but again, because the grammar is in Chomsky normal form, no non-terminal derives the empty string, so it is not possible that both of them finally derive only the empty string. What this means is that I can find terminal strings x and y, not both empty, such that A_1 derives x A_1 y.
This fact I am getting because of the cycle. Of course, we have another fact about A_1: A_1 does generate some non-empty terminal string, call it z. Why? Because A_1 is not a useless symbol; A_1 is a non-terminal of the Chomsky normal form grammar G1, which, as we constructed it, has no useless symbols. Now put these two together. From A_1 you can get x A_1 y, and you can keep on doing this as many times as you like: replace this A_1 again by x A_1 y to get x x A_1 y y, then x x x A_1 y y y, the same form as in the uvwxy theorem. Finally you rewrite this A_1 by the terminal string z. Therefore A_1 derives x^i z y^i for all values of i greater than or equal to 0. And there is one more thing we know.
For two different values of i, say i and i + 1, the strings x^i z y^i and x^(i+1) z y^(i+1) cannot be the same. Why? Because x and y cannot both be empty, so the second string is strictly longer than the first. Therefore it is very clear that in this case A_1 generates infinitely many terminal strings. And then, of course, A_1 is reachable from the start symbol S, again because G1 has no useless symbols, so S derives some string with A_1 in it. This A_1 can derive infinitely many strings, and therefore S itself can derive infinitely many strings. The idea is fairly simple. All it depended on is the fact that we have such a cycle, and we have crucially used the fact that the grammar G1 is a Chomsky normal form grammar.
Now we consider a very important problem, the so-called parsing problem: given a context-free grammar G and a string x, you would like to know whether x is in the language generated by G. Without loss of generality we assume that G is in Chomsky normal form. For any grammar G we can create another grammar in Chomsky normal form such that the two grammars generate the same language except possibly for the empty string. So the only case left out is when x is the empty string, but that can be handled separately and we will not spend time on it. It is not too difficult to see whether epsilon is produced by an arbitrary grammar G. Remember that we defined the set of nullable non-terminals, that is, all those non-terminals which can derive the empty string. If the start symbol happens to be nullable then the language contains epsilon, and otherwise it does not.
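The check for the empty string mentioned above can be sketched as a small fixed-point computation of the nullable set. The encoding of productions as `(head, body)` pairs and the function names are assumptions made for this illustration; the grammar here is an arbitrary context-free grammar, not necessarily in Chomsky normal form.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: a production A -> X1 ... Xm is (A, [X1, ..., Xm]);
# an epsilon-production A -> epsilon is (A, []).
Production = Tuple[str, List[str]]


def nullable_nonterminals(productions: List[Production]) -> Set[str]:
    """Return all non-terminals that can derive the empty string."""
    nullable: Set[str] = set()
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for head, body in productions:
            # all() over an empty body is True, so A -> epsilon is picked up here.
            # A terminal is never in `nullable`, so a body containing one fails.
            if head not in nullable and all(sym in nullable for sym in body):
                nullable.add(head)
                changed = True
    return nullable


def generates_empty_string(productions: List[Production], start: str) -> bool:
    """epsilon is in L(G) exactly when the start symbol is nullable."""
    return start in nullable_nonterminals(productions)


# S -> A B, A -> epsilon, B -> b | epsilon: S is nullable.
example = [("S", ["A", "B"]), ("A", []), ("B", ["b"]), ("B", [])]
print(generates_empty_string(example, "S"))
```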
So the case when x is epsilon can be taken separately; otherwise we may as well assume that the grammar G is in Chomsky normal form. Now there are actually two algorithms we can talk of. One is a kind of brute-force algorithm, and it is extremely inefficient. Without writing anything, let me just say what that brute-force algorithm is. You keep enumerating the parse trees of the grammar G in increasing order of the length of the frontier. It is of course messy, but conceptually it is fairly simple. Once the frontier length crosses the length of the string x, no further parse tree can produce x, so if by then you have not generated x as the frontier of one of the parse trees, then x is not in the language. However, you can see that this will be an exponential-time algorithm.
As it happens, we have a far better, polynomial-time algorithm for this problem. It is an interesting algorithm, originally proposed by three people, Cocke, Younger and Kasami, and therefore it is called the CYK algorithm. We will describe what the CYK algorithm is; essentially it is a dynamic programming algorithm. Suppose x is the string. We are interested in the case when x is a string of terminals, so let me write right away that x is in Sigma*, where Sigma is the set of terminals.
So this is the string x, and let me name the various portions of it. A contiguous substring of x I will call x_{i,j}: x_{i,j} is that part of x which begins at the i-th position of x and is of length j. So the name is indexed by two numbers, and x_{i,j} is the unique substring of x which starts at position i and has length j.
Now suppose we have some way of determining the set of non-terminals that can generate x_{i,j}. Let me call this set V_{i,j}. Formally, V_{i,j} is the set of all non-terminals A such that A derives the string x_{i,j}. Suppose we manage to do this for all possible values of i and j. Then it is quite clear that the string x is in L(G) if and only if V_{1,n} contains the start symbol, where n is the length of the input string x. This just comes from the definition of the set V_{i,j}: the entire string x starts at position 1 and is of length n, so x is nothing but x_{1,n}. By definition, if S derives x_{1,n} then x is in the language, and if S does not derive x_{1,n} then it is not in the language. So really we have not done much by stating it this way. The interesting thing is that what we require is V_{1,n}. What we are going to do, in the typical dynamic programming paradigm, is start with the smallest value of j for which these sets are defined and then work upwards.
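Since x_{i,j} is indexed by a starting position counted from 1 and a length, not by a start and an end, a small helper may make the convention concrete. The function name `substring` is an assumption for this illustration; Python slices count from 0, hence the shift by one.

```python
def substring(x: str, i: int, j: int) -> str:
    """Return x_{i,j}: the substring of x starting at 1-based position i with length j."""
    n = len(x)
    if not (1 <= j <= n and 1 <= i <= n - j + 1):
        raise ValueError("x_{i,j} needs 1 <= j <= n and 1 <= i <= n - j + 1")
    return x[i - 1 : i - 1 + j]


x = "abcde"
print(substring(x, 2, 3))       # the three symbols starting at position 2
print(substring(x, 1, len(x)))  # x_{1,n} is the whole string
```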
So essentially, first we define V_{i,1} for each i, then V_{i,2} for each i, and this way we go on. The range of i in these two cases is different: for V_{i,1}, i ranges from 1 to n; for V_{i,2}, i ranges from 1 to n - 1, because at position n you cannot start a substring of length 2, only a substring of length 1. The crucial point is this: we can determine V_{i,j} provided we already have the definitions of all V_{i,k} where k is strictly less than j; of course k cannot be less than 1, so that is the range. I will get into the details a little later, but you can already see the dynamic programming flavor of what we are trying to do. The V_{i,1} are very easy to determine, as we will see. From the V_{i,1} I will be able to determine the V_{i,2}; from V_{i,1} and V_{i,2} I will be able to determine the V_{i,3}; from V_{i,1}, V_{i,2} and V_{i,3} the V_{i,4}; and so on. This way we keep building upwards in the second index until j is n, and then we are through, because what we require is V_{1,n}. That is the dynamic programming approach.
It therefore remains to prove this crucial thing: that we can define the sets for a value j provided we have the definitions for all the values less than j. That is, we can define V_{i,j}, the set of non-terminals which generate the substring starting at i of length j, provided we know the sets V_{i,k} for k less than j; and actually I will need these not only for this i but for other starting positions as well.
So let me write it this way: we need all V_{l,k} where k is less than j and l takes all possible values. In other words, you can define these sets for a higher length provided you know the definitions for the lower lengths. That is the basic idea, and let us see why it is doable.
First of all, what is V_{i,1}? It is the set of all non-terminals which generate the substring of length 1 starting at position i of the string x. A string of length 1 can be generated by a non-terminal only directly, using a single production. So V_{i,1} is the set of all non-terminals A such that A -> a is a production, where a is the i-th symbol of x. This is very simple: you know i and you know x, so you know what a is, and you can just pick up these non-terminals from the productions.
Now let us assume that I am talking of a substring of length j that starts at position i. Which non-terminals can produce it? By the definition of V_{i,j}, A is in V_{i,j} if and only if A can derive x_{i,j}, the substring of length j starting at position i. So consider the parse tree which generates x_{i,j} starting from the non-terminal A. What can be the first production in this derivation tree?
Remember, in Chomsky normal form there are only two kinds of productions: either A -> a where a is a terminal, or A -> B C where B and C are two non-terminals; of course one of them, or both of them, can be the same as A. You will agree with me that if j is not equal to 1 then the production A -> a cannot be the first production used in this derivation tree. So it must be the case that we have used some production A -> B C. Then B generates the left part of the string x_{i,j} and C generates the rest of it. Let me say that the length of the left part is k; then the length of the rest is of course j - k. Now, what can we say about B and C in terms of these sets? Clearly B is a non-terminal which generates the substring of length k starting at i; in other words, B is a member of V_{i,k}. And where does the part generated by C start? Clearly at position i + k, and as we said its length is j - k. So C has to be a member of V_{i+k, j-k}. So B is in V_{i,k} and C is in V_{i+k, j-k}.
And by the way, what can k be? What is the smallest value of k? k cannot be 0, because that would mean that C is generating the entire string and B is generating the empty string, but in a Chomsky normal form grammar no non-terminal can generate the empty string.
So k has to be at least 1. And how big can k be? For the same reason, the part that C generates in this derivation tree cannot be empty, so k can go up to at most j - 1. So k ranges from 1 to j - 1. And now you see that A is in V_{i,j} if and only if there exist B and C, and a k between 1 and j - 1, such that B is in V_{i,k}, C is in V_{i+k, j-k}, and A -> B C is a production of the grammar.
Now think of it the other way. Suppose you have the definitions of all the V's whose second index is strictly less than j. The second indices I am using here are k and j - k, and both of these values are strictly less than j. So if I have the sets V_{l,k} for all k strictly less than j, then I can determine whether or not A is in V_{i,j}; I can take each of the non-terminals in turn and check whether it is in this set, and therefore I can define the set V_{i,j}. This is the dynamic programming recurrence. I am not writing it in that form, but you should be able to see it. So what we started out saying we needed to prove, the crucial idea behind the dynamic programming algorithm, we have now demonstrated: we can define V_{i,j} provided we already have the definitions of all V_{l,k} where k is strictly less than j.
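The lecture does not write the recurrence out, so here is one way it might be written, using only the sets and ranges stated above: P is the production set of the Chomsky normal form grammar, n is the length of x, and S is the start symbol.

```latex
\begin{align*}
V_{i,1} &= \{\, A \mid (A \to a) \in P,\ a \text{ is the } i\text{-th symbol of } x \,\},
  & 1 \le i \le n, \\
V_{i,j} &= \bigcup_{k=1}^{j-1}
  \{\, A \mid (A \to BC) \in P,\ B \in V_{i,k},\ C \in V_{i+k,\,j-k} \,\},
  & 2 \le j \le n,\ 1 \le i \le n-j+1, \\
x \in L(G) &\iff S \in V_{1,n}.
\end{align*}
```

Every set on the right-hand side of the second line has second index k or j - k, both strictly less than j, which is what lets the sets be filled in order of increasing j.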
Therefore the CYK algorithm is, at the higher level, very simple to describe. Your input is a Chomsky normal form grammar G and a string x of terminals. Suppose the length of the string is n. Then determine V_{1,n} and output yes, meaning x is in the language L(G), if and only if the start symbol S of G is in V_{1,n}. So all I need to do is to tell you in a more precise manner how V_{1,n} is determined.
First of all we determine all the V_{i,1} for the different values of i, exactly as we wrote before: V_{i,1} is the set of all non-terminals A such that A -> a is in P, this a being the i-th symbol of x. So the length-one sets are taken care of. Then j varies from 2 to n, the bigger lengths, and we build up in that order: starting from 2, then 3, all the way up to n. That is the dynamic programming. Once I have fixed a j, I define all the V_{i,j} for that j. For a fixed j, i ranges from 1, since such a substring can start at the first position of the given string x, all the way up to n - j + 1. When we define V_{i,j} we first set it to empty, because it can indeed turn out to be empty. Then k ranges from 1 to j - 1, which is exactly the range we saw earlier.
V_{i,j} is basically this set, and I take the union with the old value, so that in case nothing is ever added the whole thing remains empty, and that is fine. For a fixed value of k, what we add is the set of all A such that A -> B C is a production, B is in V_{i,k}, and C is in V_{i+k, j-k}. That is exactly what we had before: the B part generates the substring starting at i of length k, and C generates the rest of the string, which starts at position i + k and has length j - k, so C has to be in V_{i+k, j-k}. As k varies we get different sets and we take their union, so when we come out of this loop V_{i,j} is determined. Once this is done for every i for a certain value of j, we go to the next value, j + 1, then j + 2, and so on. Finally, when j reaches n, what we have is V_{1,n}, and I check whether the start symbol is in V_{1,n}. If so, I say yes, the string is generated by the grammar; otherwise I say the string is not generated by the given grammar G. That completes the description of the CYK algorithm.
As I said, this is an interesting algorithm and a fairly efficient one; at least it is a polynomial-time algorithm. Its time complexity is not too difficult to see.
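Before counting steps, it may help to see the whole procedure in one place. The following is a minimal sketch of the algorithm as described, not a tuned implementation: the tuple encoding of productions, the function name `cyk` and the example grammar are assumptions made here for illustration. The table keys keep the lecture's convention of a 1-based start position and a length.

```python
from typing import Dict, List, Set, Tuple

TerminalRule = Tuple[str, str]     # (A, a)    encodes A -> a     (hypothetical encoding)
BinaryRule = Tuple[str, str, str]  # (A, B, C) encodes A -> B C


def cyk(x: str, terminal_rules: List[TerminalRule],
        binary_rules: List[BinaryRule], start: str) -> bool:
    """Decide whether a non-empty string x is generated by a CNF grammar."""
    n = len(x)
    if n == 0:
        raise ValueError("the empty string is handled separately")

    V: Dict[Tuple[int, int], Set[str]] = {}

    # Length-one substrings: A is in V[i, 1] when A -> a and a is the i-th symbol.
    for i in range(1, n + 1):
        V[i, 1] = {A for A, a in terminal_rules if a == x[i - 1]}

    # Longer substrings, in order of increasing length j.
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):      # i = 1 .. n - j + 1
            V[i, j] = set()                # empty to begin with
            for k in range(1, j):          # k = 1 .. j - 1, the split point
                for A, B, C in binary_rules:
                    if B in V[i, k] and C in V[i + k, j - k]:
                        V[i, j].add(A)

    return start in V[1, n]


# Example CNF grammar for { a^m b^m : m >= 1 }:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
terminal_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]

for w in ["ab", "aabb", "aab", "abab"]:
    print(w, cyk(w, terminal_rules, binary_rules, "S"))
```

Scanning every binary production inside the innermost loop is the simplest choice and matches the "look up the production set" step of the description; it is also where the factor for the size of the grammar comes from.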
Take the first for loop, the one that determines the V_{i,1}. Its body is repeated n times, where n is the length of the string x. How much time does it take to determine one V_{i,1}? Essentially you are looking up the production set for the symbol a, so, as you can see, this loop is linear in the description of the grammar and of x. The determining factor for the time complexity of the algorithm is therefore the main for loop, which has within it two other for loops. The range of j is of order n. The range of i starts from 1 and, with j starting from 2, goes at most to n - 1, so again that is order n. And since j can go all the way up to n, k can be at most of order n. So you can see that the body of this entire loop nest is executed on the order of n cubed times, where n is the length of the string. And what do you do in the body? You look up the production set and the already defined values of the V's, and looking up the production set means examining G. So the time complexity is bounded by n cubed multiplied by the size of the grammar.
Often the problems are such that the grammar is fixed and you are giving different strings, to see whether each string is in the language generated by the grammar or not. In that case the size of the grammar G is a constant, and so the time complexity is order n cubed for a fixed grammar G. Now take the case when G is also part of the input.
Then basically it is the size of G multiplied by n cubed, and the n cubed comes from the three nested loops, each of order n. So that completes the time complexity; it is a very simple time complexity computation for the CYK algorithm. Thanks.
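As a rough check on the n cubed figure, one can count how many times the innermost body of the three nested loops runs, ignoring the work done inside it. The function names below are assumptions for this illustration. The count is the sum over j from 2 to n of (n - j + 1)(j - 1), which works out to (n^3 - n)/6, so it grows as n cubed.

```python
def count_inner_iterations(n: int) -> int:
    """Count the (j, i, k) triples visited by the three nested loops."""
    count = 0
    for j in range(2, n + 1):          # substring length
        for i in range(1, n - j + 2):  # start position, 1 .. n - j + 1
            for k in range(1, j):      # split point, 1 .. j - 1
                count += 1
    return count


def closed_form(n: int) -> int:
    """(n^3 - n) / 6; the numerator is always divisible by 6."""
    return (n ** 3 - n) // 6


for n in [1, 2, 4, 10]:
    print(n, count_inner_iterations(n), closed_form(n))
```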