Okay, so today we'll have the last lecture on parsing. We'll finish everything you need to understand how parsers work and how to build one. And with that, you can then go ahead and build pretty much any compiler: you have all the tools ready to build the AST, do the type checking, generate code. So a few things remain to be covered, I think four smaller items: what it means to disambiguate a grammar, what an Earley parser is, what syntax-directed translation is, and we'll also refresh the CYK parser from last time. So these are the things. We'll start with the CYK parser. It is a parser that builds the parse tree, which is the result of understanding the input string, bottom up. You may still recall how we actually derived that parser. We started from a simple Prolog parser which looked exactly like the grammar; we took the grammar and pretty much rewrote it into Prolog. That gave us a recursive descent parser. It worked, but it was slow, and we could not use left recursion, so we had to massage the grammar a little bit to get the parser, but it was fine. Then we took a look at the parser and realized: look, there is no negation in it. So with a little change, we can turn it into a parser in Datalog, which is a well-behaved subset of Prolog. And that we can evaluate bottom up, not with backtracking but essentially with a dynamic programming algorithm. And we got the CYK parser. So before I show you how it works, remember what a grammar is. A grammar has the so-called terminals and non-terminals. The terminals are things that appear in the input, like commas and semicolons — the tokens. Then there are the non-terminals, which always appear on the left-hand side of some rule, and we can rewrite them into something. So Type is rewritten into either int or float. And the first non-terminal is always the start non-terminal, from which we start deriving strings of the grammar. So here is a simple grammar, very typical of programming languages.
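To make the terminal/non-terminal distinction concrete, here is a minimal sketch of how such a declaration grammar can be held as data (the names and representation are illustrative, not the project's actual code). Terminals are exactly the symbols that never appear on a left-hand side, and by convention the start non-terminal is the first one listed:

```python
# A sketch of the declaration grammar as a list of productions
# (lhs, rhs). Names are illustrative.
GRAMMAR = [
    ("Decl",    ["Type", "VarList", ";"]),
    ("Type",    ["int"]),
    ("Type",    ["float"]),
    ("VarList", ["ID"]),
    ("VarList", ["VarList", ",", "ID"]),
]

# Non-terminals appear on some left-hand side; terminals never do.
nonterminals = {lhs for lhs, _ in GRAMMAR}
terminals = {sym for _, rhs in GRAMMAR for sym in rhs} - nonterminals
print(sorted(terminals))  # [',', ';', 'ID', 'float', 'int']
```

Note that ID is a terminal here: it is a token produced by the lexer, even though it stands for many different variable names.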
What sort of language fragment does it encode? Declarations of variables — a single variable, or a list of variables? It's a list of variables, right? Okay, so here is one input string. I'm showing ID because that's the name of the token that comes from the lexical analyzer. Each of these IDs, of course, comes with the name of the actual variable. And here is how you could derive the string. You start with the start non-terminal and you rewrite it into the right-hand side of the production, which is Type VarList semicolon. There is no other choice, because declaration has only one production. Then that is rewritten by taking this production, Type goes to int, leaving the rest unchanged. And now we keep rewriting until what we have is nothing but a list of terminals, the things that appear on the input. So deriving a string is essentially this process. Parsing is the opposite process: taking the string and figuring out what sequence of derivations would produce it. How do we visualize such a sequence of derivations? Looking at this list is instructive but not really convenient. What data structure do we use to visualize it? A tree — the parse tree. So here is the parse tree for this derivation; you can see what's going on here. The root is always the start non-terminal, and the leaves of the tree are the terminals. And what about the internal nodes? Look at this box here: a parent, its children, and the edges between them correspond to a production of the grammar that was used in this derivation. So if you look at the parse tree, here is one production, another production, another production, and another one here. So that's a refresher from last time. Other questions about this? So just to make sure we understand: could this parse tree be obtained with a different derivation sequence than what we have here? Is the derivation sequence unique or not?
We could take some of these, for example the var list, and do essentially more recursion — but that would then create a different string at the end, right? Because if we take one more iteration in the var list, we would create a different list. What I had in mind when I asked the question was: to obtain this string, is there just a single derivation sequence, the one whose fragment you see over there, or could I create different sequences of derivations that obtain the string? Who thinks there are multiple? Okay, who thinks there is only one? All right, so what if I change this? Here, I rewrote Type into int, and then I presumably started rewriting the var list into this list of two identifiers. Could I reverse the order? Could I first change the var list into a list of identifiers and then, at the very end, turn the Type into int? I could. So indeed, different sequences of derivations can produce the same parse tree. And in fact, if you think about it, as you go through a derivation sequence you are gradually building the parse tree, and parsers differ in how they build the parse tree — some of them top down, some of them bottom up — and so each of them essentially derives the string in a different order. So there are different orders, but you probably don't need to remember that to understand the CYK parser. So the CYK parser works like this. You can visualize it in many ways. The one that will probably be most handy for your project is to think about the CYK execution as a graph. At the bottom, we have our input for the grammar that we have been looking at so far. We usually give all these tokens and non-terminals subscripts just so that we can distinguish this ID from that ID. And now what do we do next? We start with this graph, and then somehow we process it until we can say whether this string is from the grammar or not. So what would be the sequence of actions that we take?
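The point that one parse tree admits several derivation orders can be checked mechanically. Below is a small sketch (illustrative helper names, not project code) that validates a derivation: each step must rewrite exactly one non-terminal using some production. Both orders discussed — Type first, or the var list first — pass the check:

```python
# Sketch: validate that each step of a derivation rewrites one symbol
# using a production of the grammar.
GRAMMAR = [
    ("Decl",    ["Type", "VarList", ";"]),
    ("Type",    ["int"]),
    ("VarList", ["ID"]),
    ("VarList", ["VarList", ",", "ID"]),
]

def is_one_step(before, after, productions):
    """True if `after` is obtained from `before` by one rewrite."""
    for i, sym in enumerate(before):
        for lhs, rhs in productions:
            if sym == lhs and before[:i] + rhs + before[i + 1:] == after:
                return True
    return False

derivation = [
    ["Decl"],
    ["Type", "VarList", ";"],            # Decl -> Type VarList ;
    ["int", "VarList", ";"],             # Type -> int
    ["int", "VarList", ",", "ID", ";"],  # VarList -> VarList , ID
    ["int", "ID", ",", "ID", ";"],       # VarList -> ID
]
print(all(is_one_step(a, b, GRAMMAR)
          for a, b in zip(derivation, derivation[1:])))  # True
```

Swapping the order — expanding the var list fully before turning Type into int — also validates, which is exactly the "different sequences, same parse tree" observation.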
How would you look at the string and determine whether it is in the grammar? Okay, so these are our terminals; that's our starting point. So we look for a non-terminal, I see. Okay, exactly. So what specifically do we do when we see, somewhere in the graph, something that is a right-hand side of a production? We say: oh, we have recognized the right-hand side of a production. This production here has only one symbol on its right-hand side — the int. We place this edge here and say: what do we know now? This part of the input, from here to here, can be derived from this non-terminal. How do we know? We have found an int, and int is the right-hand side of that production, so this entire thing from here to here is derivable from that non-terminal. How about this? A single variable is a variable list, because there is a production in the grammar that says so, and we have found an ID between here and here — so we place this edge. What would be the next step that we take? Now you should be able to see what the parser will do next. I'm sure more people know than one. So which step are we going to take next? It's okay if you aren't sure. So we'll go from here to where? Right after the ID, okay. We'll eventually do that, right? Because there is some non-terminal in the grammar from which we can derive this entire thing, and then say it is followed by a semicolon. Oh, I see, one more. So we are starting here, uh-huh, between two and three, okay. Is that right? All right, we need to find a better remote control. It would be great, by the way, if you could draw. All right, you don't see the grammar, so it's harder for us to double-check, but let's go back and see which production we are using, right? Here we have one, two, three, four, five productions in the grammar, right?
Whenever we add an edge into the CYK graph, we are taking some production and we are saying: oh, we have found the right-hand side of a production here, so this part of the input is derivable from the left-hand-side non-terminal. So which one are we using now? Are we using this one? We don't quite have a production for this, right? Because what we have found here is Type ID comma, okay? And there is no such right-hand side in this grammar. So we are not quite ready to do that. So what would be the next step? So this would be a var list — in fact, why am I drawing it, when this is the right step? What would be the next step now? Okay, that's right. So now we have found another right-hand side of a production, okay? Which one? Let's go back and look at the grammar. The right-hand side is VarList comma ID. I should put the grammar here; I'll do it before I post the slides. So what right-hand side have we found? We have found this edge, followed by this edge, followed by that edge. So not this, but that. So VarList comma ID is the right-hand side. And what do we do? We place this edge for another VarList. So from VarList, we can derive both this and that, and the whole substring as well. All right? What's the next step? We'll match the whole thing, which is, I believe, called declaration, because we now have a Type, a VarList, and a semicolon. And indeed, this is what we do. And now we cannot add any more edges, but we can ask the question: is the input string from the grammar — well, technically, from the language of the grammar, which is the set of strings derivable from the grammar? We can ask the question now because we have placed all the edges we can; we have reached a fixed point. And this string is from the grammar, because we have an edge from the start to the end of the string that is labeled with the start non-terminal. And that is a proof that the string can be derived from the start non-terminal in some number of steps. Okay, exactly.
So the question is, what do we mean by "we have placed all the edges"? The way the algorithm works is that it looks at the graph, and if it can find adjacent edges that correspond to the right-hand side of some production, it says: oh, I have found a piece of the input that is derived from the left-hand side of that production. And you place a corresponding edge to signify that you have found, say, an expression or a declaration. At some point, you will have placed all possible edges. One way to see why this process terminates is that you have a finite input and a finite number of non-terminals, so there are only so many edges you can place. So it will stop at some point; you essentially reach a closure. Exactly. So you're now asking: what if the input has a syntax error? For example, the semicolon is missing, right? Or maybe there are two semicolons. Then we would not be able to place the edge for declaration, because that crucial semicolon, which is on the right-hand side of that production, would not be on the input, and so you cannot place the edge. And after you place all the edges, you are ready to ask the question: did we actually parse it correctly? And the answer would be no, because we would not have this edge from start to end. Well, let's try it. So imagine that we have — I'll do it this way — let me extend the input here. Here we have a semicolon, so now we have two, right? Okay. Look at the grammar, because the grammar is important. You could, of course, define the grammar such that it permits any number of semicolons, but let's see whether this grammar allows it. So what is this grammar? This is the start non-terminal, so this grammar derives exactly one declaration rather than a list of declarations — that's just the fragment of the grammar we use. And it has exactly one semicolon at the end.
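The whole edge-placement closure fits in a few lines. Here is a deliberately naive sketch (illustrative, not the project's implementation): an edge (i, A, j) means "the input between positions i and j is derivable from symbol A", and we keep adding edges whenever adjacent edges spell out a production's right-hand side, until the fixed point:

```python
# Naive CYK-style closure: place edges until no new edge can be added.
GRAMMAR = [
    ("Decl",    ["Type", "VarList", ";"]),
    ("Type",    ["int"]),
    ("VarList", ["ID"]),
    ("VarList", ["VarList", ",", "ID"]),
]

def cyk_edges(tokens, productions):
    # Every token is a leaf edge spanning one position.
    edges = {(i, tok, i + 1) for i, tok in enumerate(tokens)}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            for start in range(len(tokens) + 1):
                # Walk the right-hand side as a chain of adjacent edges.
                ends = {start}
                for sym in rhs:
                    ends = {j for (i, s, j) in edges if s == sym and i in ends}
                for end in ends:
                    if (start, lhs, end) not in edges:
                        edges.add((start, lhs, end))
                        changed = True
    return edges

tokens = ["int", "ID", ",", "ID", ";"]
# In the language iff the start non-terminal spans the whole input:
print((0, "Decl", len(tokens)) in cyk_edges(tokens, GRAMMAR))  # True
```

With a second semicolon appended, no Decl edge spans the whole input, so the membership test comes out False — exactly the syntax-error case from the example.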
So we would be able to place this edge here for declaration, as we did. So the start non-terminal can derive everything from here to here, but we would not be able to place an edge from here to here, because we do not have a right-hand side that says Type VarList semicolon and another semicolon, or anything else that would match. So this edge would never be placed, and therefore the two semicolons would form a string that is not in the grammar — in other words, a syntax error. To be in the grammar is another way of saying: I am able to take the start non-terminal and use some productions to derive that string. We are discovering whether such a derivation is possible, bottom up, and if we can place the edge, we have found the derivation, okay? So now, continuing on your question: here is the parse tree, right? We have seen it before on the previous slide. And here is the result of the CYK parse. As the result of the parsing, we don't really care about that graph. The graph is there only so that you can do the parsing — so that you have a way of saying "oh, I have recognized a variable list between here and here", and you use the graph to remember whether you have already placed an edge, so that you know when no more edges can be placed. It's an intermediate data structure, but we don't care about the graph as the result of the parsing. We care about the parse tree, because that's the result of understanding the input. So how do we find that parse tree in the graph? Is there a way to find it? You will all know it, because you will implement it, but it's nice if you can spot the parse tree in the CYK graph, okay? So indeed, we have a root and we have pointers to the children, okay? And now we do this, and this one — exactly. You can see it better here. Now, do you see anything fishy here? Uh-huh, so there is an edge here, a VarList edge, which is not part of the parse tree, okay? This is a problem for several reasons.
Not a big problem — we'll revisit it later in the lecture — but when building the tree, you would like to somehow figure out that this edge is not part of the parse tree, right? It's this one here; it's not part of the parse tree. How would you know not to include it? How would you build this parse tree, okay? So, a proposal is to build the largest possible span. I don't know whether that's quite the rule. So what rule should we use to build the parse tree? What does the parse tree signify? It shows us that this input — remember, not this input, this input; the leaves of the tree are the input — has been derived how? Well, this ID was derived from this VarList, using this production, right? And here is another production that we used. So the parse tree, between a parent and its children, always shows the production that was used to derive that part of the input. Does the CYK parser discover these productions? Or do we need to discover them somehow, in a post-processing of the graph? Again, the parse tree shows exactly the productions that were used to derive the string. CYK does not quite show them explicitly, but does it discover them for you? Exactly. The productions that you see in the parse tree are those that derive the string, and those are the productions that CYK discovers. Discovers when? Well, for example, when you see a VarList adjacent to a comma adjacent to an ID, you have found this right-hand side, and therefore this non-terminal derives the substring. What have you really done? You have discovered that this production is going to be used in the parse tree. So you find the right-hand side, place the edge, and at that time you can build the parse tree. It's as simple as that. So the derivations — the rules that you use to build the CYK graph — are used also to build the parse tree. The CYK parser builds the parse tree bottom up.
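The "build the tree while you place the edge" idea can be sketched by extending the closure so that each placed edge also records which adjacent child edges justified it — backpointers. Reading the backpointers out of the full-span edge then yields the parse tree directly, with no separate search of the graph (illustrative code; assumes no empty productions):

```python
# Sketch: edge-placement closure with backpointers to child edges.
GRAMMAR = [
    ("Decl",    ["Type", "VarList", ";"]),
    ("Type",    ["int"]),
    ("VarList", ["ID"]),
    ("VarList", ["VarList", ",", "ID"]),
]

def cyk_with_tree(tokens, productions):
    edges = {(i, tok, i + 1): None for i, tok in enumerate(tokens)}  # leaves
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            for start in range(len(tokens)):
                paths = {start: []}          # end position -> child edges
                for sym in rhs:
                    nxt = {}
                    for i, kids in paths.items():
                        for (a, s, b) in list(edges):
                            if a == i and s == sym:
                                nxt[b] = kids + [(a, s, b)]
                    paths = nxt
                for end, kids in paths.items():
                    if (start, lhs, end) not in edges:
                        edges[(start, lhs, end)] = kids   # backpointers
                        changed = True
    return edges

def tree(edge, edges):
    kids = edges[edge]
    sym = edge[1]
    return sym if kids is None else (sym, [tree(k, edges) for k in kids])

tokens = ["int", "ID", ",", "ID", ";"]
print(tree((0, "Decl", len(tokens)), cyk_with_tree(tokens, GRAMMAR)))
```

Edges that never become reachable from the full-span start edge — like the fishy VarList edge on the slide — simply never appear in the tree.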
You know, if you have a grammar with rules like this, whenever you find B and C adjacent in the graph, you reduce BC to A. These steps are called reductions. We saw the algorithm already in lecture eight — it's there somewhere — but knowing this should suffice. Okay, so let's move to ambiguity. Now things become more interesting. Have a look at the grammar. This is a very simple grammar of expressions: you can have variables, plus, and multiplication, and here is our input, at the bottom. Question number one: is this CYK graph complete? Is some edge missing, perhaps? So who thinks we may be missing an edge? Okay, good. So what edge are we missing? The complete edge, right? Perfect, it's here. Good, so we have the edge here. Now it's complete — I can tell you that much. How many parse trees are here? You just saw how to build the parse tree. Could there be zero parse trees? There couldn't, right? Because we do have an edge from the beginning to the end for the start non-terminal, which means there exists a parse tree in the graph, okay? So not zero. Now, one or more than one? This is the computer science way of counting, right? Zero, one, or more than one? So we have just two more options: one or more than one? More than one, okay? So let's find them. One of them is here, right? ID goes to E, and now we do the plus here. And then this one goes like this — oh, sorry, there is an E. So this is the one where plus is lower in the tree, on the left, and there is the other one where star is lower in the tree. So there are indeed two, and that's because there are two productions that we could use here, right? Where are they? Well, here is one: you take this E, times, and that E. And the other one is this. So you found two right-hand sides adjacent to each other, both wanting to place this same edge. So you can think of it as having this edge here twice: once for this, once for that.
You may want to keep it in the graph only once; it doesn't quite matter — you need to make some smart decision about how the data structure is organized. But one is, again, here, and the other three adjacent edges, E plus E, are right there. So what's up with that? Do we care? Is this a problem or not a problem? Probably it's a problem, since we are talking about it, but why? Right: if we care about the order of operations being meaningful, then presumably we care, because one of the trees places plus lower in the tree, which means it's evaluated first, which means it effectively has higher precedence than times — and that's not usually what you want. All right, so this grammar is ambiguous. What does an ambiguous grammar mean? It is a grammar which, for at least some input, has multiple distinct parse trees. If a grammar is not ambiguous, then no matter what string you feed it, it will have one parse tree — well, or no parse tree if the string is not from the grammar. And the parse tree encodes, therefore, whether this operation happens before that one, and whether an operator is left associative or right associative. So the parse tree is important, and therefore we want to design a grammar that has only one parse tree per input. In other words, ambiguity is bad, because if you get multiple parse trees for an input string, then what could happen? You build your compiler, you pick one of the multiple parse trees, and that corresponds to some semantics — times being done after plus. Somebody else builds a compiler, they choose a different tree, and their compiler behaves differently than yours. And ambiguity is really common. You saw it in arithmetic expressions; we mentioned it briefly for if-then-else. So we need to deal with it somehow. You'll have an elegant way of dealing with it, but let's understand it better first. So, in case you need to really digest it, here is a grammar like the one before, except we added parentheses, and we have two strings here. And let's see.
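The claim that this input has exactly two parse trees can be checked mechanically. Here is a small sketch (a hypothetical helper, not project code) that counts distinct parse trees under the ambiguous grammar E → E + E | E * E | ID, by trying every operator position as the top-level split:

```python
from functools import lru_cache

# Sketch: count distinct parse trees for the ambiguous expression grammar.
def count_trees(tokens):
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        n = 1 if (j - i == 1 and tokens[i] == "ID") else 0
        for k in range(i + 1, j - 1):          # candidate top-level operator
            if tokens[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))

print(count_trees("ID + ID * ID".split()))  # 2
```

Longer inputs blow up quickly — with three operators there are already five trees — which is why picking the right one systematically, rather than by luck, matters.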
If I give you int plus int plus int in an ambiguous grammar, you get both of these trees. You clearly want the one on the left, because plus is left associative. It doesn't really matter for integer addition, but it matters for floating point, because floating point addition computed this way and that way can actually give different results. With multiplication, you again want the one on the left, right? Because times must be performed before you do plus — it's lower in the tree. So how do we deal with ambiguity? There is clearly no automatic way of doing it, because when you write a grammar that has multiple parse trees, you need to choose the one you want. No algorithm can pick which one you want; it's the programmer who must decide, I want this or that — meaning the programmer needs to say, for example, that plus is left associative. Ambiguity seems dangerous, but it's actually useful when used with care, because it leads to much simpler grammars. So we would like to use ambiguous grammars and just add some declarations to say: well, from among all those parse trees, pick the one that is left associative. That approach is pretty good, but not general — sometimes it fails, and then you need to rewrite the grammar by hand to make sure it only generates the parse trees that you care about. In PS6, you'll write the grammar for the Google calculator — remember, the one with units? On that grammar, these declarations will help you somewhat, but not fully. So you will actually have to rewrite the grammar, by introducing new non-terminals, to make sure that just the right parse trees are created. Okay, so how do you disambiguate with precedence declarations? Instead of rewriting the grammar, we keep the grammar ambiguous, but we tell the parser which of those parse trees to choose. So here, we want to choose the one on the left, and we'll put this declaration into the grammar.
We'll say plus is left associative, and the parser will discard this parse tree and choose the one on the left, okay? In this example, you want the tree on the right. So you will now have this declaration, which says that star has higher precedence than plus because it is lower in the list — at least that's our convention — and therefore the parser will choose this tree over the other one. You can have operators with the same precedence, and they would appear together on one line of this list, because they are all left associative with the same precedence — essentially considered equal. So how do we actually implement this? Now we need to start thinking about what the parser will do, and that will be another implementation task for you. So imagine that we have already reduced the input to E plus E times E. We started with only terminals, and then we placed some edges and got into this state. Whatever is derived from these E's could be an arbitrarily large part of the input, but we have already recognized that those parts of the input are E's. So to the parser, at this point, the input looks like this. Now the next step is what we need to decide: whether we are going to reduce this into E times E, essentially placing parentheses here, or whether we are going to reduce it to E plus E, essentially placing parentheses here. So which one do we want, first of all? We want the one on the right, because we want multiplication to be lower in the tree, and if we reduce this first, then it will be pushed down in the tree, because the result of that reduction will be an E — and now what do we have on the input? Now we have E plus E, with the multiplication being lower in the tree. So indeed, we want this one. How would we make that decision? I can tell you that you do need to consider all possible reductions that you could make at this point. So what reductions can you make? You could either reduce this to E, or — well, you can do both of these, okay? Let me actually create a bigger example.
So we now have an edge for E plus E times an E, okay? We'll make this reduction with one E; here is another E — this corresponds to the star, okay? And now what reductions can we make? Can we make more than one? Yes? We could now make an E that corresponds to this, that, and that, okay? And there is the other one, which corresponds to E plus an E, okay? So you see, we now have two different ways to place an E edge from the start to the end, each of them choosing different children. Here is one, and there is another one, right? So here is choice one and here is choice two. Both of these choices place the same edge. Now which one do we choose, and what rule would we use? Let's think about that, because if you understand it now, it may save you a long time in implementing the choice between the two trees, okay? So we choose the bottom one, because it pushes the star deeper into the tree. If you look at the choice as a competition between taking this production and that production — this one is which one? E times E, and this one is E plus E. Both of them have the same left-hand side, right? So if you view the choice as a competition between this production and that production, which one are you choosing: the one with the lower precedence or the higher precedence? The one with the lower precedence, right? And that leaves the higher-precedence one lower in the tree, as you want, okay? Now, coming back here: similarly, for left associativity, we have a choice of doing this or that. Now what will the choice be? We need to choose between two productions: one with a smaller subtree on the left and the other one with a bigger subtree on the left, right? So now the competition is between one that has a small subtree on the left and another that has a bigger subtree on the left. And which one do we choose? We choose this one here, because that will push the pluses down the tree — excellent. Fantastic understanding, okay?
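The competition rule just described can be sketched concretely. We enumerate the candidate trees for the ambiguous grammar E → E op E | ID, then pick the winner: the production with the lower-precedence operator wins the top-level reduction (pushing the higher-precedence operator deeper), and on equal precedence the tree with the bigger left subtree wins (left associativity). The precedence table and helpers are illustrative, modeled on yacc-style %left declarations:

```python
# Sketch of choosing between competing reductions by precedence.
PREC = {"+": 1, "*": 2}    # lower in the declaration list = higher precedence

def trees(tokens):
    """All parse trees, as nested (op, left, right) tuples."""
    if len(tokens) == 1:
        return [tokens[0]]
    out = []
    for k, t in enumerate(tokens):
        if t in PREC:
            out += [(t, l, r) for l in trees(tokens[:k])
                              for r in trees(tokens[k + 1:])]
    return out

def size(t):
    return 1 if isinstance(t, str) else size(t[1]) + size(t[2])

def pick(tokens):
    # Lowest-precedence operator at the top; ties broken toward the
    # smallest right subtree, i.e. left-associative grouping.
    return min(trees(tokens), key=lambda t: (PREC[t[0]], size(t[2])))

print(pick("ID + ID * ID".split()))  # ('+', 'ID', ('*', 'ID', 'ID'))
print(pick("ID + ID + ID".split()))  # ('+', ('+', 'ID', 'ID'), 'ID')
```

Note the real parser makes this choice locally, at reduction time, rather than enumerating whole trees — but the local rule selects exactly the trees shown here.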
So here we spell out what I just told you, and that's ambiguity. This is what will be implemented in the parser. Now we are going to talk about the Earley parser, and you may be surprised that I'm already talking about the implementation of the disambiguation — the choice among multiple trees, should multiple trees arise — when I haven't even told you about the Earley parser. The Earley parser is essentially the CYK parser: it builds the tree bottom up, but it is a bit more efficient — well, more than just a bit, but let's go through it slowly. In other words, the whole disambiguation business and parse tree business is the same in CYK and Earley. So let's talk about Earley now, essentially as an optimization of CYK, okay? The problem with CYK is that it can place edges that will not show up in the parse tree. And you could say, well, who cares? We have garbage collection; they will eventually be cleaned up. And that's true, except it also hurts the asymptotic complexity. CYK would be cubic, and Earley would be quadratic. And in fact, if you give Earley a grammar written just so, it will become linear, as you will find out on the homework that we just assigned today. So does it really matter? It seems like n squared and n cubed are roughly about the same. So can you tell me how much parsing Earley will do? Let's compare this quadratic Earley to the cubic-time CYK. Imagine I give you 10 seconds — maybe you are willing to wait 10 seconds to parse your input file. How much work will Earley do in that time compared to CYK? I want you to start thinking about it, because of the way you'll debug the Earley parser that we give you — well, you won't debug it, you'll make it more efficient. We gave you a clean, beautiful, but inefficient implementation of the Earley parser. We put instrumentation statements into it which will tell you the asymptotic complexity of the parser. You run it on a small input, a bigger input, a bigger input.
And then you look at the numbers, put them into a spreadsheet, and it will tell you: oh, it's n to the fourth, actually, because the implementation is so beautiful that it is that slow. And then you will try to understand where the inefficiency comes from and fix the data structures and other things to get it down, all the way to linear execution time. Now, I want to convince you that it actually matters to look at these asymptotic numbers. If the n-squared Earley, in some unit of time — say 10 seconds — parses, say, 1,000 characters, how much will CYK do? 100, excellent. So is there a difference between 100 and 1,000? It's only a factor of 10. As it happens, it is that factor of 10 which, if you don't do the job right, will make you say: I cannot use my parser; can I run your parser on the cloud, because mine is dog slow? So indeed, it is the difference between 100 and 1,000 — but think about it: most files might be 1,000 characters long rather than 100, which means CYK will not parse them in time, while Earley will. And that is actually a big difference. So let's look at the graph and find some edges that CYK will build that are actually not part of the parse tree. How many are there? Now we'll do the mathematical counting. Who thinks there are zero? Clearly not — not one, because I'm using the plural. So are there two, or three, or four? Who thinks there are two such edges? Look at the grammar, by the way — it is a bit different; I'm now using ID here rather than E. That matters. So how many edges? Who thinks there are two useless edges? Two, okay. Three, all right. Four? So, it is three. It is this one here, this one here, and this one here. Why? Well, you can see the parse tree is here, okay. So there are three of them. And it looks like, well, three out of what, ten or so. But if you looked at a big graph, the number of useless edges would be the difference between being able to do 100 units of work versus 1,000. All right. So the key idea is what?
The key idea is, rather than going completely wild bottom-up as CYK does — because CYK can start placing these edges wherever it wants; in a sense, it's a beautiful parallel algorithm, because you can compute the closure, the edges, on the right while somebody does it on the left and somebody in the middle, and eventually they merge; a beautiful parallel algorithm, never mind that the asymptotic complexity is so bad that the parallelism would not buy you much unless you have infinitely many machines — we won't go quite wild that way, but we'll go left to right. We'll really process the input left to right, and we'll only place those edges, only do those reductions, that at the given point still have a chance to be in the parse tree. Further down the input, you may realize they are really not part of the parse tree, but at that point, we couldn't tell. So, given the left context of the input that we have seen, we'll do our best suppressing useless edges. We look at the input seen so far and decide which edges we are willing to place. So we'll propagate a little bit of context about what we have seen so far. That context happens to correspond to where in the parse tree we might be, and we use it to place only useful edges, effectively suppressing those that would be useless and cost us too much work. So here is the grammar that we'll use: E goes to T plus ID, or ID, and T just goes back to E. A silly grammar, but it beautifully illustrates what we are going to do. We are going to use CYK edges, also known as reductions, but we'll have a few more edges — you'll see them soon, with the dots. The dots will be the context. And the crucial question is this. We ask at each point — initially starting at point zero on the input, which is before we have seen anything — which CYK edges could possibly start here? Having seen zero of the input, because we are at the very left, we don't have much context.
We have zero context, but we have the grammar, so we do have quite a bit of knowledge, right? And so we ask: which CYK edges in the CYK graph could we possibly see here, on some input string? Not on this input string — of that input string we have seen nothing so far — but which edges could start here? Okay, so let's think about that. Which edges could be there? I'll write them here. Clearly these two could be there, right? Why? Because they could be at the root of the parse tree, and they would go all the way down — in other words, span the whole input. They are edges for the start non-terminal. And then this one as well, E goes to ID. These two can definitely start here. Could some other edge start there? Also this one here. So in this case, all of them — but in some grammars, not so. Why can this edge, T goes to E, also appear there? Because you may need it to produce the E — sorry, you may need this edge here to produce the T, which is then used here, right? In other words, you could have an E here, from which you obtain a T, and given that you have a T, you can now have this. So this T goes to E edge — all three of them could appear there. So in other words, which edges can we have? Those that reduce to the start non-terminal, like these two; those that may produce non-terminals needed by the first kind; those that may produce non-terminals needed by the second kind; and so on and so on. So to determine which edges could start at the left, you repeat these steps until you are sure that you have found all the edges that could potentially start there. Those you must not suppress, because you could fail to parse the input. By the way, E is recursive, right? Because you can go from E to T, and then back to E. It's not directly recursive.
It's indirectly recursive through T, but it is recursive. The reason why I have this extra T is that I wanted to show you this process here. I wanted to show you that you need to allow the T goes to E edge on the left, because if you didn't put it there, then you would never get this T, which you need to process that. So this interaction is there to show you how this fixed point, finding those edges, needs to consider a chain of edges. By the way, the first non-terminal mentioned in the grammar is the start non-terminal; that's a convention you can rely on. Okay, so the question is whether the complexity depends on the size of the grammar, and it does, but we'll consider the grammar to be constant. Yes, you will see that as you add to your grammar, your parser may get slower. But in general the grammar is fixed and only the input grows. So that's an excellent question. By the way, Earley can be cubic too, for ambiguous grammars, but in general the ambiguity is somewhat local and so it doesn't blow up all the way to n cubed. Uh-huh, excellent. So in fact, I'm showing you only the rule that applies at the very start. What I have written is correct when you are deciding what edges could start at this point, right here, okay? Inside the parsing we'll follow pretty much the same rule, so it's best if we now run through an example; it will become clear. So we are now at the result of the first step. We have determined which edges could start at this point. We have found these edges, okay? Now we are not going to draw them going off into the heavens; that would not make much sense. What we'll do instead is make these edges loop back, okay? To signify that we have seen nothing from the input. And the special dot is going to show you where in the input you are.
The dot simply means that you are at the stage where you're expecting an ID, or you're expecting an E, or you're expecting a T plus ID. I know this seems puzzling, but as we play out the example, and then you play it out again on a few animations that we'll have on the web, it will become clear. So our cursor is here on the input and we want to move it now here. We can do that because we have already found all the edges that could potentially start here. They don't go off to the heavens, they loop back, but we are now able to proceed and move the input cursor from here to here. What do you think will be the result of reading that first ID? Okay. Having seen it, we are going to move the dot here. And what does that mean? This edge, which is now self-looping, will move its endpoint past the ID. Oh, by the way, there is one more step: I usually represent these three edges as one edge with three labels, just for compactness. But now we can go, and this is indeed going to be the edge where the dot has moved from here to here, as a result of scanning, or reading, the ID. Okay, so now we have done the scanning: read the input and moved the dot. As a result, one of these edges moved its endpoint to here. Well, what do you think is the next step? Oh, you have just seen the next step. Why are we doing that? We now have a dot at the end of the E, right? Think of it as having a dot here. So that effectively means that we are now going to move the dot in here, okay? So now you have a dot at the end of this rule. What do you think is the next step? Is there another edge that we add? You mean something like this? Well, do we want that or not? Before we even consider this issue, look at what we've done here. We have moved this dot from here to here as a result of this ID, okay? That's good. Then we said, oh, now we have a complete edge, right?
Think of it: this is like a CYK edge. We have found a complete E here, right? So now we said, okay, if this is an E, then this is also a T. And therefore we have placed the dot here. And now this is also like a CYK edge, because the dot is at the end. What should we do with this edge? It currently loops on itself. The dot is here. Are we moving the dot? Exactly, now we are going to move this, as a result of having a complete T. This entire string in the input, in other words, is derivable from T. So we are going to add this edge. So we have done a scan: we moved the dot across the terminal. And these two steps are called completions: we had this dot at the end and therefore we moved it to this end, and because it was at the end here, we moved it across here. So when the dot is at the end, we sort of propagate it to the parent, like in CYK, exactly. And we got this, all right? So now we are ready to answer your question: are we going to place some self edges here? Look at what edges we have. Now we would like to read the plus, right? Because our input cursor is here, we would like to move it here, scanning the plus. Do the edges that we have allow us to do it? How about, as a result of scanning, we do exactly what we did here: we move the dot from in front of the ID to after it. Can we move it now here? Okay, that's exactly what we'll do. Can we place some other edges? We have no new edge with the dot at the end, so there is no completion that we need to do, like we did here. What do we do next? Can we move the cursor from here to here? Yeah, absolutely. So as a result of that, this dot is going to go here. And now we have found an E. This is sort of the classical CYK edge: the dot is at the end, which means the whole thing is derivable from E, okay? Therefore, it's also derivable from T, because we have this production in the grammar. And also we are moving the dot past T, right?
Why are we doing that? Why did we place this? Let me ask the question differently; I think the answer will be clear. This here corresponds to an edge that we just added into the graph. This edge must be the consequence of one or more existing edges, right? So which edges are those? Essentially, the reasoning in the parser goes: oh, I have this edge here, which means I can derive something from this part of the input to that part of the input, okay? And now there is perhaps another edge over a part of the input, and the two together force me to place another edge, just like in CYK, except these edges are somewhat funny because they have the dot. So this edge, of which other edges is it the consequence? This one here, all right? So this is an edge here that essentially says the whole thing is derivable from E, okay? So could we say that we have an implication which looks at this edge and that edge and derives this edge? How about this one here? Could this edge play a role? Again, we are asking: what made us place this edge? Which, by the way, goes of course from here to here, but what was the reason for placing it? And the hint is that this edge here, the self edge, is one of the reasons. Exactly; the completions are always best thought of as classical CYK edges. So if you look at this here, what have we done? We have concluded that, well, this here was an E, and therefore this part is also a T, right? Because from T I can derive an E using this, okay? So because of this E, using an implication, I have derived this T, okay? So now I know that from here to here the whole thing is derivable from T. Now combine it with this. Here we have the self edge, which says: whenever you see a T, you can move the dot past the T. Have we seen a T? Yeah, we have just seen this T, a complete CYK T.
So as a result, we are allowed to move this dot in here, and this is indeed what happened. So these two together are the reasons why this edge was placed. So again, we discovered that this is a T. This one is waiting for a T so that it can move the dot, and we have moved the dot, and we now expect a plus and an ID. This step is probably the one hard step of the whole thing; after you play a few animations it will be clear. So now we'll just move the dot in here. Now it's here and we read the ID. So the dot moves here, and now we have a dot here, which means we have a start non-terminal, okay? Because we have an E, we also have a T, and we also need to move the dot here, even though we have reached the end of the input, because this is what the parser does. So have we parsed the input? Yes, because we do have an edge for the start non-terminal from the beginning to the end. Exactly, right. So you could have a grammar where you need to place the self edges there, and those would then grow. We didn't have them here because this grammar was fully left recursive; there was nothing to predict later in the input. Well, what would happen? I'll show you soon, but imagine we are here, and imagine we had a rule that says T plus E, and now the dot would be here. Now you would do so-called prediction. You would take the dot and push it down into the E. Say, oh, if the dot is in front of E, that means it is also in front of whatever is on the right-hand side of E's productions, right? So you essentially move it from the beginning of E in here, and this is the rule you would place, and this is a self edge because the dot is at the beginning of the right-hand side. So we are not showing the prediction here, because prediction is usually the easy part; it is the completion that usually puzzles people, together with moving the dot across T. What is good about the homework is that what we gave you is an inefficient Earley parser, but it's easy to read, and it comes with a visualizer that draws these graphs when you run it.
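Putting the three operations together, prediction, scanning, and completion, here is a compact recognizer in the same inefficient-but-readable spirit as the homework version. The function and item encoding are my own sketch, not the PA4 code; items are (head, body, dot, origin) tuples, and chart[i] holds the edges whose dot has advanced to input position i:

```python
GRAMMAR = {
    "E": [["T", "+", "id"], ["id"]],
    "T": [["E"]],
}
START = "E"

def earley_recognize(tokens, grammar=GRAMMAR, start=START):
    """Return True iff tokens derive from the start non-terminal."""
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    # the self-looping edges at position 0: dot at the start of each start production
    chart[0] = {(start, tuple(body), 0, 0) for body in grammar[start]}
    for i in range(n + 1):
        changed = True
        while changed:  # run predict/complete to a fixed point at position i
            changed = False
            for head, body, dot, origin in list(chart[i]):
                if dot < len(body):
                    sym = body[dot]
                    if sym in grammar:  # PREDICT: push the dot down into sym
                        for b in grammar[sym]:
                            if (sym, tuple(b), 0, i) not in chart[i]:
                                chart[i].add((sym, tuple(b), 0, i))
                                changed = True
                else:  # COMPLETE: propagate the finished edge to waiting parents
                    for h2, b2, d2, o2 in list(chart[origin]):
                        if d2 < len(b2) and b2[d2] == head:
                            item = (h2, b2, d2 + 1, o2)
                            if item not in chart[i]:
                                chart[i].add(item)
                                changed = True
        if i < n:  # SCAN: move the dot across the terminal tokens[i]
            for head, body, dot, origin in chart[i]:
                if dot < len(body) and body[dot] == tokens[i]:
                    chart[i + 1].add((head, body, dot + 1, origin))
    # accepted iff some start production spans the whole input
    return any(h == start and d == len(b) and o == 0
               for h, b, d, o in chart[n])

print(earley_recognize(["id", "+", "id"]))  # True
print(earley_recognize(["id", "+"]))        # False
```

On `["id", "+", "id"]` you can watch exactly the trace from the lecture: scan the first id, complete it up to E and then T, scan the plus and the second id, and finish with an E edge from position 0 to 3.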
I believe we gave you that code; if not, we should. And we want you to hook it up a little bit so that you know what it's doing. Well, the issue of the grammar being ambiguous is completely orthogonal, which is to say independent, from suppressing certain edges. If I give you a grammar that is ambiguous, which means it has multiple parse trees, they would still show up in the Earley parser, right? Because if a grammar has multiple parse trees, we cannot suppress them. We need to build them and then decide: oh, I want this one, or I want that one. So if this grammar were ambiguous, we would get these multiple parse trees and then choose among them, like before. So it's essentially as simple as that, because it is a separate issue. Excellent question, though. So here are the three edges that we suppressed. And why did we suppress them? Because we never allowed placing these E goes to dot ID edges at those points. But CYK would happily place them, because CYK would see: oh, look, an ID, and the grammar has a production of this form, so it would place this edge, and so on. But if you look at the propagation, we never placed this edge, so we never enabled placement of these edges. And that's it. So you can look at this from the point of view of Datalog program evaluation. CYK is really just the evaluation of the Datalog version of the parser. And there is a transformation called the magic set transformation, which you can apply to a Datalog program, and it will make it much more efficient, exactly in the same spirit: going from cubic time to quadratic time, or better, depending on what the program actually is. And I'm not teaching you the magic set transformation in general, but what you have seen is a magic set transformation of the Datalog program. You essentially do not allow the derivation of all possible facts, only those that somebody may need to answer the original Datalog query. And we are propagating the dot to essentially keep track of which facts might be useful.
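To see the contrast concretely, here is a naive bottom-up chart evaluation in the CYK/Datalog spirit (my own sketch, not restricted to Chomsky normal form). It derives every fact with no left-context filtering, including edges like an E over the last id alone, which the Earley parser suppresses:

```python
GRAMMAR = {
    "E": [["T", "+", "id"], ["id"]],
    "T": [["E"]],
}

def matches(body, i, tokens, facts, grammar):
    """All positions j such that body derives tokens[i:j], given known facts."""
    ends = {i}
    for sym in body:
        nxt = set()
        for pos in ends:
            if sym in grammar:  # non-terminal: use derived facts (head, from, to)
                nxt |= {j for (h, a, j) in facts if h == sym and a == pos}
            elif pos < len(tokens) and tokens[pos] == sym:  # terminal match
                nxt.add(pos + 1)
        ends = nxt
    return ends

def bottom_up_chart(tokens, grammar=GRAMMAR):
    """Derive every fact (head, i, j) meaning tokens[i:j] reduces to head --
    the unrestricted, Datalog-style bottom-up evaluation."""
    facts = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i in range(len(tokens) + 1):
                    for j in matches(body, i, tokens, facts, grammar):
                        if (head, i, j) not in facts:
                            facts.add((head, i, j))
                            changed = True
    return facts

facts = bottom_up_chart(["id", "+", "id"])
# The useful spanning edge is there, but so is the useless E over tokens[2:3]:
print(("E", 0, 3) in facts, ("E", 2, 3) in facts)  # True True
```

The spanning fact ("E", 0, 3) is the answer; facts like ("E", 2, 3) and ("T", 2, 3) are exactly the work the dot propagation avoids.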
And that's the same in the magic sets and the same in making CYK as efficient as Earley. The funny thing is that people discovered these things independently. The Earley parser was not discovered by saying: oh, there is this subset of Prolog, which is Datalog, and I can evaluate it with this magic set transformation, which makes it faster. These were all independent discoveries. But after all these years, we understand things better, so I can teach you two things at once. I can show you what the Earley parser is. You also understand the asymptotic complexity, and why it matters to get a program that actually runs reasonably fast. And in the process, you learn how a top-down backtracking Prolog implementation can be changed into a bottom-up one, if certain conditions hold, and made really, really fast. And so these are the three kinds of edges which you have seen: those like in CYK, those with the dot at the beginning, and those with the dot in the middle. This is just writing down what we have seen. Here is the algorithm that we followed, for you to read. So now, to conclude, in a few minutes we can get to what we actually want to do: syntax-directed translation. And we have seen it already in the Prolog parser. I have shown you two versions of the Prolog parser, which not only recognized the string as being, versus not being, from the grammar, but essentially evaluated the parse tree and gave you the parse tree itself, or the abstract syntax tree, or did the calculation. But in the parser that you will build in PA4, you will read in a grammar in this syntax, okay? And on an input like this, what parse tree will you get? We have E, T, F, and we have the numbers here, two and three. What is just above the numbers is an F. Where does the times go? The times is under a T. The plus, okay, I'm sorry, this goes into a T with one argument, using this production. And now we get a T, and now we have an E, okay? This is our parse tree.
This grammar is not ambiguous. It uses extra non-terminals, T for term and F for factor, to make sure that we get the right precedence and associativity. So this is indeed the tree that we want. Now we want to do several things with the tree. What are the possible choices? What comes to mind? Yeah, it should be an E, exactly; it should be using this E goes to, yes, okay. This is why I like to use ambiguous grammars: because I cannot parse this one in my head. Using precedence declarations, you don't have to write the grammar in this verbose way; you can write it in the ambiguous way and then decide. But I want to illustrate what one can do with the tree. So what can we do with this tree? Can we build a parse tree? Can we build the abstract syntax tree? Can we evaluate it? Can we pretty print it? What other ideas do you have? Can we translate it into reverse Polish notation? Could we do that? A diff? Of which two? You could do it, but usually what happens during parsing is that you build a parse tree and do evaluation on that one single parse tree. Now you're asking about comparing two of them. You could, but you would need to generate these two data structures and then compare them outside the parser, okay? So imagine we want to just evaluate it. Well, we would use a syntax-directed translation, which essentially just tells you what to do. For each production, we have some rule which says: take the values that are the results of evaluating this subtree and that subtree, and do what with them? Do an addition, and pass it up, okay? How about the rule for this step? You take the string, such as 10, and you convert it into an integer and pass it up. So if you look here, the input here is actually a string, not a number, and the type that passes along this edge is a string. What will be the value returned up here? An int, okay? And what rule are we using here? It is the rule which says int of n1's value, and n1's value is this, and the whole thing returned from here is what is being passed up this edge.
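The evaluation scheme just described, one rule per production, computing a node's value from its children's values and passing it up, can be sketched as follows. The tree encoding and rule names are my own illustration, not the PA4 interface; terminals like plus and times are folded into the production name rather than stored as children:

```python
# One action per production: combine the children's values and pass the
# result up the tree.  The leaf rule converts the token's string to an int.
ACTIONS = {
    "E -> E + T": lambda e, t: e + t,
    "E -> T":     lambda t: t,
    "T -> T * F": lambda t, f: t * f,
    "T -> F":     lambda f: f,
    "F -> num":   lambda s: int(s),  # the lexer hands us a string, not a number
}

def evaluate(node):
    """node = (production, [children]); a leaf child is the token's string."""
    prod, children = node
    vals = [evaluate(c) if isinstance(c, tuple) else c for c in children]
    return ACTIONS[prod](*vals)

# Parse tree for "2 * 3 + 10" under the unambiguous E/T/F grammar:
tree = ("E -> E + T",
        [("E -> T", [("T -> T * F",
                      [("T -> F", [("F -> num", ["2"])]),
                       ("F -> num", ["3"])])]),
         ("T -> F", [("F -> num", ["10"])])])
print(evaluate(tree))  # 16
```

Swapping the `ACTIONS` table is all it takes to get the other translations mentioned here: returning strings instead of ints gives a pretty-printer, and emitting the operands before the operator gives reverse Polish notation.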
So what you see here is just the description of the rules you apply as you propagate the value up the tree. And here you see a description which shows you, well, a grammar of regular expressions. We do need some precedence declarations here, because the grammar is ambiguous; those are missing. And here we are just building a parse tree, sorry, an abstract syntax tree, of regular expressions. So what else could we do with a tool like this? Could you build a tool that somehow visualizes the program and helps with understanding it? So instead of actually returning the AST, it could directly pretty print the regular expression somehow. In fact, we could print it in such a way that the following is possible. This is just the browser, but when I come to a parenthesis and highlight it, you can see the matching parenthesis. When I highlight this bar, it shows what is on the left-hand and the right-hand side; similarly here. So I could take this grammar and, instead of returning the AST, return what? How would you achieve the effect that you just saw? It will be your next homework. You can spit out, essentially, an HTML document with JavaScript actions which have the effect that when I move the cursor over this parenthesis, it highlights it as well as the corresponding one, and when I move it here, it highlights the two children. And this HTML document, which of course looks horrible, but you don't need to read it because it is generated through the syntax-directed translation, is generated by putting something smarter on the right-hand side. And you'll see, you can do it with a little bit of playing and thinking. So now, with syntax-directed translation, we have a tool which allows you to compile, pretty print, and build visualization tools that are indeed interactive. And from now on, the rest of the languages we build will be built partly by doing such translations. Thank you.