Good Monday, I guess? Yes, some of the other cases are Mondays. Okay, so let's get right into it, because we're actually really close to the end of the section. I'm actually super impressed with how we're progressing on first sets, follow sets, and syntax analysis. So we will get right to it.

Okay, so on Friday we talked about how we calculate follow sets. So what are follow sets? What's the semantics of the follow function? That could be about anything; can you be more specific? So what are the tokens or terminals that come directly after a non-terminal? So what are the possible tokens that can come after this non-terminal? And so the input is a non-terminal, and the output is a set of terminals, possibly including end-of-file. And so the way we do this is very similar to first sets. We start by initializing an empty follow set for every non-terminal in our grammar, and then we go through each of these five rules, which we've been talking about the last, I don't know, Wednesday and Friday, so about two classes' worth.

The first rule is very simple: we add end-of-file to the follow set of the starting non-terminal. That's our first rule. The second rule says that if the non-terminal we're interested in, in this case A, is all the way at the end of the rule, then we know we can add the follow of the left-hand side non-terminal to the follow of A. Rule three says, hey, but if we're not the right-most, and there exists epsilon in the first sets of all symbols between the non-terminal we're interested in and the end of the rule, then the same thing applies, and we add the follow of B, the left-hand side, to the follow of A. The fourth rule says, whatever comes after you in the rule, right, you're A, whatever comes after you, add its first set minus epsilon to your follow set.
And the fifth rule says, well, you can keep doing this: assuming that C0 has an epsilon in its first set, then you can add the first of C1 minus epsilon, and if C1 has an epsilon you can add the first of C2 minus epsilon. If C2 has an epsilon you can add C3, and so on and so forth.

Okay, so we went through and we used our handy-dandy rules and the grammar that we've been using as an example so far, and I think we stopped here, more or less. Okay, cool, so we just did A. So now I want to calculate the follow of B, so where do we look? This is everything that I know that I've already done. How do I calculate the follow of B? So you said look at the first rule. How do you know I look at this first rule? Yeah, we look for every instance of B where in the rules? On the right-hand side. On the right-hand side, so we look at the B here, and so we ask, does rule 1 apply? No, B is not the starting non-terminal, so rule 1 does not apply. Is B the last element, the right-most symbol in this rule? No. Nope, doesn't apply. Does epsilon exist in the first set of every symbol from after B all the way to the end of this rule? Yeah. So C and D: C has epsilon in its first set and D has epsilon in its first set. So then I add what to the follow of B? Follow of S to follow of B, exactly.

And then rule 4, so which symbol does rule 4 apply to? Which is C0 in this example? C, so I'm going to add the first of C minus epsilon to the follow of A, to the follow of B, sorry. The first of C is c and epsilon; subtract epsilon and we have c. So now our follow of B contains end of file and c, and then we have to ask, does rule 5 apply? Does epsilon exist in the first of C? Yes. Yeah, so then I add the first of D minus epsilon to the follow of B. And so the first of D minus epsilon contains d, and there are no more rules to apply there.
So the follow of B should be end of file, c, and d. Any questions on that calculation?

So now I'm going to calculate the follow of C. How many different rules do we have to look at? Three different rules, right? We have to look at every single rule where C exists on the right-hand side, specifically every instance of C on the right-hand side of a rule, right? So we go through our five rules. We know C is not the starting non-terminal. Is it the last element? No. Could it be the last, if there's an epsilon in the first set of every symbol that comes after it? Yes. So we add the follow of S to the follow of C, and then we add the first of the one that comes after it, so the first of D, to the follow of C. So the first of D minus epsilon is d.

And does rule 5 apply here? Yes? So what do I add? But this doesn't say anything about follow. This says add the first of something to the follow of A. We did the first of D here for rule 4, right? Rule 4 says always do the next one. Rule 5 says, kind of, do the one after that. So why does rule 5 apply or why does it not apply? There's nothing there. There's nothing after D, right? So it doesn't make sense to add the first of the next thing, which is nothing. There are no symbols here. So one way you can think about it is C plays the role of A in this case, and D is C0, so i would be 0. But what's i plus 1? What's C1? It does not exist, right? So this rule doesn't apply. The rule is not in the form of the non-terminal we're interested in, then C i, and then C i plus 1. There is no i plus 1 anymore.

Okay, so that's it there, but then we also have to apply these rules to the second rule, A goes to C D. So once again, I ask, are we the last element? No. Could we be? Yes, so we should add the follow of what to the follow of C? Yeah, the follow of A to the follow of C. So that's going to add b to the follow of C. And then by rule 4 we need to add the first of D minus epsilon to the follow of C. And does rule 5 apply? No.
Not even a little bit. It's the same logic we just talked about, right? It can't possibly apply.

What about the last rule here, where C is on both sides? Am I the last? Yes, so what do I add? Follow of C to the follow of C? How do I know the follow of C? Do I explode in an infinite loop? What do I use as the follow of C? What was it? Empty set. I use the last knowledge that I have of C, which is the empty set. Right? I am currently calculating the follow of C, and any time I need to reference it, I have the previous value I can pull from. That's why we start with empty sets for all the initial follow sets. Okay, does rule 3 apply? There's nothing after, right? It says you have to have something after. Rule 4 says if there's something after you, add its first set minus epsilon to your follow set, and there's nothing afterwards here. So by the same token, 4 doesn't apply and 5 doesn't apply. We're at the very end. Cool. So we should have the follow of C is end of file, d, and b.

Then we calculate D. Once again, we look at three different rules. This should be easier. Does rule 2 apply? Yeah, we're the last element. We are D. We add the follow of S to the follow of D. We're good. None of these other rules will apply because we are the last element. There's nothing that comes after us in this rule. What about for this rule? Yeah, same reasoning. Follow of A, right? The second rule applies. We add the follow of A to the follow of D. None of these other rules apply because we are the last element. And the last rule, where D is on both sides: follow of D to follow of D. Same thing. Nothing changes. So we get the follow of D is end of file and b.

And then if we do all these again, we'll get the same answers. And so we've gone through every non-terminal with every rule we have and nothing changed. What does that tell us? Done. We're done. We can stop. So now we have first sets and follow sets for this grammar. Awesome. Are we good on first and follow sets? Yeah, that's the correct answer. There's a question over here. Oh, sure.
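The pass-until-nothing-changes procedure walked through above can be sketched in Python. This is a minimal sketch, not the course's reference solution. It assumes the running example grammar is S → A B C D, A → C D | a A, B → b, C → c C | ε, D → d D | ε (the C and D rules are reconstructed from the discussion, not quoted from the slides), writes end of file as "$", and uses hypothetical helper names (`first_of_seq`, `compute_first`, `compute_follow`).

```python
EPS, EOF = "ε", "$"

# Assumed reconstruction of the lecture's example grammar.
# Each right-hand side is a list of symbols; [] is the epsilon rule.
GRAMMAR = {
    "S": [["A", "B", "C", "D"]],
    "A": [["C", "D"], ["a", "A"]],
    "B": [["b"]],
    "C": [["c", "C"], []],
    "D": [["d", "D"], []],
}
START = "S"


def first_of_seq(seq, first):
    """FIRST of a sequence of symbols, using the current FIRST sets."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in GRAMMAR else {sym}  # terminal: itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol (or the empty sequence) can vanish
    return out


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for nt, rules in GRAMMAR.items():
            for rhs in rules:
                new = first_of_seq(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}  # start from empty sets
    follow[START].add(EOF)                  # rule 1
    changed = True
    while changed:
        changed = False
        for lhs, rules in GRAMMAR.items():
            for rhs in rules:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:  # only non-terminals have follow sets
                        continue
                    rest = first_of_seq(rhs[i + 1:], first)
                    new = rest - {EPS}      # rules 4 and 5
                    if EPS in rest:         # rules 2 and 3
                        new |= follow[lhs]  # uses the last known value of follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

Under these assumptions, `FOLLOW` agrees with the sets worked out by hand: end of file for S, b for A, end of file, c, d for B, end of file, d, b for C, and end of file, b for D. The `while changed` loop is the "apply every rule again until we get no new information" stopping condition.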
Why do we have to have the last column on the... Oh, when we get here, we ask, did we make any changes? Yeah, right? We added to the follow sets. So we have to go apply the rules again. We don't know if something is going to change. We can only stop when we go through every non-terminal, apply every single rule we have, and we don't get any new information. That means we're done. Cool. Any other questions on this example?

So now we get to the why. We started off syntax analysis, before first sets, by talking about parsing, right? We wanted to be able to write a parser. So what is a parser? It turns the input into a yes or no as to whether or not something is syntactically correct according to a context-free grammar. Close. Yes. It does one additional thing in addition to that. What was that? It creates a parser. A parse tree. It creates a parse tree, right? We want to know that yes, it is syntactically correct, and if it is syntactically correct, we want the parse tree that corresponds to that sequence of tokens. Cool.

Okay. There are many different types of parsers that we talked about. We are going to study a very particular type of parser, predictive recursive descent parsers, which seems like a lot of words, so let's break it down. Predictive, what do we think that means? I think that would be like correct. How many tokens do we think it has to look ahead? This is a good bit of meta thinking about what we have been studying the past two weeks. Two what? So when we're trying to parse a non-terminal, let's say an A or an S, like in this case we want to parse an A, we need to make a decision based on the tokens that we're seeing. Is it A goes to C D or A goes to little a big A? So predictive in this sense means: by looking at one token, just peeking one token ahead, can I decide if the rule is A goes to C D or A goes to little a big A?
And so the predictive part is where we get efficiency, because we don't want to have to try all possible combinations of all rules, even though we know that will work eventually. What we want to do is know which rule to choose just by looking one token ahead.

So predictive. What about recursive? It's recursive. It's just what it means. [unclear] So we'll see how it looks in the code. And we started that a little bit, but you'll see that it looks very recursive. Descent: we're going to start at the top. So we're always going to start with parse S at the top. And then we're going to figure out which one of S's rules applies. And then when we try to parse A, we'll figure out which one of A's rules applies, and so on and so forth. Cool. And this is where it's important that we're talking about no backtracking. So here, for each grammar rule that we have, when we have a choice, does A go to B or does A go to C, we want to make that decision just by looking one token ahead, and we will never backtrack and change that decision.

Okay. So we've actually been deriving the conditions for a predictive recursive descent parser. So what were some of the first ones? My goal is I can only look one token ahead. A goes to, let's say, alpha, and then I have another rule, A goes to beta. So I need to look one token ahead. What must be true about the context-free grammar in order for me to be able to write a predictive recursive descent parser? Yeah. So the first rule: the first of alpha intersected with the first of beta had better be the empty set, right? And this is exactly why we did first sets. By looking one token ahead, we can say, aha, it's got to be A goes to alpha and not A goes to beta. And I know because I calculated the first sets, and the first sets tell me all possible strings that alpha generates start with these tokens and all possible strings that beta generates start with these tokens.
So if I read a token, I'll be able to know exactly which rule generated that token. Cool.

But then we ran into problems, right? What happens if epsilon exists in the first of, let's say, A? Well, S goes to, I think, little a S or epsilon? Is that it? So what's the first of little a S? a. And the first of epsilon? Epsilon. So that satisfies the first rule. But if I just read one token, if I read a single a, well, maybe we need to change the example. If I read a single a, does that tell me which of these rules I should use? No, I don't know if this rule happens again or if I should go to nothing. Why? Suppose what can come after S is the same as what S could start with, right? Then whatever comes after S could also be what S starts with. And in the case of epsilon, S could go to nothing. So I can't tell just from looking one token ahead: does S go to epsilon or does it go to a different rule? So if epsilon exists in the first of A, then it had better be the case that the first of A intersected with the follow of A is the empty set.

Do I have to rewrite this rule here, this guard, to say if epsilon exists in alpha or beta, one of the right-hand sides? Yes. But why would I care if it's alpha or beta? So yes, you can think of the first of A as containing the union of the first of alpha and the first of beta. That's definitely true. Do I have to worry about epsilon being in both the first of alpha and the first of beta? No, because of the first rule. The first rule says they're going to have nothing in common. So if epsilon exists in the first of A, I don't really care which rule it comes from, but I know it's only going to be one of them. And then I say, well then, I'll know when to go to epsilon if what comes after A is never the same thing as what A can start with. Cool. And that's it. That's all we need. That is right. Cool. So I can give you any grammar and ask, does this grammar support a predictive recursive descent parser? How should you show that?
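The two conditions just described can be checked mechanically once the first and follow sets are known. The following is a minimal Python sketch under stated assumptions: the grammar is the reconstructed running example (S → A B C D, A → C D | a A, B → b, C → c C | ε, D → d D | ε), the first and follow sets are hard-coded from the hand calculation, end of file is written "$", and the function name `conflicts` is hypothetical.

```python
EPS, EOF = "ε", "$"

# Assumed running example; [] stands for an epsilon right-hand side.
GRAMMAR = {
    "S": [["A", "B", "C", "D"]],
    "A": [["C", "D"], ["a", "A"]],
    "B": [["b"]],
    "C": [["c", "C"], []],
    "D": [["d", "D"], []],
}
FIRST = {
    "S": {"a", "b", "c", "d"},
    "A": {"a", "c", "d", EPS},
    "B": {"b"},
    "C": {"c", EPS},
    "D": {"d", EPS},
}
FOLLOW = {
    "S": {EOF},
    "A": {"b"},
    "B": {EOF, "c", "d"},
    "C": {EOF, "d", "b"},
    "D": {EOF, "b"},
}


def first_of_seq(seq, first):
    """FIRST of a right-hand side (a sequence of symbols)."""
    out = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})  # a terminal's first set is itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out


def conflicts(grammar, first, follow):
    """Return every violation of the two conditions (empty list: none found)."""
    found = []
    for nt, rules in grammar.items():
        rhs_firsts = [first_of_seq(rhs, first) for rhs in rules]
        # Condition 1: for A -> alpha | beta, FIRST(alpha) and FIRST(beta) are disjoint.
        for i in range(len(rules)):
            for j in range(i + 1, len(rules)):
                common = rhs_firsts[i] & rhs_firsts[j]
                if common:
                    found.append((nt, "first/first", common))
        # Condition 2: if epsilon is in FIRST(A), FIRST(A) and FOLLOW(A) are disjoint.
        if EPS in first[nt]:
            common = first[nt] & follow[nt]
            if common:
                found.append((nt, "first/follow", common))
    return found


print(conflicts(GRAMMAR, FIRST, FOLLOW))
```

For the assumed example grammar the list comes back empty, which is the computed counterpart of showing each intersection by hand. A grammar such as S → a S | a would instead report a first/first conflict on a.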
If I just give you the grammar? Yeah. You have to construct the first and follow sets for the grammar and then show, for rule one, at every place where you have a choice, right, A goes to alpha and A goes to beta, that there's no intersection there. And you should show that. Here are some of the things you should not do. Just write yes or no. Another thing you should probably not do is just write out these rules. That doesn't actually show me that you know how to apply them. It just shows me you know how to read and memorize these things and copy them onto the test. What matters is actually applying these and saying, how do I apply this to a context-free grammar to show that it actually does support a predictive recursive descent parser? Cool. Questions on this? We'll practice in the homework.

Okay. Now we get to the grand finale, where we can actually write a predictive recursive descent parser. And we're going to write that using first and follow sets. So, first thing we need is a context-free grammar, right? It doesn't make sense to talk about a parser without a context-free grammar. I mean, that's what we're doing. We're parsing, so we need some kind of context-free grammar. Then calculate first and follow sets. That's why I've been going over this, right? You can calculate first and follow sets. You want to demonstrate to yourself, and often demonstrate to me, that the context-free grammar actually allows a predictive recursive descent parser. And then, once we've done this, we can use the first and follow sets to write the predictive recursive descent parser.

Let's look at an example of this. Let's go back to the language that we've been using. Write this like code. I am going to write code. So, I start from the top. I might write a parseS function. What else do I need to be able to write this context-free grammar? I mean, sorry, this parser. Do I just need this? What was that? The production rules.
Yeah, I need, well, sorry. Wait, what? These are the production rules. What else do I need? What was that? First sets, follow sets. I need all the follow sets, yeah. If I don't have these, I got nothing. Can I do it like this? Okay, what's the best way to copy these follow sets? There we go. I have no idea what's going to happen when I paste this. Copy, paste, all of these things very slowly. Looks like I could write it by hand, but I feel like this is still faster. Okay. So, I'm going to write parseS. There we go. I wish I could move these. I guess I don't know this program that well. Oh, well. What can you do? Okay. So, actually now I'm going to draw, because that's what I'm going to do.

Okay. So, I want to write the parseS function. So, do I have a decision to make here? Yes, what decision? So, there's a couple things. I guess I'm being a little bit ambiguous, right? When I say do I have a decision, one thing I'm asking is, is there a choice? Does S go to two different rules? No, it's only ever going to go to one rule. But, can I know right off the bat, by reading one token, whether this string actually corresponds to something that S could generate? No? So, what if I told you that the very first token of the input string is an e? Or what if I got end of file? What if the input string is the empty string? So, can I know this just by looking at one token? And if so, how? If we get the first set of S, what does the first set of S tell us? Yeah, it must be either a, c, d, or b. If it's not one of those, then I can say it's a syntax error right away.

So, I want to do ttype equals getToken. So, if ttype is a, or ttype, in tiny font, is b... Or, I guess I'm writing words, so this is more Python than C. This is more pseudocode than real code. Let's see, a, b, c, d. Yes? We already have set operations. We could have said if the intersection of the first of S and ttype is not empty. Yes.
Or, if you have sets, you could say if ttype exists in this set, the first of S. Okay, so if it is, then what do I do? What should I do? Now I have to actually think about how I parse this rule. I already essentially did that here, right? I mean, I don't know what parseA is going to do. So, let's think about the semantics. What should this parseS function actually do? What are the two things we want from our parser? Valid syntax and a parse tree. So, parseS should probably throw a syntax error if there's a syntax error, right? So, let's go all the way down here. We'll say else. We'll assume we have some function called syntaxError. We'll just assume that it's going to terminate execution and print out an error message. Okay, so I know right off the bat, if I read something that's not an a, b, c, or d, then I can say right away it's a syntax error. Great.

So, what else should I do? We know we're going to use S, but if we just get another token and check that, we need to remember... Right, so one key thing is we only want to look one token ahead, right? And when we call getToken, what does the lexer actually do? Yeah, it consumes that token from the input string. So, the next time we call getToken, it's going to consume more of the input. And the next time we call getToken, it consumes even more of the input. Right, so there's a couple things here. Let's ignore the actual creating of a parse tree for a second. We'll see how that will come about in the way that we're writing this. But for now, this parseS function is responsible for parsing an S. Think of it in a couple different ways: verifying the input, and verifying that that input came from an S. But does it have to worry about checking A or B or C or D? If we have a parseS function, what do we assume we also have? parseA, parseB, parseC, and parseD, right?
Those are all functions we will have to write, but we can rely on them. So, if I know the rule is S goes to A B C D, what is the first thing I know should happen there? parseA should happen, right? And that function is responsible for parsing an A from the input and moving the tokens along until after that A. And then what happens after parseA? Right, and we already talked about how each of these parse functions is responsible for checking that its little chunk of the tree is syntactically valid. So, if parseA successfully returns, then we know we have successfully parsed an A, and that's it. And the next time you read a token it should be from B, but we know that we are syntactically good at this point. Now we have parseB, parseC, parseD, and then when I get here, what do I know? Yeah, that's what I'm just saying, yeah.

They don't return anything right now. Because we're thinking of them a little bit more abstractly, we can think they'll do whatever, they'll construct the tree for us. We'll do this much more in depth in Project 5. For now we can think they just do whatever they do. They'll call syntaxError if there's a syntax error in their part of the tree. Otherwise, they'll just return.

Yeah, now what do I need to check after this? Why do I need to check for end of file? Yeah, I know I have this S goes to A B C D, which means if I have A B C D and there's anything else after that, it's going to be a parse error, right? So now what I'm actually adding here, I'm going to do a little zoom in thing. I'm also going to get rid of this. ttype equals getToken. If ttype is, what do I care about here? Yeah, if it's not equal to end of file. If it's anything else, then I know it's a syntax error. All that goes in here after this check.

So now what about the top? I'm calling parseA, but I just read in a token. That was either an a, b, c, or d. But where did that actually come from?
Did that come from the rule S goes to A B C D? Is it one of S's children in the parse tree? No, where did that come from? Yeah, from either the first of A or the first of B. And so it's actually not our responsibility to parse that and to move the input along. So we need to make sure that we actually put that token back with ungetToken, because we are not the ones consuming that token.

The parser. Yes, it's like that entry point. This is a little bit of a tangent, but are we storing the tree? Because you said the parser returns yes or no on whether the syntax is valid and it returns the tree, but we're not building it. We're not explicitly creating a tree, but as you'll see, this series of function calls creates the tree. Do I need to store it? Not right here. Not right now. Right now we're just worried about creating the structure so that later we can do whatever we want. So later when we get here, we know, hey, I properly parsed S goes to A B C D. Now I can choose whatever I want to do here. Maybe I want to create the tree. Maybe each of these returns nodes that I'm going to assemble into a tree structure. You'll get to this a lot more in Project 5. But for now we want to think about it slightly more abstractly.

Okay, let's take parseB first, because that one's a little easier. To do parseB: get token. Get token and then what? We can rephrase this. If ttype is not equal to b, syntax error. So here I'm throwing an error. And then here we can print maybe something that says, hey, I just parsed B goes to little b. So this would be where you do whatever you want to do when you know that this rule occurs. Like here I'd probably put print S goes to A B C D, because I only know that this rule was successful when I get all the way to the end here. So B's really easy, but why don't I call ungetToken like I did here? I've actually parsed B. The rule big B goes to little b actually creates the b in the input. And I know, okay, this must be the correct rule.
And that way when parseB returns we say yes, we've parsed this B. And now everything after that will start on the next token following that.

Okay, let's do D. Okay, gotta get the token. Check for little d. [unclear] and I'm gonna move it here. So we're gonna create this parseD. So then how do I distinguish between these two rules? First of capital D? First of capital D is d and epsilon. So how does that tell me? What do I check? How do I choose which of those rules? You know, you can all talk at once. It's harder to be wrong. Yeah, so if we get a lowercase d, right? Then we know which of these rules must have happened. We have this one and this one. One or two? One. We know rule one must have happened. Why? Because if the other rule occurs, what did we actually read from? Yeah, the follow of D, which is end of file or b. And we actually didn't go through it. We probably should have, but we didn't go through and calculate the first of C D and the first of little a big A to make sure there was no intersection, and we also didn't go through the other condition. Oh, I'm pretty sure it's good. c, c, b, b. Yeah, that's good. Yeah, okay.

So I know if the token is what? Well, okay, so yeah, that's a good point. So why can't I say d or epsilon? getToken only returns tokens or end of file. It never returns epsilon. Right? Because we saw getToken, right? It's just doing lexing and it's trying to find the longest token that matches. It's either going to return that token or an error. In this case, it's going to be a major error. I mean, nothing's going to work. Or it will be end of file. That's it. Those are the only options. So if we read a lowercase d, then what do we know? Which rule is it? Rule one or rule two? Rule one. Rule one. So what do I do? Do I unget the token like before? But don't we parse the d? Yeah. So we actually don't want to call ungetToken here. Why?
Yeah, because this rule produced this lowercase d, right? This specific production rule. This instance of D. So we don't want to call ungetToken. We consume the d and then call parseD again for the big D that follows. If we put the d back instead, we're just going to loop infinitely. We're never going to reach the end there. Okay, let's see if I can do this. Success. Cool.

Okay, so then what else? How do I know it's rule two as opposed to a syntax error? So what? Follow of D. Follow of D? Yeah, so if it's an end of file or a b, then that means it must be D goes to epsilon. Otherwise, it's a syntax error, right? It has to be one of those things. Otherwise, it's not syntactically valid. Let's see, I'm going to be lazy and write it like this. Dollar sign, b. This just means, is ttype one of these two? Then what? Why do I want to unget the token? Yeah, right? The rule is D goes to epsilon, which means it went to nothing. Which means we should not consume anything from the input. There should be nothing that this parses from the input, so we want to unget the token. Okay, then if it's neither of these two rules, we just did both of these two rules, if it's none of those, then it must be what? Syntax error. You made a mistake.

Then let's do parseA, because parseA is very interesting. Okay. So now we're going to look at parseA, where I have two rules, A goes to C D or A goes to little a big A. So I'll let you guess what's the first thing we do. Get a token. So then how do I distinguish between these two rules? If it's C D, big C, big D? But I'm never going to read in big C or big D, right? Those are non-terminals. I only read in terminals. There's somebody over there. Somebody over there, what do I check for? What was that? Little a, and what does that tell me? That it's the second one. So I have two rules, one and two. So then how do I know if it's the first rule and how do I know if it's a syntax error? What was that? The first rule would be little c or little d or epsilon. So that's what I do. So I should also check the follow of what?
A, which would be b, right? So rule two is easy, and we'll write that first since it's easier. So if ttype is an a, then I know that A goes to little a big A. Rule two. So then do I call ungetToken? No, because I want to parse this a, right? It existed here. And I call myself, parseA. I'm recursive, right? But what if I didn't read an a and I want it to be the rule A goes to C D? Let's think. What's the first of C D? c, d, epsilon. And then we know the follow of A is b. So we know that it's either going to be a c or a d, in which case it's this rule. Or, if this rule went to epsilon, we need to check, is it in the follow of A? So it should be a b, right? So c, d, b, because I can't check for epsilon. What do we do? Unget the token or keep the token? Unget, yeah. This rule doesn't directly generate this token. Then parseC, parseD, and print that I chose the rule A goes to C D. And what if it's neither of these? Syntax error. And so we've done everything except for C, but we could do C very similarly.

Question, yes? So each of these parse functions makes up our parser. And each one is distinct. You need a different parse function for each non-terminal in your grammar. Yes. So if I'm not mistaken, in Project 3 we're supposed to read in a context-free grammar. Yes. And then read in a string of tokens to determine if it's in the grammar. No. No, we're not doing this. For Project 3 you're reading in the context-free grammar and you are calculating first and follow sets for that grammar, and other things. In the event that you would have to do that, how do you dynamically develop these parse functions? Different ways. People do it in different ways. There's tools like yacc and bison (and lex and flex for the lexer side) where you give it your context-free grammar as input and it generates C code, or C++ code, that implements all this parsing. So yeah, that's part of what you're seeing in Project 3: you can do this all programmatically.
And you can generate a parser programmatically. You can do everything programmatically.
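Pulling the parse functions from this lecture together, here is a minimal Python sketch of the whole predictive recursive descent parser. It is an illustration under assumptions, not the course's reference code: the grammar is taken to be S → A B C D, A → C D | a A, B → b, C → c C | ε, D → d D | ε (the C rules and `parse_C` are reconstructed by analogy with D, since C was left as "very similar"), the lexer is a stand-in that treats every character as one token and returns "$" at end of file, and the names `Parser`, `trace`, and `accepts` are hypothetical.

```python
EOF = "$"


class ParseError(Exception):
    """Raised where the lecture's syntaxError() would terminate execution."""


class Parser:
    def __init__(self, text):
        self.tokens = list(text)  # stand-in lexer: one character per token
        self.pos = 0
        self.trace = []           # rules in the order they are confirmed

    def get_token(self):
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else EOF
        self.pos += 1             # consuming: the next call moves on
        return tok

    def unget_token(self):
        self.pos -= 1             # put the peeked token back

    def parse_S(self):
        if self.get_token() not in {"a", "b", "c", "d"}:  # FIRST(S)
            raise ParseError("S")
        self.unget_token()        # that token belongs to A or B, not to S itself
        self.parse_A()
        self.parse_B()
        self.parse_C()
        self.parse_D()
        if self.get_token() != EOF:  # nothing may follow A B C D
            raise ParseError("expected end of file")
        self.trace.append("S -> A B C D")

    def parse_A(self):
        t = self.get_token()
        if t == "a":              # A -> a A: keep the a, then recurse
            self.parse_A()
            self.trace.append("A -> a A")
        elif t in {"c", "d", "b"}:  # FIRST(C D) minus epsilon, plus FOLLOW(A)
            self.unget_token()
            self.parse_C()
            self.parse_D()
            self.trace.append("A -> C D")
        else:
            raise ParseError("A")

    def parse_B(self):
        if self.get_token() != "b":  # B -> b consumes the b itself
            raise ParseError("B")
        self.trace.append("B -> b")

    def parse_C(self):
        t = self.get_token()
        if t == "c":              # C -> c C (assumed rule)
            self.parse_C()
            self.trace.append("C -> c C")
        elif t in {EOF, "d", "b"}:  # FOLLOW(C): C -> epsilon consumes nothing
            self.unget_token()
            self.trace.append("C -> epsilon")
        else:
            raise ParseError("C")

    def parse_D(self):
        t = self.get_token()
        if t == "d":              # D -> d D: do not unget, or we loop forever
            self.parse_D()
            self.trace.append("D -> d D")
        elif t in {EOF, "b"}:     # FOLLOW(D): D -> epsilon consumes nothing
            self.unget_token()
            self.trace.append("D -> epsilon")
        else:
            raise ParseError("D")


def accepts(text):
    try:
        Parser(text).parse_S()
        return True
    except ParseError:
        return False
```

With this sketch, `accepts("acdb")` and `accepts("b")` succeed, while the empty string and `"ba"` are rejected. Each function only ever inspects one token, and it ungets that token exactly when the chosen rule does not itself produce it. The `trace` list marks the places where a real implementation would build parse tree nodes instead.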