Happy Friday, everyone. Let's get started; we've got a lot to cover today. Project 2 is due today. Very soon afterwards, tomorrow night, we release project 3, and it's going to depend on things we're going to learn in class today, so this is very important. You'll have almost four weeks to work on it, which is about a month. And if you're thinking... well, maybe I'll scare you on Monday instead of now. I'll just tell you: it takes four weeks. It does not take one week, or the three days before the deadline. That's how people end up failing the class. So, we've been talking about parsing. What is parsing? What does parsing mean? Think about inputs and outputs. What are the inputs? A sequence of tokens. And what's the output you're trying to build? A parse tree. Yes. Parsing generates a parse tree from the sequence of tokens that we created with the lexer. Awesome. So we already have a high-level idea of what parsing is. We've seen that we have this parseS function and some context-free grammar. In parseS, we get a token and check it to figure out which one of these rules to use, because we have three possibilities: S can go to A, S can go to B, or S can go to C. Based on that first token, we want to be able to decide which rule we actually choose. We went over this briefly; I'm going over it again very generally to refresh your memory, because we now need to build up some intuition and some additional functions that will help us actually calculate and create these parsers given a grammar. So here's something we didn't talk about: checking for end of file. After I parse A here in S, why would I want to check for end of file? To see if the string is done. Why? That's what we'll talk about now.
So why do we check for end of file? Let's think about a scenario. I'm going to steal this... see if this works. Boom. Alright, cool. Let's say our input string is a b. And here's our grammar, to refresh our memories. So first, a self-check: is this string in the language of this grammar? Can this grammar ever produce this string? Let's create a derivation. We start with S. Which rule do you want to use? S goes to A. Okay, and then which rule? A goes to small a. Are these strings the same? No, so this is not a derivation of our string. Alright, let's try again. How do I derive this string? Take B. So S goes to B. Then which rule do I want to use? If I choose B goes to little b, I get just little b. If instead I choose the other rule, big B little b, I need to expand B again, so I'm still only producing b's. So can I ever derive the string a b? No, it's not valid syntax. So somebody tell me: what does this grammar say? Like we did with regular expressions: in plain English, what do the strings of this context-free grammar look like? What are the strings this context-free grammar can produce? A single a, from S goes to A and A goes to little a. What else? A single b; more precisely, one or more b's. And what about C? Zero or more c's. Yeah, you can think about it as one or more c's or the empty string; either way. So those are the strings. The string a b is not any of those, so it's not possible for this context-free grammar to generate it. So let's step through what happens in parseS. This is our input, and we haven't consumed any tokens yet.
t_type = getToken(). What's it going to return? Small a. Remember, now we're talking about terminals and non-terminals, so getToken is going to return a terminal, in this case a, b, or c, or I guess any other character. So it returns a, and t_type is a. Is a equal to a? Yes. So we put it back: calling getToken moved the input forward here, and calling ungetToken puts the token back here. Then we call parseA. Let me go grab the source code of parseA... oh, I don't have it. Okay, cool, we already did that last time. So we call parseA. What's going to happen to the string after this point? It's going to consume that a. If we just stopped here, we'd say, okay, great, parseA returned, which means there was no syntax error and parseA was happy. But we know we're not good. We know this is invalid syntax: we cannot create a parse tree for this string given this grammar. So now we want to check. What would checkEOF look like if you were going to write it? Yeah, look at t_type, but let's flip it around a little bit: if it's not equal to... what do we use to specify end of file? $, right? Then what? Error. Yeah, syntax error. So let's assume we have a function syntax_error. We know that after S there should only be end of file; if there's anything other than the end-of-file token, it's a syntax error. So now we're good: we correctly report a syntax error instead of accepting this string. Now take the valid input, just a. Similar things happen. We say t_type = getToken() and get a. We check it and call ungetToken, so we move back. Then we call parseA, and parseA consumes the a here.
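A minimal Python sketch of the parseS walkthrough above, including the end-of-file check. It assumes single-character tokens and uses `$` as the end-of-file token, as in the lecture. The B and C rules are assumed to be B → b B | b (a right-recursive form, since recursive descent cannot handle a left-recursive rule like B → B b directly) and C → c C | ε, matching the "one or more b's" and "zero or more c's" descriptions. The names `Parser`, `ParseError`, and `accepts` are hypothetical; `get_token`, `unget_token`, and `check_eof` stand in for getToken, ungetToken, and the end-of-file check.

```python
class ParseError(Exception):
    """Raised when the input is not in the language of the grammar."""


EOF = "$"  # end-of-file token, as in the lecture


class Parser:
    def __init__(self, text):
        self.tokens = list(text)  # assumption: one character per token
        self.pos = 0

    def get_token(self):
        # Past the end of the input, keep returning EOF.
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else EOF
        self.pos += 1
        return tok

    def unget_token(self):
        self.pos -= 1  # put the last token back

    def syntax_error(self):
        raise ParseError(f"syntax error near token {self.pos}")

    def parse_S(self):
        # S -> A | B | C: peek at one token to pick the rule.
        t_type = self.get_token()
        self.unget_token()
        if t_type == "a":
            self.parse_A()
        elif t_type == "b":
            self.parse_B()
        elif t_type in ("c", EOF):
            self.parse_C()  # C can derive epsilon, so EOF also selects it
        else:
            self.syntax_error()
        self.check_eof()  # S must cover the whole input: only EOF may follow

    def parse_A(self):
        # A -> a
        if self.get_token() != "a":
            self.syntax_error()

    def parse_B(self):
        # B -> b B | b  (one or more b's)
        if self.get_token() != "b":
            self.syntax_error()
        t_type = self.get_token()
        self.unget_token()
        if t_type == "b":
            self.parse_B()

    def parse_C(self):
        # C -> c C | epsilon  (zero or more c's)
        if self.get_token() == "c":
            self.parse_C()
        else:
            self.unget_token()  # epsilon: consume nothing

    def check_eof(self):
        if self.get_token() != EOF:
            self.syntax_error()


def accepts(text):
    try:
        Parser(text).parse_S()
        return True
    except ParseError:
        return False


print(accepts("a"), accepts("ab"))  # True False
```

In this sketch, `accepts("ab")` is false only because `check_eof` sees the leftover b; without that call, parse_S would return successfully after consuming the a.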
Then we go into checkEOF, check, and say yes, that's end of file, so we're good, and this string is accepted. So this is where we want to go. The question is, how can we actually do all of this? To get there, we have to build up some examples. We saw last time that I can decide based on the very next token: I can peek ahead one token and decide whether to call parseA, parseB, or parseC. Why could I do that? Because each of them begins with a different... Yes: all strings that A, B, and C could possibly produce start with a different token, or character in this case. Now let's say I switch this grammar around so that B can also produce strings that start with a. Now, by looking one character ahead, can I tell? No. I'm in parseS and I read a; do I know if S goes to A or S goes to B? No, because they can each generate a string that starts with the same character. So, to help us out, we're going to study the algorithms for how to do this, and I want us to develop them ourselves so they're not terribly complicated. We want some function; let's call it first. Let's spell first right: why can't I spell first? F, I, R, S, T. There we go, perfect. It's going to take in what we'll call... alpha? Is that alpha, from your Greek alphabet? I'm not super good with my Greek letters. That is alpha, right? Okay, cool. We'll say it takes an alpha, where alpha is a sequence of terminals and non-terminals. So first is a function that takes in a sequence of terminals and non-terminals, and it's going to return a set of... what? What do we want this function to represent? Let's think about it like this; let me change this back to the original grammar.
I want the first of A to be the set containing a. So what is the first of B then? Big B? Little b. So then what would be the first of C? c or epsilon. Right. What I want is this: given a sequence of terminals and non-terminals, for all possible strings that sequence can generate, what are the first things those strings could start with? We know that eventually, when we do these derivations, we're going to get a sequence of terminals concatenated together. So it only makes sense to talk about terminals: if you think about type checking, we should never see non-terminals in this set. So it should be a set of terminals and epsilon. This is a little tricky, though. Are we ever going to read epsilon from getToken? What does getToken return? Tokens, yes. Which are what in our context-free grammar? Terminals. And what else could it return? End of file, yes. So getToken can only return a terminal or end of file, but our first sets can contain terminals and epsilon. We'll figure out how to deal with that later. So that's the high level. A lot of times, this is the way I like to think about code: what's the input to this function, and what's the output? Then we can talk about how to actually write the function, but if you start with that basic building block, everything else becomes a lot clearer and a lot easier. So, I said alpha can be a sequence of terminals and non-terminals. Let's say I want the first of lowercase a. What would that be? Just a, right? This makes sense: there are no rules and no derivation; it's a terminal. So every string it can generate starts with that terminal.
Let's think about this. Does that logic change if I concatenate an arbitrary sequence of terminals and non-terminals after it? Let's think about it in concrete terms. What's the first of, say, a b a b? a, or more precisely? The set containing a, yeah. We're talking about sets. Awesome. Would this change if I changed this b to a capital B, or this later a to a capital A? No, because for this sequence of terminals and non-terminals, it doesn't matter what happens after that first a: every possible string is going to start with lowercase a. So what would be the first of a concatenated with any sequence of terminals and non-terminals? The set containing a. Yeah, perfect. Cool. So we're building up a set of rules. What about epsilon? The set containing epsilon, right? Because we want terminals and epsilon in here, just like up here, where we said the first of C is c or epsilon. Let's work through an example. We have S goes to A or B, A goes to little a, and B goes to little b B, or epsilon. What are the first sets of S, A, and B? The first set of A is going to be the set containing a. Perfect. And for B, why do you say "or epsilon"? How many rules are there for B? Two. So what are we actually doing? We're saying the first of B is really the union of the first sets of each of its rules: the first of little b big B, union with the first of epsilon. By what we just said, the first of a sequence starting with a terminal is that terminal, and the first of epsilon is epsilon. If you think about it, this is actually a really powerful function we're writing, because using it we're able to tell something about all possible strings that are going to be generated by this context-free grammar.
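The three rules so far, written out and applied to this example grammar (S → A | B, A → a, B → b B | ε, as reconstructed from the discussion), are summarized below; X stands for any non-terminal and α for any sequence of terminals and non-terminals.

$$
\begin{aligned}
\mathrm{FIRST}(a\,\alpha) &= \{a\} && \text{for a terminal } a \text{ and any sequence } \alpha \\
\mathrm{FIRST}(\varepsilon) &= \{\varepsilon\} \\
\mathrm{FIRST}(X) &= \mathrm{FIRST}(\alpha_1) \cup \cdots \cup \mathrm{FIRST}(\alpha_n) && \text{for the rules } X \to \alpha_1 \mid \cdots \mid \alpha_n \\[4pt]
\mathrm{FIRST}(A) &= \mathrm{FIRST}(a) = \{a\} \\
\mathrm{FIRST}(B) &= \mathrm{FIRST}(b\,B) \cup \mathrm{FIRST}(\varepsilon) = \{b\} \cup \{\varepsilon\} = \{b, \varepsilon\}
\end{aligned}
$$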
So let's push this a little bit further. What's the first of S? The set containing little a, little b, and epsilon. This context-free grammar generates an infinite number of strings, but we're able to tell (in this case very easily, but we'll develop an algorithm that can run on any context-free grammar) precisely, for every non-terminal in the language and any sequence of terminals and non-terminals, what all the possible strings it can generate start with. That's pretty cool, and we'll see how it comes into play later on. Let's make it a little bit more complicated. So far I have some rules here. Anything that starts with a terminal: the first set is that terminal. We have a rule about epsilon. And here (whoa, that's very fancy) I have a rule about what to do with ORs, when a non-terminal has two different rules: the first of B is the first of one rule's right-hand side, union with the first of any other rules that B has. What about when a non-terminal goes to a collection of other terminals and non-terminals? Each of these is a production rule, and I like to think of each as a left-hand side and a right-hand side. On the left-hand side you have one non-terminal, and on the right-hand side you have an alpha, which is a sequence of terminals and non-terminals, or epsilon. So the first set of a non-terminal with several non-terminals on its right-hand side would be the union of the first sets of each of them? It's more subtle than that; we'll dig into that more, trust me. We're especially going to dig into this. As a preview, in project 3 you're going to be writing code that does this.
So you're writing a program that takes in a context-free grammar like this and spits out the first sets of any arbitrary grammar. It's super cool. I could just give you the formal rules, but I find that if you actually understand why these rules work, you'll apply them a lot better and be able to think about how to write a program that does this. Let's make it a little more complicated. I have S goes to A, A goes to B, and B goes to... what are you going to do? Little b. Let's go in the same order as before. What's the first of S? Small b. How do you get there? Think about how you actually arrive at that small b, and try to break it down into an algorithm or some kind of explanation. What was that? First of A. Yeah: the first of S is really the first of what's on the right-hand side here, so the first of A. Do we know what the first of A is? Not yet. But if we look at it (I'll just write F for now), F(A) is the same as F(B). What do I know about the first of B? Yeah, the set containing b. And now if I revisit the first of A, would I know what to put in here? Yeah. And if I do this for the first of S, what is the first of A now? The set containing little b. So at this point you might be thinking, ooh, beautiful recursive algorithm: to calculate the first of S I need to calculate the first of A, and to calculate the first of A I need to calculate the first of B. Let me right now dissuade you from thinking that way, because you will run into problems if you program things this way. Let me show you an example. New context-free grammar: S goes to A, A goes to S or B, B goes to little b. The first thing you have to ask: is this a valid context-free grammar? Why is that even a question?
Yeah, right: all we said is that the left-hand side of a rule must be a single non-terminal, then you have the arrow symbol, and then an alpha, any sequence of terminals and non-terminals. And we know this bar is just shorthand for another rule that says A produces B. So then why the question? I understand the question; I just want you to think about why it's weird, why it's different from what we've seen before. It gets stuck in an infinite loop? Or you get stuck in an infinite loop thinking about it? Yeah, but it doesn't break any of the context-free grammar rules. We said we can put anything here. S is only special because we know all parse trees must start with S and all derivations must start with S, but we never said S can't appear later in a derivation. So in this example, a valid derivation would be S, A, S, A, B, and then little b. That's a totally valid derivation. Quiz question: is this an ambiguous grammar? Expect that kind of question; I ask it often. So is this ambiguous? Yeah. Why? How many parse trees are there for the same string? A lot. Yes, I think "infinitely many" would be more apt. Well, maybe not more apt, but a more polite description, I guess. Cool. But it doesn't matter that it's ambiguous; we can still define first sets for it. Let's think about it like we were doing before. We said the first of S is the same as the first of A, and the first of A is the first of S union with the first of B. If I were writing this recursively, I would compute the first of S as the first of A, which needs the first of S, which needs the first of A, and so on. I'd get into an infinitely recursive loop, run out of stack space, and crash. We need a way to break that loop.
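A minimal Python sketch of that tempting recursive approach makes the failure concrete. The grammar encoding (a dict from each non-terminal to a list of right-hand sides, each a list of symbols) and the name `naive_first` are assumptions for illustration, and the sketch ignores epsilon rules, which have not come up yet. It works on S → A, A → B, B → b, but on S → A, A → S | B, B → b it recurses until Python hits its stack limit and raises `RecursionError`.

```python
def naive_first(grammar, sym):
    """Naive recursive FIRST; assumes no epsilon rules (every rule is non-empty)."""
    if sym not in grammar:  # a terminal's first set is just itself
        return {sym}
    result = set()
    for rhs in grammar[sym]:
        # With no epsilon rules, only the leading symbol of each rule matters.
        result |= naive_first(grammar, rhs[0])
    return result


acyclic = {"S": [["A"]], "A": [["B"]], "B": [["b"]]}
cyclic = {"S": [["A"]], "A": [["S"], ["B"]], "B": [["b"]]}

print(naive_first(acyclic, "S"))  # {'b'}
try:
    naive_first(cyclic, "S")
except RecursionError:
    print("RecursionError: FIRST(S) needs FIRST(A), which needs FIRST(S), ...")
```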
What we usually care about, at least right now, is the first set of each non-terminal in the grammar. This is going to help us in parsing; I tried to give you a little flavor of that, because when we have different rules we can distinguish between them based on their first sets. So before I even start thinking about first sets, I'm going to assume they're all the empty set: the first of S is empty, the first of A is empty, the first of B is empty. That's my step zero. Now I ask: what's the first of S? What do I know based on these rules? The first of A. And what's the first of A? No, no, no: what is the first of A right now? We have it right here. The empty set, yeah. So after one step, the first of S is the empty set. Now for A, I try to calculate the first of A: the first of S union with the first of B. What's the first of S? Empty set. What's the first of B? Empty set. So the empty set, great. Okay, what's the first of B? Small b, so that's the set containing b. Now, because I've changed something, I need to do this again. When I first set things up, at step zero, they were all empty sets. But when I went through and tried to calculate each of the non-terminals, I made a change: I added something to one of these sets. So now I have to start again, using (I didn't give myself enough room here) empty set, empty set, and the set containing b. I start with S again: the first of S is the first of A, which is the empty set. Nothing changed there. Now I go to the first of A. What's the first of A?
The first of S union with the first of B. The first of S we know is the empty set, and the first of B is the set containing b. Union those together and we have the set containing b. We do this once more for B and see that B is also the set containing b. Did I make a change? Yes, so I have to do it again, one more time. Well, maybe not just one more time; we'll see. Now the first of S is what? The first of A, which is the set containing b; I just calculated that right here. For A: the first of S union the first of B. The first of S is the set containing b, and the first of B is the set containing b, so it's the set containing b. A set has no duplicates, so {b} union {b} is just {b}. Cool, we do it again for B. Am I done? No, I made a change, so I need to do it one more time. I go through once more and find it's all the same: the set containing b, the set containing b, the set containing b. Did I make a change? No, so I stop. So you can think about it like this: I keep applying the rules we're coming up with until I make no changes. You start from a state where you assume they're all empty sets, and every time you go through the rules, you try to add more information to what you know. In the first pass we found out that the first of B is lowercase b. In the next pass I found out that the first of A is the set containing lowercase b. In the pass after that, I found out that the first of S is the set containing b. Then I applied all my rules one more time and got no new information, so I don't have to keep applying them, because all of our first-set rules depend only on the other first sets. What we just went through is a way to get first sets without going into some freaky infinite loop. Yes: regardless of the context-free grammar, we can always get the first sets this way without looping.
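The passes above can be sketched in Python as a fixed-point loop. The sketch reuses a hypothetical encoding (a dict from each non-terminal to its right-hand sides, each a list of symbols), still assumes there are no epsilon rules so only the leading symbol of each right-hand side matters, updates the sets in place so each step sees the latest values, and records a snapshot after every pass for comparison with the hand calculation. The function name `first_sets_no_epsilon` is made up for illustration.

```python
def first_sets_no_epsilon(grammar):
    """FIRST sets by repeated passes until nothing changes (no epsilon rules)."""
    first = {nt: set() for nt in grammar}  # step zero: everything empty
    passes = []
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                lead = rhs[0]
                new = first[lead] if lead in grammar else {lead}
                if not new <= first[lhs]:  # anything we did not know yet?
                    first[lhs] |= new
                    changed = True
        passes.append({nt: set(s) for nt, s in first.items()})
    return first, passes


grammar = {"S": [["A"]], "A": [["S"], ["B"]], "B": [["b"]]}
first, passes = first_sets_no_epsilon(grammar)
for i, snapshot in enumerate(passes, 1):
    print(i, snapshot)
```

For this grammar it takes four passes, matching the hand calculation: B gains b in pass 1, A in pass 2, S in pass 3, and pass 4 changes nothing, so the loop stops. Because sets only ever grow and the grammar has finitely many terminals, the loop always terminates.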
You assume they're all empty sets and you work from there. So if you're calculating the first of A, and the first of A is the first of S, do you use S from the previous step or the updated S? Actually, technically either way will work, I believe, but really you should use the latest value you have of S. Exactly, because you just calculated it, right? It may change in that we add more things, but we never take things away, so you can always use the latest value you have when calculating first sets. Exactly, cool. So what we have right here is enough for you to do, I think, like 60%... well, less than that: about 50% of the first-set calculations in project 3, because we haven't talked about what to do with epsilon. So let's think about how epsilon changes things. New grammar: S goes to A B, A goes to little a, B goes to little b. Using what we just did, can we calculate the first sets for S, A, and B? Yes. Okay, let's start with A and B; those are a lot easier. A is what? Little a. And B is little b. Great. And the first of S is what? On the first pass, the empty set. But what should it end up being, by intuition, without calculating, just looking at it? Small a? Yes. Why? When we're computing first, the first of S is equal to the first of its right-hand side, so here we have the first of A B. Have we actually talked about what to do in this situation? No. But what should we do? It's a concatenation. Yeah, we only care about the first character, right? So it works like the terminal rule we saw, which makes sense.
So here, the first of S is really just the first of A: the set containing a. This makes sense because this is essentially concatenation. We have S produces A B, so if you think about all the possible strings that S can generate (like my cool squiggly-line trees), whatever first character they generate is going to be the first character of some string that A can generate. Pretty straightforward? Yeah. How did we know that one? Wouldn't it take three columns' worth of information, where the first one would be all empty sets? Oh, it would, if we were going through and calculating, but here we're just thinking about it and looking at it. This is more building-up-intuition time, not calculation time; we'd definitely have to go through all those steps to actually compute it. So, yes? Good question; you're reading my mind. What if A can also go to epsilon? The first thing to ask yourself is (a) does this change things, (b) how, and (c) why. So does it change things? Yep, it does. Let's go with how first: how would this change the first sets, intuition-based? Will this change the first set of B? No. Would it change the first set of A? Yes, based on the rules we already talked about. Now what about the first set of S? Let's talk about why this changes things. I still have the same thing as here: the first of S is the first of A B, the sequence A B. But can I make that simplification that this is the same thing as the first of A? Why not?
Because A could go to nothing. Think about the tree again: S has children A and B, and we care about all the possible strings S generates, specifically the first character of each. In one scenario, A does not go to epsilon. Then A is going to generate a bunch of strings, and we need the first set of whatever A can generate. But in the case where A goes to nothing, the empty string, what do we care about? Then the first of whatever S produces is going to be the first of what B can produce. So let's try to put that into a rule. We have the first of S, whose right-hand side we'll call A alpha: a non-terminal A (different from our other rule, which starts with the terminal little a) followed by alpha. So we know one thing; you already said it's definitely going to include at least the first of A. Almost, though. Is that true? Did that apply here in this case? What's wrong? What from the first of A shouldn't go into the first of S? Epsilon, for the case where A goes to epsilon. Looking just at this first symbol, I know that in the case where A does not go to epsilon, S could possibly start with any of the first characters of A, so the first of S includes the first of A minus epsilon. Everybody agree? Is that the only rule we can apply? We're going to get to more. Don't worry, let's go. Alright, so what does this mean? If the right-hand side of S has the form A alpha, and epsilon is in the first of A, what should we add to the first of S? The first of what? Let's say the previous rule took care of one case, and it still applies: we add the first of A minus epsilon. But what else should we do?
We add the first of... in the previous example we used B, but in general we need the first of whatever comes after A: the first of alpha. Alpha can be a sequence of terminals and non-terminals, so if it starts with a terminal, the first rule applies; if it's epsilon, the second rule applies; if it starts with another non-terminal, these rules apply again. And I'm writing "equals" here, but it's really a union: we're adding to our knowledge of what the first of S is. So let's play with this a little bit using these rules. We start out with empty sets for the first of S, the first of A, and the first of B (I like to write these as F(S), F(A), F(B)). Let's start with B first, because I don't want this to take forever. Why does B start with the empty set? Think about it programmatically: we start everything with empty sets, so we assume no knowledge, and then we go through each of the rules. So if I just start with B, what's the first set of B? Awesome. Next, the first set of A. How do I handle this recursion, where the first of A includes the first of A, union with the first of epsilon? What do I use as the first of A? The empty set, because that's what I currently know the first of A is. Union with the first of epsilon, which is epsilon. Awesome. And now the first of S. Let's call this rule 3 right here, and this one rule 4. By rule 3, the first of S is the first of capital A minus epsilon, so it's the empty set. Okay, great. Then union that with the first of S for this other rule. What does it say? In this case, it's the first of little a big B, which, from our first rule, is the set containing little a. Do I then go look at the first of big B?
No, because little a does not have epsilon in its first set: the first set of little a is just little a; that was our first rule. So the "and" condition here doesn't apply. Little a is also not a non-terminal, which is another reason it couldn't apply. Okay, one more scenario: A goes to a or epsilon, and B goes to b or epsilon. That's easier. Cool. So we have our first sets of S, A, and B, starting them off as empty sets and going from the top down. The first of S would be the first of A minus epsilon, which is what? The first of A is the empty set, so it's the empty set. The first of A is little a and what else? Epsilon. And B is little b and epsilon. Did I make a change? Yes. Okay, go through it again. S goes to capital A B, so first I have the first of A minus epsilon, which is a; then, since epsilon is in the first of A, I add the first of B minus epsilon, which is b. But is this correct? Why not? If B can have epsilon too, S can also produce epsilon. Exactly: S can derive the empty string, so its first set should contain epsilon. But how do we get there? None of our previous four rules apply. The third rule said always take the first of the leading symbol and add it, minus epsilon. The fourth rule told us when to move on to the rest: when the first non-terminal in our sequence has epsilon in its first set. But we don't have a rule for what happens if you get to the end and every symbol had epsilon. We can see by intuition that there's a possibility that A goes to epsilon and B goes to epsilon, so S can also go to epsilon. So we need a fifth rule that says something to the effect of: add the set containing epsilon to the first of S, if what?
Yeah. So let's say the rule is S goes to alpha. If I were writing this, not as code but as math, calling each symbol X: if, for all X in alpha, epsilon exists in the first of X, then we add epsilon to S's first set. Here alpha means the entire right-hand side of the rule. We're not pulling out the first symbol like we did before with A alpha or little a alpha; we're talking about the entire sequence of terminals and non-terminals. If for every symbol in that sequence epsilon is in its first set, then you add epsilon to the first of S. And when you're actually looking at it, you say: is epsilon here? Yes. Is epsilon here? Yes. Then I add epsilon to the first of S. And wait a second: we just derived all five of these rules.
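Putting the five rules together, a minimal Python sketch of the full computation could look like the following. It uses a hypothetical encoding: a dict from each non-terminal to its right-hand sides, each a list of symbols, with an empty list standing for an epsilon right-hand side and the string "ε" standing for epsilon inside a set. The rule numbers in the comments follow the order they were derived in this lecture; this is one straightforward way to organize the fixed-point loop, not a required design for project 3.

```python
EPSILON = "ε"  # stands for the empty string inside a first set


def first_sets(grammar):
    """FIRST of every non-terminal, by passes over all rules until no set changes.

    Keys of `grammar` are non-terminals; any other symbol is a terminal.
    An empty right-hand side ([]) is an epsilon rule.
    """
    first = {nt: set() for nt in grammar}  # step zero: everything empty

    def first_of_symbol(sym):
        # Rule 1: a terminal's first set is just that terminal.
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():  # several rules per lhs: union them
            for rhs in rhss:
                before = len(first[lhs])
                for sym in rhs:
                    f = first_of_symbol(sym)
                    # Rule 3: add FIRST(sym) minus epsilon.
                    first[lhs] |= f - {EPSILON}
                    # Rule 4: move on to the rest only if sym can derive epsilon.
                    if EPSILON not in f:
                        break
                else:
                    # Rule 5 (and rule 2 for an empty right-hand side): every
                    # symbol can derive epsilon, so lhs can derive epsilon too.
                    first[lhs].add(EPSILON)
                if len(first[lhs]) != before:
                    changed = True
    return first


grammar = {"S": [["A", "B"]], "A": [["a"], []], "B": [["b"], []]}
for nt, s in first_sets(grammar).items():
    print(nt, sorted(s))
# S ['a', 'b', 'ε']
# A ['a', 'ε']
# B ['b', 'ε']
```

The `for ... else` runs its `else` branch only when the inner loop finishes without `break`, which is exactly the "every symbol had epsilon" condition from rule 5.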