Morning, everyone. I see we're a little smaller than normal. Is it the weather or the project, do you think? Any questions before we get started today?

Homework 3? It's going to cover material we're going to talk about today, so I'll probably release it tonight. It all depends on what we get through, right? I don't want to give you a homework assignment before we've covered enough material. Just to clarify, the syllabus is a plan for how we want the semester to go, but that doesn't mean it's definitely going to go like that, especially with the homework assignments. The project due dates will stay very similar. The next project is due, was it March 4th? It's a lot more difficult than what we've done so far, so you have a lot more time on it. So yes, there will probably be two homeworks in between, I think. When's the first midterm? The midterms are set; they're on the syllabus. Next Friday. Cool, next Friday. Any other questions, class stuff, project stuff? All right, let me get cranking.

Okay, so let's look at this example. We're going to go back to first sets. Can someone remind us what first sets are? Let's start at the top: what are the inputs to our first function? The grammar? Not quite, not the whole grammar. You can think of the start symbol as representing the context-free grammar, so it's part of it, but we don't have to pass in an entire context-free grammar. Non-terminals? Yes, that's one of the inputs. Can we also pass a terminal into the first function? Yes, we can do that. We can also pass a sequence. So this is how we're defining first sets: they take in a sequence of what we're going to call grammar symbols.
So grammar symbols are non-terminals, terminals, and epsilon: everything we use in the grammar except, obviously, the arrow. That doesn't make sense as a symbol; it's just a bit of syntax. So these are the inputs. What does "sequence" mean? Can the symbols be in any order? No, in a sequence the order matters, just like in the rules of our context-free grammar: a production rule has a non-terminal on the left-hand side and a sequence of these symbols, terminals and non-terminals, on the right-hand side.

And what's the output? What do we want this function to return? The set of possible first characters. Yes, definitely. What's the type, though? If you think about this as types, the input is a sequence of grammar symbols, and the output is a set of tokens, or, thinking in terms of the grammar, a set of terminals. So the first function returns a set of terminals and also epsilon. We've seen that we want epsilon to be in the first set. The semantics are that we want this function to return the set of all terminals, and epsilon, that begin strings derived from alpha, this sequence of terminals, non-terminals, and epsilon.

So, starting there, suppose we have this grammar. Let's practice more context-free grammar stuff: how would you describe in English the language described by this grammar? Yes: a single a, or any number of b's, or any number of c's. So if you want to calculate the first of S, can we do it just by looking at that top rule?
Right, so in this case we have S goes to the non-terminal A, S goes to the non-terminal B, and S goes to the non-terminal C. So I really need to define those first sets first, and then I can use them to build the first set of S.

So what would be the first set of A, and what do you look at to find it? We want to build up some intuition here, but we also want to think about how you would actually compute this. So what's the first set of A, looking at these rules? You look at what big A represents. Which of these rules, one, two, three, four, do you look at? You look at all of them, but you pick the one you need: we're trying to find the first set of big A. So it would be little a, and you look at this rule. Why this rule? It defines big A. This rule says that the strings derived from A, the parse trees that have A as their root, are all going to start with this rule. So we know from looking at it that all those strings are just little a, so the first set has to be little a.

What about B? Which rule do I look at? I want to calculate the first of big B, so I look at the rules for B, and then what do I do? Somebody help me out. You look at what they start with. How do you know that from looking at those two rules? Looking at the rules for B, you can say that all possible strings that B can generate have to start with a single b character, so I know the first set of B.

What about C? What do I do for C? It's either c or epsilon. I say epsilon instead of "nothing" so we're very clear. The epsilon is pretty clear: we have a specific rule, C goes to epsilon. But how do we get the little c from these rules?
Because if we look at this rule, big C goes to big C little c. You picked little c, but this little c is at the end of the string that this rule can generate. So why would it be in the first set? Right: we know that C can go to epsilon, so we know epsilon is in the first set of C. Now when we see C goes to big C little c, we know this big C can go to epsilon, which means this production rule can start with little c. So the first set of big C is going to be epsilon and c.

Now, using this information, can we calculate the first set of S? What's the first set of S, and, the big thing, how do I calculate it? The union over all these rules: first of A, first of B, first of C. S can be replaced by A, which means the first set of A has to be contained in the first set of S. S can produce B, so the first set of B goes in as well. And S goes to C, so the first set of C goes in as well. So it's going to be a, b, c, and epsilon. Wouldn't it be a, b, c, epsilon, or a, b, epsilon, c? The order of sets doesn't matter.

Okay, so the question is: is this grammar ambiguous, and can we use first sets to figure that out? What does it mean for something to be ambiguous? There are multiple parse trees for a single string. Sorry, ambiguity is one thing; what we're trying to focus on here is predictive parsing. The grammar can't be ambiguous if we want a predictive parser. If it's ambiguous, then we have a big problem.
Then we have a big problem But for a predictive parser, we want to say can just by looking at one character Can we decide which of these rules to apply? So we're trying to parse s Can we decide between? Big a big B or big C? Yes Well the B is kind of confusing because it could be Just this rule S goes to big a big B big C So why so somebody tell me why using the information that I have here. Why can't we tell that yeah? Because the firsts are all unique The first of what the outputs of each of the first sets of a b and c are all individually unique Right, so just by looking at this first character, right? If I'm trying to parse an s and I look at the first character and I say well if it's an a I know it's in the first of a so I know this rule applies if it's B It must be first of B, so I know this rule applies if it's a C. I know it's in here, so I know this rule applies What about B? Do we have a problem here? So we choose just by looking at one character which one of these rules today Yeah, which one of these rules of B Right, so actually yeah stays within B Right the way we have this written Right both of the rules for B B goes to big B little B B goes to little B right either of these two rules These both what's the first set of little B? B and what's the first set of big B little B first set with one character or one to Yeah, we'll be right, so there's an intersection there so actually we can't write a predictive parser for this because We can't decide between which one of these two rules just by looking one character ahead Right as we saw if we look two ahead then we can try and tell Just looking one character ahead. We can't what about this one C. Can we tell? Which one to apply depending on this rule? 
If it's a c, we know it has to be the first rule, because the second rule produces epsilon. Although that's not one hundred percent, because it depends on what actually comes after the C. There's another function we have to define later, so this is just peeking ahead: we're going to define a function called follow, which says what first character can actually come after this symbol. Then, if we know it's not possible for a little c to follow big C, we can easily tell between these two rules: if it's a c, the first rule applies; whether the second rule applies really depends on what comes after. But let's leave that alone for now.

So now let's try to derive the rules for how we can actually calculate first sets. They're not difficult. They look very confusing if you look at all five rules at once, and you think, this is a lot of math-looking things. But if we develop them with these examples, I think the intuition behind the rules will be a lot clearer, and the intuition really tells you how to derive them.

So let's consider this: S goes to big A or big B; big A goes to little a; big B goes to epsilon. Let's just look at A. I want to calculate the first of big A, so how do I know what to do? I look at the first symbol on the right-hand side of the rule. So then, what's the first of little a? The set containing little a. Remember, we always have to consider sets. So does this type check? Does it match the type of the first function?
Right: it returns a set, the set contains terminals and epsilon, and the input is a sequence of terminals, non-terminals, and epsilon. So can we generalize a rule from this? What kind of inference can we draw? So convince me that it can't ever be anything but the terminal. Does everybody agree?

Yes, because we're defining first sets: the inputs to the first function are non-terminals, terminals, and epsilon, and all of those are in the grammar. If it's not one of those, it's just some other random symbol, and we don't need to worry about that.

So is this true? You would like it to be. Go back to the first set: what's the definition of the first set, at a high level? What did we want it to do? Another way to think about it is with parse trees. We draw our parse tree, and this is saying: starting at that node in our parse tree and applying all the rules we possibly can, what are the first terminals that strings derived from that node can possibly start with? So how many children can a terminal have? None; it's a terminal by definition, so there are no children here. So for any terminal x, the first set is just the set containing x. Does rule one make sense? That's the intuition.

So we said this is okay, it looks like this. But to get the first of big A, did we actually apply this rule? We didn't actually apply it yet; we just used it to talk about the right-hand side. We'll build that up in a second, but keep it in mind. This is one of the things we need to work out: how do we actually
propagate first sets, or how do we use these production rules to calculate first sets? But let's look at B. We looked at A and were able to derive a rule from it, so now I want to calculate the first set of B. Using intuition first, what's the first set of B? Epsilon. Okay, so to determine this, did we use this rule? Does rule one apply? The question really is: is epsilon a terminal? Don't be afraid to say it; you can be wrong, it's totally fine. Rule one is basically saying: for any x, if x is a terminal, then the first of x is the set containing x.

But these are non-terminals? Yes, we're going to get there. We're building up our simple cases first; we're not yet trying to calculate the first of big A or big B. Sorry, I missed what she said. Can you repeat what you said about epsilon: is it a terminal? Right, exactly: the input to the first function is terminals, non-terminals, and epsilon. We didn't formally define it, but we are keeping the idea that terminals are distinct from epsilon. So rule one definitely doesn't apply, and what we really want to ask is: what's the first set of epsilon? Using the same logic and reasoning we used for rule one, how can we reason about this? Could it be the empty set? In our parse tree, we start at the node epsilon. So why epsilon? Does this follow the intuition and the type checking that we stated? What does the first function output (sorry, I keep calling them first sets)? It's a set containing what?
Epsilon is one of them: the first function returns epsilons and terminals. So we want to include this epsilon: we want the first set of epsilon to be the set containing epsilon. Rule one doesn't apply, because epsilon isn't a terminal, but it is a leaf in our tree. Epsilon can't have any children; it's not a non-terminal.

All right, so I've calculated the first of A and the first of B using some of these rules. How do I calculate S now? Let's make it simpler: remove this rule, so we just have S goes to big A. It would be a; we know that from looking at it. But why? What are we intuitively doing when we say that in this case the first set of S is the set containing little a? It's a recursive call. Right. What if I added b, c, and d here? Would that change what we do? Just define them as terminals: little b, little c, little d. If we think about the right-hand side of this rule, how many symbols are on it? Four. Which one do we care about right now for calculating the first set? Big A. Why? It's the leftmost. But where does that intuition come from?
Just because we said so? What if I put, say, the terminal e in front here? Do I still look at big A? No, it doesn't matter anymore. Why? It's not the leftmost symbol. Then I would look at e.

Let's think about this from the tree perspective too, because I think it helps. We have S, and we have a rule S goes to big A little b little c little d. When we're generating strings, every time we see an S and follow this rule, we replace it with big A little b little c little d, as the rule says. Now take all the possible strings that A generates; its first set is the terminals those strings can possibly start with. Doesn't that mean S also has to start with those, because A is the leftmost symbol? We're not reading right to left. Good try, but no, we're only reading left to right. So A is going to generate whatever it generates, but we can calculate the first of A; we've done that, actually. The first of A contains the first characters of all of those strings, and since we replace S with this right-hand side, the first of S must also contain the first of A. We know that because A is the leftmost symbol. If we had an e here, and our tree looked like this, then we wouldn't necessarily care yet what big A does. We'd ask what the first of e is, because every string that S generates through this rule has to start with the leftmost symbol on the right-hand side of the production rule. How can we try to capture that in a rule? Does everybody agree with that?
So S is going to generate a bunch of strings by applying each of its production rules. Here we really have one rule, but it doesn't matter whether it's one or a hundred. When we look at each rule we say: suppose this rule is chosen, S goes to little e big A b c d. What I'm trying to find out is which terminals the strings that S generates can start with. Looking at this rule, I say it's got to be whatever the leftmost symbol of the right-hand side starts with. The first of S has to contain the first of that symbol if this rule is applied, and we're thinking about all possible applications of the rules.

Okay, give me a second to make sure we use the right terms. Let's say we have the rule A goes to B alpha. We've already used alpha a little bit: it's a sequence of terminals, non-terminals, and epsilons. So alpha here doesn't represent one grammar symbol; it represents a sequence of any number of symbols, from zero to a hundred or whatever. It doesn't matter. Then what is this big B here? The leftmost symbol. Actually, that's a good point: maybe we should change it from B to beta. Does the leftmost symbol have to be a non-terminal? No; just like in here, the leftmost symbol can be a terminal. So we'll call it beta. On the slides it'll be B, but maybe I'll update them. So given this rule, how do we calculate the first of A? The first of beta. Yes, actually, we can leave it there.
We're just defining relations and rules. How do we calculate the first of beta? That depends on what beta is, and we have other rules to calculate that. So now I can use these three rules. If I have the case S goes to little e big A b c d, which rule do I apply to calculate the first of S? Rule three. And that says the first of S is what? The first of little e, which then applies rule one, which returns the set containing little e. So the first of S is the set containing e. And if I get rid of the e, it's the first of A, which is a. So the first of S is the first of A, and how do we calculate the first of A? Rule one just says how to calculate the first of a terminal. Is big A a terminal? No, so you can't use that. What rule can we use? Three. Then we ask, what's the leftmost symbol on the right-hand side? Little a. What's the first of little a, and which rule is that? Rule one. So using these three rules, we have a way to calculate the first sets of anything. Well, almost anything.
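Here is a minimal Python sketch of the three rules so far, written as a recursive function. It is deliberately the preliminary version: rule three just copies the first set of the leftmost symbol and does nothing special about epsilon. The grammar encoding (a dict from non-terminal to a list of right-hand sides) and the name `first` are assumptions made for illustration, and the recursion would loop forever on a left-recursive rule such as B goes to big B little b.

```python
EPSILON = "ε"

# Non-terminal -> list of right-hand sides; anything that is not a key
# and is not EPSILON is treated as a terminal (an assumed encoding).
GRAMMAR = {
    "S": [["e", "A", "b", "c", "d"]],
    "A": [["a"]],
    "B": [[EPSILON]],
}


def first(symbol, grammar):
    """First set of a single grammar symbol using rules one to three only."""
    if symbol == EPSILON:
        return {EPSILON}              # rule two: first(ε) = {ε}
    if symbol not in grammar:
        return {symbol}               # rule one: first(x) = {x} for a terminal x
    result = set()
    for rhs in grammar[symbol]:
        # Rule three (preliminary): take the first set of the leftmost symbol.
        result |= first(rhs[0], grammar)
    return result
```

Calling `first("S", GRAMMAR)` applies rule three and then rule one and gives the set containing e; dropping the leading e from the rule for S would make the same call go through the first of A and give the set containing a.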
Yeah, we're actually not quite there, mainly because this little epsilon makes things trickier. In project three, this is what you're going to be writing a program to do: automatically calculate first sets and follow sets. So it's really important that you understand how these rules work and how to think about them algorithmically. We're just getting into it now, but I wanted to tell you part of why we're talking about it.

It's not clear what the rule is? There are many different ways to write it. We're basically saying: for any rule of the form "a non-terminal goes to something on the right-hand side", we take the leftmost symbol of the right-hand side and say the first of the left-hand side of the production rule includes the first of that leftmost symbol. That's all this is saying. It's like pattern matching with math. It applies here to the rule big A goes to little a: this A matches this A, alpha matches nothing (we said it's a sequence, and a sequence can have length zero), and beta matches the terminal little a. Then we just apply the rule. All we have to do is match it up to our definitions.

Is it enough? That's the question, and I already said it's not. I found out the hard way. Let's change the example a little bit. We're going to say big A goes to big C little a, big B goes to little b, and big C goes to little c or epsilon. Intuitively, looking at it, what's the first of big B? And what's the first of big C?
Yeah, little c and epsilon: the set containing little c and epsilon. So are we using our rules? For the first of B, which rules do we apply to get there? Rule three, then rule one, so it's little b. Now we want the first set of C. Which rules are we applying? Rule three with C goes to little c, and then rule one to get the set containing c. But remember, the bar means there are two rules here, two rules of the form C goes to beta. So we apply rule three again and then rule two, which says epsilon is in there too.

Do we need to set up a hierarchy, apply C first? It doesn't really matter what the order is; that's the nice thing about these rules. They all apply all the time. And the first of C could be written c, epsilon or epsilon, c; the order doesn't matter.

Okay, so just applying our rules, how do we calculate the first of big A? We apply rule three, which says it's the first of big C. What's the first of big C? c and epsilon. Is that correct, though? Based on just these rules (we don't care about S at all), what are all the strings that A could possibly generate? We can actually list them, because these aren't infinitely long strings: c a and a. So what are the first characters of those strings? c and a. Is that the same thing as what we just calculated above? Do we have a problem? Yes, it's a massive problem.
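The mismatch can be reproduced in Python by listing every string A generates and comparing the first characters with what the three preliminary rules give. This is a sketch under assumptions: `strings` and `naive_first` are hypothetical helper names, terminals are single characters, and `strings` only terminates because this particular grammar has no recursion.

```python
from itertools import product

EPSILON = "ε"

GRAMMAR = {
    "A": [["C", "a"]],
    "B": [["b"]],
    "C": [["c"], [EPSILON]],
}


def strings(symbol, grammar):
    """All strings derivable from symbol (non-recursive grammars only)."""
    if symbol == EPSILON:
        return {""}
    if symbol not in grammar:
        return {symbol}
    result = set()
    for rhs in grammar[symbol]:
        pieces = [strings(s, grammar) for s in rhs]
        for combo in product(*pieces):
            result.add("".join(combo))
    return result


def naive_first(symbol, grammar):
    """Rules one to three only, with no special handling of epsilon."""
    if symbol == EPSILON:
        return {EPSILON}
    if symbol not in grammar:
        return {symbol}
    result = set()
    for rhs in grammar[symbol]:
        result |= naive_first(rhs[0], grammar)
    return result


actual = {s[0] for s in strings("A", GRAMMAR) if s}   # first characters of "ca" and "a"
computed = naive_first("A", GRAMMAR)                  # copied straight from first(C)
```

Here `actual` holds c and a, while `computed` holds c and epsilon: the preliminary rule three misses the a and wrongly carries over the epsilon.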
The whole point of calculating first sets is that we want to get this right: for all possible strings that A could generate, what's the first character? I did kind of already say it: the epsilon in the first of C is what spoils it. So the first thing to realize is that rule three isn't quite complete. Beta may or may not have an epsilon in its first set, so we're going to add a rule for that. That's one problem: at some point we need to somehow pick up this a. But that's not the only problem with this set. What's the other one? It has an epsilon in it. So there are actually two problems with this set: we need to add a rule, and we need to change rule three.

Where did that epsilon come from? Why did we add it? Because it was part of the first of C, and we applied rule three directly. But if A goes to beta followed by something, and remember, here I'm not looking at anything after beta, does an epsilon in the first of beta mean that A can also produce epsilon? Not necessarily, and not in every case. In this case there's a terminal after it, so clearly A is never going to generate the empty string, and the first of A should never contain epsilon. But because the first of C had it, I copied it over. So shouldn't we take out
epsilon here? Exactly, that's one problem. So now we need another rule. There are two more rules to go, one of which is going to pick up the a. The second problem you mentioned is right as well: we just took out epsilon here, but what if beta and every symbol in alpha can all produce epsilon? Then one possible string is the empty string, so epsilon should go in the first set. Keep that in mind; that's a good point.

But we need another rule. After I apply rule three, how do I know when to look at the next symbol, say the first symbol of alpha? When the first of beta contains epsilon. Exactly; then I can look at the next symbol. What if both of those contain epsilon? What if I did this: big A goes to big C big C little a. Does this change anything? This matches rule three with beta being the first big C, and big C little a is alpha. So I say, if there's an epsilon in the first of beta, then I add the first set of the second symbol, and I'd still need to go on to get the first of the one after that. I don't quite have a rule to handle that yet. The idea is that we need to keep moving on, no matter how many C's there are, and they don't have to be the same symbol; it could be D, as long as D can go to epsilon. Every time there's a possible epsilon I need to keep adding, but I stop at some point. If there's a B at the end here, do I ever look at the first set of big B?
No, because one of the symbols before it has a first set that does not contain epsilon. So we're going to write the rule. I'm going to write it a little mathy because that's very concise and nice. We have a rule of the form A goes to beta zero, beta one, all the way up to beta i (that's way too small; nobody can read that), then beta i plus one, and then, let's say, alpha afterwards. I'm checking: if epsilon is in the first of beta zero, and epsilon is in the first of beta one, all the way up to epsilon is in the first of beta i, so there's an epsilon in the first sets of all of these leading symbols, then which one do I add to the first set of A? Beta i plus one. Why don't I worry about beta zero? Because rule three handles that; rule three will add it. So the first of A gets the first of beta i plus one. What is alpha? Whatever comes after, just anything.

Is this right? What do we have to do here? If there's an epsilon in the first of beta i plus one, can we necessarily add epsilon to the first of A, even though there can be other symbols after it? This rule applies at every position: it also applies at beta zero, beta one, up to beta i. And to get to beta i plus one, every earlier beta must have had an epsilon in its first set, by this definition. So should I have added that epsilon to the first of A?
Right. The idea is: when I get here, at beta i, do all the symbols before it have an epsilon in their first set? Yes: C, C, C. Then I can add the first of beta i plus one. But if I apply that same logic at an earlier position, do I add the epsilon from the first of C to the first of A? No, because I haven't gone all the way through. I only want to add epsilon if I get all the way through the right-hand side. So just like we did before, we take out the epsilons here, which leads us to the fifth rule.

Let's go through this quickly. Intuitively, when do I want to add an epsilon to the first of A? Put another way, when there can be nothing there (a little existential): when every single symbol on the right-hand side has an epsilon in its first set. In the notation we've been using, the rule is A goes to beta zero all the way up to beta k, with k for the end, so there's no alpha catching everything after. If epsilon is in the first of beta zero, and so on all the way up to epsilon is in the first of beta k, then we add the set containing epsilon to the first of A. That's when we can add the epsilon.

We'll go over these rules a bunch more formally and we'll see an application of them. They look scary and complicated, but the trick is just remembering the intuition behind them, and the fact that you can derive them just from looking at examples. They're not crazy complicated.
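Putting the five rules together, one way to compute first sets is to keep applying the rules until nothing changes. The Python below is a sketch under assumptions, not the project's required design: the names `first_of_symbol`, `first_of_sequence`, and `compute_first` and the dict-based grammar encoding are hypothetical, and repeating until a fixed point is one common way to cope with recursive rules.

```python
EPSILON = "ε"


def first_of_symbol(symbol, first):
    """Rules one and two, plus a lookup for non-terminals."""
    if symbol == EPSILON:
        return {EPSILON}          # rule two
    if symbol in first:
        return first[symbol]      # non-terminal: whatever is known so far
    return {symbol}               # rule one: terminal


def first_of_sequence(symbols, first):
    """Rules three, four, and five applied to one right-hand side."""
    result = set()
    for symbol in symbols:
        symbol_first = first_of_symbol(symbol, first)
        result |= symbol_first - {EPSILON}   # rules three and four, epsilon removed
        if EPSILON not in symbol_first:
            return result                    # stop at the first symbol without epsilon
    result.add(EPSILON)                      # rule five: every symbol had epsilon
    return result


def compute_first(grammar):
    """Repeat the rules over every production until no first set grows."""
    first = {non_terminal: set() for non_terminal in grammar}
    changed = True
    while changed:
        changed = False
        for non_terminal, productions in grammar.items():
            for rhs in productions:
                new = first_of_sequence(rhs, first)
                if not new <= first[non_terminal]:
                    first[non_terminal] |= new
                    changed = True
    return first


GRAMMAR = {
    "A": [["C", "a"]],
    "B": [["b"]],
    "C": [["c"], [EPSILON]],
}
FIRST = compute_first(GRAMMAR)
```

For this grammar `FIRST["A"]` ends up holding c and a, with no epsilon, which matches the strings c a and a listed earlier, and `FIRST["C"]` holds c and epsilon.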
They're not, you know, way far out there. They just follow from what we're trying to do here, which is calculate these sets. So this says the first of A gets the set containing epsilon. There's also "first of beta sub k" right there; is that an equals? No, this is a statement; I've just tried to write this on one line. Yeah, this is the body of the... Then