Let's get started. Any other questions? I'll take questions for about three minutes before we get to the material. Yeah?

"On homework three, question one, part two: it says to show that the grammar is ambiguous, and there are two different parts."

Yes.

"But the first part asks for two different leftmost derivations. So is the second part supposed to be rightmost derivations, or just any? Because technically you could just copy and paste the first problem."

Part of it is showing the connection. We know a grammar is ambiguous if some string has two leftmost derivations, or two rightmost derivations, or two different parse trees. So this part is basically showing you that these are the same: you can take a derivation and turn it into a parse tree. It shouldn't be too hard; you should be able to take that derivation and turn it into a parse tree easily.

"I'm not 100% sure of the best way to ask this, but on the project, I got the sense from reading it a couple of times that there should be some way to use the same code to parse the input CFG as you use to process the CFGs it describes. Is that more trouble than it's worth, or is there a point to doing it if you think about it and work on it a lot?"

I'd say it's more trouble than it's worth. The way to approach it is to parse that input however you can. Kind of like the lexer in project two: it's doing longest-prefix matching, but not exactly as we did in class; it's doing it more pragmatically. That's definitely the way. And you're welcome to use code we gave you in project two. I think I made a note of this on project three, but I'm going to say it now, because some of you will get bit by it: the language in project two is not the same as the language in project three.
So if you just take parts of the lexer out of project two and use them as is, it's not going to work right; it's going to really mess you up. Some things are super handy, though. The skip-space function, for example, is super nice; you can rip that function out and use it yourself. But yes, I would just go about parsing. Basically, you need to read in the input, parse it, and create data structures. You have a data structure for a context-free grammar: the grammar has a set of non-terminals, a set of terminals, and a sequence of zero or more rules. Each rule has a left-hand side and a right-hand side, where the right-hand side is a sequence of terminals and non-terminals. That's your big step. If you can read everything into this data structure, everything else falls out from that; it becomes a lot easier.

Okay, one more question on context-free grammars? Okay.

So we've been talking about first sets. Why have we been talking about first sets? Yes: so that, just by looking at one token, we can figure out which production rule to use. Let's take an example like this: S → A a, and A → A a | b. Is there any choice here for S? Do we even need to read one token ahead? No, there's only one rule for S, so we just use it.

Let's calculate the first sets for S and A; it should be very easy. We start with both sets empty. First, S. What am I going to look at? I add FIRST(A) minus ε to FIRST(S). FIRST(A) is currently empty, so I add nothing to nothing and still get nothing. Do I move on and add a? No, that's not possible, because ε is not in FIRST(A). Okay, cool.
Then I go to A. For A → A a, I add FIRST(A) minus ε to FIRST(A). FIRST(A) is currently the empty set, so I add the empty set to the empty set, which is nothing. Do I move on to the next symbol? No, can't do that. Now the other rule, A → b: I add FIRST(b) to FIRST(A). FIRST(b) is {b}, and there's nothing to move on to. There's no ε there, so we don't add ε; FIRST(A) is now {b}.

I made a change, so I go through this again. I add FIRST(A) minus ε to FIRST(S). FIRST(A) minus ε is {b}, so I add b to FIRST(S). Do I move on to a? No. Done there. Then A does the same thing: I add FIRST(A) minus ε to FIRST(A), which gives b, and then I add FIRST(b) = {b}, so nothing changes. If I go through this again, I get the same result, so FIRST(S) = FIRST(A) = {b}.

From S I don't have a choice; I know exactly what to parse. If I were writing parse S, I'd call parse A and then check for the a. But how do I choose between the two rules for A? Can I distinguish between them? Remember the rule: if we have two rules of the form A → α and A → β, then their first sets should have nothing in common. So I take FIRST(A a) ∩ FIRST(b). FIRST(A a) is the set containing b, and FIRST(b) is the set containing b. Is their intersection the empty set? No. So I can't tell which of these rules applies right off the bat. Okay, cool.

This does not necessarily mean the grammar is ambiguous. Ambiguity is about parse trees.
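The first-set passes walked through above can be sketched in Python. This is a minimal sketch, not a required design for the project: the `Rule` and `Grammar` classes are one hypothetical way to hold the data structure described earlier (non-terminals, terminals, and a sequence of rules, each with a left-hand side and a right-hand-side sequence), and `first_of_sequence`, `compute_first`, and the `EPS` marker are illustrative names. It repeats passes over the rules until nothing changes, then applies the disjointness check to the two rules for A.

```python
from dataclasses import dataclass, field

EPS = "ε"  # hypothetical marker standing for epsilon inside FIRST sets


@dataclass
class Rule:
    lhs: str        # left-hand side: a single non-terminal
    rhs: list[str]  # right-hand side: terminals and non-terminals; [] means epsilon


@dataclass
class Grammar:
    nonterminals: set[str]
    terminals: set[str]
    rules: list[Rule] = field(default_factory=list)


def first_of_sequence(seq, first, g):
    """Add FIRST(X) minus epsilon for each symbol X, moving on only while
    epsilon is in FIRST(X); add epsilon if every symbol (or the empty
    sequence) can derive epsilon."""
    result = set()
    for sym in seq:
        sym_first = first[sym] if sym in g.nonterminals else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first(g):
    """Start from empty sets and re-run every rule until a full pass changes nothing."""
    first = {nt: set() for nt in g.nonterminals}
    changed = True
    while changed:
        changed = False
        for rule in g.rules:
            new = first_of_sequence(rule.rhs, first, g)
            if not new <= first[rule.lhs]:
                first[rule.lhs] |= new
                changed = True
    return first


# S -> A a,   A -> A a | b
g = Grammar(
    nonterminals={"S", "A"},
    terminals={"a", "b"},
    rules=[Rule("S", ["A", "a"]), Rule("A", ["A", "a"]), Rule("A", ["b"])],
)
first = compute_first(g)
print(first["S"], first["A"])  # {'b'} {'b'}

# Rule one: the two alternatives for A must have disjoint FIRST sets.
overlap = first_of_sequence(["A", "a"], first, g) & first_of_sequence(["b"], first, g)
print(overlap)  # {'b'}: not disjoint, so one token of lookahead cannot pick the rule
```

An ε-rule such as A → ε would be written `Rule("A", [])`; `first_of_sequence` returns {ε} for it, since an empty sequence derives ε.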
To show ambiguity, we'd have to give two different parse trees that generate the same string, or two different leftmost derivations, or two different rightmost derivations. Any of those would show that the grammar is ambiguous. Here we're only showing that we can't look one token ahead and figure out which rule to take. That's the first rule we derived.

So let's look at a different example. S is similar, S → A a, but now A → A a | ε. First sets again; our trusty friend, empty sets, to start. Let's start with A. FIRST(A) minus ε, added to FIRST(A), is nothing. FIRST(ε) is ε, so FIRST(A) gets ε. Then S: what does FIRST(A) minus ε add to FIRST(S)? Nothing. FIRST(A) is the set containing ε; we subtract out ε, so we add nothing to nothing. Then we ask: does rule 4 apply? Yes, because ε is in FIRST(A). So I add FIRST(a) to FIRST(S). Do I go on any further under rule 4? No, there's nothing left to go on to. Does rule 5 apply, that is, is ε in the first set of every symbol here? No, so it doesn't apply.

Let's go one more time. We actually already calculated this above: FIRST(A a) is {a}, and FIRST(ε) is {ε}, so FIRST(A) = {a, ε}. FIRST(S) is still {a}. If we do this one more time, we get exactly the same results.

Now back to our first rule. We have rules A → α and A → β, and we want FIRST(α) ∩ FIRST(β). So FIRST(A a) ∩ FIRST(ε) = {a} ∩ {ε}, which is the empty set. Cool. So by the rule we derived, we should be able to distinguish which of these rules to choose by looking one token ahead.
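The passes for this second grammar can be summarized as a worked computation, listing the sets after each full pass (A's rules processed before S's, as in the walkthrough):

```latex
\begin{aligned}
&\text{Grammar: } S \to A\,a, \qquad A \to A\,a \mid \varepsilon \\[4pt]
&\text{Pass 1: } \mathrm{FIRST}(A) = \{\varepsilon\}, \qquad \mathrm{FIRST}(S) = \{a\} \\
&\text{Pass 2: } \mathrm{FIRST}(A) = \{a, \varepsilon\}, \quad \mathrm{FIRST}(S) = \{a\} \\
&\text{Pass 3: no change, so stop.} \\[4pt]
&\mathrm{FIRST}(A\,a) \cap \mathrm{FIRST}(\varepsilon) = \{a\} \cap \{\varepsilon\} = \varnothing
\end{aligned}
```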
Let me zoom out a little bit; it's getting crazy. Okay. Let's say we're writing parse A. Here we set our handy t_type equal to getToken. Now how do I distinguish between these two rules? Let's do: if t_type equals a, then what does that mean we can do? We call parse A right here, because this is the only thing we care about: which of these two rules do I apply? A → A a. I'm leaving some stuff out and not writing real code here, because we haven't gotten into all the details and finer points; technically we'd also unget the token in here.

So let's think about this with a tree. Start with S. How many choices do we have here? One: S → A a. At A we have two choices, A a or ε. Let's take A → A a, and then this inner A goes to ε. So what string is this? a a.

Based on our code, we're trying to parse A, so we're trying to build this part of the tree. We call parse A and check: yes, getToken returned a. Great. So we apply A → A a, and the first a will be matched here. Remember we called ungetToken, because this rule is supposed to actually generate that a token. Our input is still here, so when we call getToken, what do we get? An a, which causes us to try to parse A again with the same rule. So when we see an a, which of these rules are we always going to choose? The first one. So why are we reading an a when A should go to ε here? The rule here is A → ε, and ε generates nothing. So when the rule produces ε, what are we actually reading? Where did this a come from?

"The next token."

Whose next token? Where did it actually come from? We talked about first sets: we look at all possible strings that a non-terminal generates.
What is the first token those strings can start with? But here, where did this a come from?

"It came from what comes after A."

Yes. When A goes to ε, it goes to nothing, so whatever token we read next comes from whatever is after A. So when we're in parse A and getToken returns a, do we know it's the a generated by this rule? Or could it be something else?

Let's look at another situation with the same grammar: S → A a, but this time A → ε, so the resulting string is just a. I call parse S, and parse S just calls parse A. parse A calls getToken, which returns a. And then what? Based on the code we wrote, I'd call parse A and then check for an a; I'd choose the rule A → A a. But once again, where did this a come from? It came from S's rule: this a follows A in the rule S → A a.

So back to our original question. We pass the check that says the first sets of these two rules have nothing in common. But can I tell which rule to apply, A → A a or A → ε, based on just one token? No. The question is why. What's special or different about this grammar compared to the others we've seen?

"They have the same right-hand side?"

You mean that these symbols are the same, A, a, A? Sorry, that's not it. I could change the grammar slightly and move things around, and it would still have this problem.

"Is our goal, given an input sequence of tokens, to determine whether we could derive that sequence of tokens using our context-free grammar?"

Yes: we're going to construct a parse tree from the sequence of tokens we're given.
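The getToken/ungetToken pattern these parse functions rely on can be sketched as a token stream with one token of pushback. This is a hypothetical stand-in, not the project's lexer: each character is treated as one token, `get_token` and `unget_token` are illustrative names, and "$" stands for end of file.

```python
class TokenStream:
    """Hypothetical token stream: one character = one token, "$" = end of file."""

    def __init__(self, text):
        self.tokens = list(text)
        self.pos = 0

    def get_token(self):
        # Past the last token, keep returning end of file.
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else "$"
        self.pos += 1
        return tok

    def unget_token(self):
        # Push the most recently read token back so the next get_token returns it again.
        self.pos -= 1


# Peeking: read the next token, decide on a rule, then push the token back
# so that the chosen rule can consume it itself.
ts = TokenStream("aa")
t = ts.get_token()
ts.unget_token()
print(t, ts.get_token(), ts.get_token(), ts.get_token())  # a a a $
```

Only a single token of pushback is intended here, which is all one-token lookahead needs.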
It has to be the parse tree that generates exactly that string. So first, we need the grammar to be unambiguous: there should be only one parse tree for the string. But the second challenge is: can we do this by looking only one token ahead to decide which rule to apply? When I'm drawing the tree, I always start from the top. Starting from S at the root, can I tell by looking one token ahead whether to use S → α, S → β, or S → δ? Among all of those rules, we want to know how to draw the first level of the tree.

"So we compare the first token of the input string with the first set of each rule for that non-terminal?"

Of all the rules with that left-hand side, exactly. In the first case we're parsing S, so we start at the root of the tree. Say we have rules of the form S → α, S → β, and S → δ, for lack of better names. Our tree starts at S, always at the top. We have some sequence of tokens, say a b c d e f g, and just by peeking at one token we need to decide which rule applies: S → α, S → β, or S → δ.

"That's what I asked earlier: so with an unambiguous grammar that has two rules for the same non-terminal, we don't know which one to take?"

Correct, and that's exactly where this rule comes in. The rule we derived says that if we have two rules of the form A → α and A → β, then FIRST(α) ∩ FIRST(β) had better be the empty set.
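Rule one can be checked mechanically, but the earlier grammar S → A a, A → A a | ε passes it and still defeats one-token lookahead. Below is a minimal sketch of a recursive-descent parser for that grammar that picks rules from FIRST sets only, assuming the hand-computed FIRST(A a) = {a}, one character per token, and "$" for end of file; the class and method names are hypothetical. On the input a, whose only derivation uses A → ε, the lookahead a is in FIRST(A a), so the parser always picks A → A a. Because that rule is left-recursive, it calls `parse_A` again without consuming anything and never reaches the ε-rule, so Python eventually raises `RecursionError`.

```python
FIRST_A_a = {"a"}  # FIRST(A a), computed by hand for S -> A a,  A -> A a | epsilon


class NaiveParser:
    def __init__(self, text):
        self.tokens = list(text)  # one character per token (assumption)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_S(self):
        # S -> A a: there is no choice to make.
        self.parse_A()
        self.expect("a")

    def parse_A(self):
        # Choose a rule using FIRST sets only.
        if self.peek() in FIRST_A_a:
            self.parse_A()  # A -> A a: recurses before consuming any token
            self.expect("a")
        # otherwise A -> epsilon: consume nothing


try:
    NaiveParser("a").parse_S()
    outcome = "parsed"
except RecursionError:
    outcome = "RecursionError: A -> epsilon was never chosen"
print(outcome)  # RecursionError: A -> epsilon was never chosen
```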
"So this means that by looking one token ahead, I can decide which of the three rules, α, β, or δ, applies?"

Yes.

"If the intersection isn't empty, is the grammar ambiguous?"

No, not necessarily. That was the question I asked earlier.

"So if the set isn't empty, we need some other method to determine which rule to use?"

Exactly; that's part of the problem. And in some cases, like the one we just saw, the intersection between FIRST(α) and FIRST(β) is empty, yet by looking one token ahead I still can't distinguish between A → A a and A → ε. In a second we'll give a name to the property we're deriving, so that we can distinguish it from ambiguity. To do this we have to develop a new kind of function, similar to first sets, but first we need to understand why we need it.

The first example isn't a good one for this, because we already said we can't decide between its rules: their first sets intersect. So let's quickly look at another example and see whether we run into the problem again. While I'm writing these: if you want to practice first sets, feel free to take any of the grammars here, calculate the sets yourself, and compare with what we got in class. That's always a good self-check.

The grammar is S → A a, A → b A a | ε. We start with empty sets; FIRST(A) will be {b, ε}, and FIRST(S) will be {b, a}. Now, by the rule we derived: FIRST(b A a) is the set containing b. Intersect that with FIRST(ε), which is {ε}. What's the intersection? The empty set; they have nothing in common. So it passes rule one. So when we try to parse A, look one token ahead, and see a b, can we decide between these two rules?
Yes. Why? In the previous case the first sets didn't intersect either, but we decided we couldn't choose between those two rules. FIRST(A a), as we calculated, is {a}; its intersection with FIRST(ε) is the empty set; and yet when we saw an a we didn't actually know which of the two rules applied.

"Maybe it has to do with ε?"

Why do you think it has to do with ε? How does that change things? When a non-terminal goes to ε, it goes to nothing, but we don't read nothing: when we call getToken, we have to actually read a token. Let's draw parse trees here; I think this will help. We have S → A a, so there are two options. In one, A goes to ε. In the other, A goes to b A a, and that inner A goes to ε. So we have two strings: b a a, and just a. As we said, when we parse S there's no choice: we always parse A and then check for this a. So all we care about is parsing A. In both cases I'm here, trying to parse A and build the rest of the tree. I read one token ahead: in one case it gives me b, in the other it gives me a. Based on that, can I distinguish which of these two rules to choose? Reluctant yeses. Why?
In the ε case, what am I actually reading? The next token. So what is the next token in this grammar; what comes after A?

"What do you mean, for S?"

I mean: when this A goes to ε, what are we going to read? The a that came after it, from this rule here. When we're parsing, we've already parsed everything before A, so if A goes to ε, we read the next thing that comes after A. So when we're trying to parse A and we get a b token, what does that mean? It has to be the first rule, A → b A a. Because if A went to ε, what token would we be reading? An a, because everywhere A appears in the grammar, the thing right after it is an a.

Now let's take what we just did and use it on our previous example. There, what comes right after A in every rule's right-hand side? An a. And A can go to ε: we have the rule A → ε. So when I read an a, which of these two rules did it come from? Why can't we decide? It could come from the first rule, because FIRST(A a) is {a}. But if A went to ε, it could also be any of the a's that come after A. So we have no way to distinguish these two cases, because of how A is used in the grammar and what comes after it.

"Even though the intersection of the first sets is empty?"

Yes. The intersection of the first sets is empty, and we still can't tell which rule it is, exactly because of the nature of the rules. It really boils down to ε: with ε we get nothing in the output string. The token we read did not come from A, because A did not generate any output; it didn't create any token
there; it went to nothing. So essentially the question is: did A loop back on itself, call parse A again, and generate another a, or is it done and went to nothing? The problem is that you can't tell whether it went to nothing, because the first token of the string A a is also what comes after A.

"Is the grammar ambiguous?"

I think this one probably would be, but I'm sure you can come up with an example that is not ambiguous and has this problem anyway. I think if I got rid of this a it would not be ambiguous, but you still couldn't tell by looking one token ahead, for the same reasons: you read an a, and just looking one token ahead you don't know whether you should stop.

"In this case, with ε, maybe you don't have this problem?"

Oh yeah, because you've got this a here, you don't know whether there's only one more a left or a bunch more. That's a good example; let's walk through it. I can just write S → S a | ε. Same thing. FIRST(S) is {ε, a}. Let's check our rule one. Another question: is this an ambiguous grammar? Give me a string that has two different parse trees.

"a."

I can generate a with S → S a and then S → ε. What's the other tree? S goes to what? From S, the only choices are S a or ε. This tree generates the string a, and this one generates a a. We need two different parse trees for the same string.

"It's the same?"

So is it ambiguous? No. Exactly how many times we apply S → S a is how many a's are in the string; we have no other choices. This context-free grammar describes all strings of zero or more a's, and for a string of any length you just keep applying S → S a until you're done and then apply S → ε. There are no other choices there.

"Are characters and tokens the same thing here?"

Yes, we're using them interchangeably.
Well, okay, let's think about it in a couple of different ways. Yes: ultimately, in a context-free grammar a terminal symbol could be anything. Here our terminal symbols are all characters, and we're concatenating those characters together. But ultimately we're using this to describe and find tokens, so our terminals will be tokens, and the input is a series of tokens from the getToken function. To make it more confusing, getToken, as we saw, looks ahead as many bytes as it can to find the longest token it can return, but once it decides it's done, it returns that token, and then we call getToken again.

"ε is creating so many problems. Then why are we using it?"

Well, do you want to define languages where you always have to have something?

"I guess if you didn't have rests in music, it would just be sound for the entire song."

Yes, that's a good analogy. Or think about programming: an if-else structure. Do you have to include the else clause every time you use an if? No. But you can't use an else by itself, because that doesn't make sense. So after an if you have the option of an else, but you don't have to use it every time. And you'd better not have if-else-else, because that doesn't make sense either. Then you get to things like if, else if, else if, and then else, so you can have any number of else-ifs in there. When you program, you don't always want to have to include everything. Otherwise you'd end up with things like: every function has to have one class and can't have zero classes; you can't have zero imports, you always have to have one import.

"Because of this ε problem... how do you say it... ε helps with the recursion?"
Yes, I think so. That's what I was going to say at the start, but I don't know if it's true, and I don't like to say things I don't know.

Okay, so we have a language here. It's unambiguous and it passes this test. But the question is whether we can parse it by looking one token ahead. We know it's only strings of a's. I start at S with the string a. I call getToken, which returns an a. Which rule do I choose, S → S a or S → ε? In this case I choose S → S a. But why do you choose S → S a? You're an algorithm: you can only use the information you have. You know the token you read, and you have no idea what comes after it. So what made you make that decision?

Let's go back. The rule we decided on is: if we're reading an a, it has to be S → S a. But every valid string this language produces, every sequence of tokens this context-free grammar represents, is all a's. So are you ever going to not read an a? How do you know when you've reached the end? How do you ever choose S → ε?

"When we run out of a's?"

Okay, let's see. We start here and read in one a. We call ungetToken and start here. Now I read in another a, go here, call ungetToken, read in another a... Am I ever going to stop? The problem is: what comes after S in this rule? An a. And what's FIRST(S a)? It's {a}. So if that rule was applied, we know for sure we'll read an a. But if the other rule was applied, where did that token come from? If we chose S → ε, where did that a actually come from? From the rule S → S a, because a comes directly after S. So the problem is that when we read this a, we aren't accounting for the fact that everywhere S is used, an a can follow S.

So now we need to think about this. We have first sets, so what does a first set mean? Take all possible strings a non-terminal generates. What does its first set mean?
It's the very first token, the leftmost one, of all the possible strings it generates. That's super useful, because we need to be able to distinguish between these rules. But what if S can go to ε? S is a little tricky here, but we've already seen that S is recursive, so we know S isn't necessarily the entire resulting string; there could be other stuff after S. Say S generates this part of the string. The question is: what token can come directly after S? Across all possible cases and all possible rules, which tokens can come directly after S? Because if S goes to the empty string, to ε, then that's the token we must have read.

So suppose I have a choice in S among multiple rules. One of them has ε in its first set, which means it can go to nothing, and the others don't. (You have to assume they don't: if two rules can both go to ε, then we already say no, we can't do it, by the first rule.) So say we have A → α and A → β with nothing in common between their first sets, but β can go to ε. Then we also want no intersection between FIRST(α) and what follows A, the tokens that can come directly after it. If, say, FIRST(α) is the set containing b and the token after A is a, then when I read a token I'll know whether A goes to ε or to the other rule.

To do that, we're going to define a handy little function called FOLLOW, where FOLLOW(A) is the set of all tokens that can come directly after the non-terminal A. Make sense? Intuitively? It's not entirely intuitive, but you've been sitting here listening for weeks, so you've developed some intuition about this, right?
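Applied to the earlier grammar S → A a, A → b A a | ε, this idea gives a working one-token-lookahead parser. A minimal sketch, assuming the hand-computed sets FIRST(b A a) = {b} and FOLLOW(A) = {a}, one character per token, and "$" for end of file; the class and function names are hypothetical, and peeking by index plays the role of getToken followed by ungetToken. When the lookahead is in FOLLOW(A), `parse_A` picks A → ε and leaves that a for whatever follows A.

```python
FIRST_bAa = {"b"}  # FIRST(b A a), computed by hand
FOLLOW_A = {"a"}   # FOLLOW(A), computed by hand: every A is followed by 'a'


class PredictiveParser:
    """One-token-lookahead parser for S -> A a,  A -> b A a | epsilon."""

    def __init__(self, text):
        self.tokens = list(text)  # one character per token (assumption)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_S(self):
        self.parse_A()  # S -> A a: no choice
        self.expect("a")

    def parse_A(self):
        t = self.peek()
        if t in FIRST_bAa:   # A -> b A a
            self.expect("b")
            self.parse_A()
            self.expect("a")
        elif t in FOLLOW_A:  # A -> epsilon: this 'a' belongs to whatever follows A
            return
        else:
            raise SyntaxError(f"unexpected {t!r} while parsing A")


def accepts(text):
    p = PredictiveParser(text)
    try:
        p.parse_S()
        p.expect("$")  # the whole input must be consumed
        return True
    except SyntaxError:
        return False


for s in ["a", "baa", "bbaaa", "ba", "aa"]:
    print(s, accepts(s))
```

The loop reports True for a, baa, and bbaaa (strings of the form bⁿ aⁿ⁺¹) and False for ba and aa.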
So this is saying: if A can go to nothing, then it had better be the case that I can distinguish, so FIRST(A) ∩ FOLLOW(A) must be empty. If there's ever an overlap between the strings A can create and what can follow A, then I can't distinguish based on looking one token ahead. Questions on this?

Sweet. Now, just like we did for first sets, we need to define follow sets: what exactly do we mean, and what is this function? Let's talk about it. What's the input? Yes? Before that, let's settle FIRST(S a). What's FIRST(S a)? I'm asking you now. FIRST(S) includes ε, so why don't we include ε here?

One way to think about it, which we didn't really talk about: when we did first sets, we said they're defined for sequences of symbols. Computing FIRST(S a) is completely equivalent to asking for FIRST(B), where B → S a, plugging in that exact same right-hand side. So what's FIRST(B)? FIRST(S) minus ε, which is {a}. Then we ask: is there an ε in FIRST(S)? Yes, so we go on to the next symbol and add FIRST(a) to FIRST(B), which is a. Can I move on? No, there's nothing more. Then rule 5: is ε in the first sets of all the symbols? No, it's not in FIRST(a), so I don't add ε. That's why you get {a} for this one, and intersecting it with this one, which is clearly {ε}, gives the empty set. So this is essentially a shorthand, a different way to think about doing this.

So we want to define this FOLLOW function. When we define functions, what do we think about? Inputs and outputs. So what's an input to this function? Yes: for first sets we talked about sequences of terminals and non-terminals, but here we only care about non-terminals. So it takes a non-terminal as input. Then what's the output?
Terminals, yes: the output is a set of terminals. Compare with first sets: what's their output? Terminals and ε. So is ε going to be in our follow sets? No, and that comes from the definition: we want the terminals that can come after S in the resulting string, and ε is never produced in the resulting string, so it doesn't make sense to talk about ε here. There's no ε in follow sets. You can use this to essentially type-check your work: if you're calculating follow sets by hand and you get an ε in one, you're doing something wrong.

Let's think about this: what's FOLLOW(S)? What comes after S? S is the start symbol, so it produces this whole string, and we want what comes after it. Where are we in our input string? At the end of file: we've gone all the way to the end of the string, so if we're here and we call getToken, it returns end of file. This is actually really good. On Friday we're going to go through all the calculations for follow sets, using your first sets, and we're going to develop the rules.
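The rules for follow sets come on Friday. In the meantime, here is a minimal sketch of the standard textbook fixed-point formulation, which may be organized differently from the rules developed in class: FOLLOW of the start symbol begins with end of file ("$" here), and for every rule B → … X rest, FOLLOW(X) gets FIRST(rest) minus ε, plus FOLLOW(B) if rest can derive ε. Grammars are plain (lhs, rhs) pairs with [] standing for ε, and all names are hypothetical. It confirms the conflict in S → S a | ε, where FIRST(S) ∩ FOLLOW(S) = {a}.

```python
EPS, EOF = "ε", "$"  # hypothetical markers for epsilon and end of file


def first_of_seq(seq, first, nts):
    out = set()
    for sym in seq:
        f = first[sym] if sym in nts else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}  # every symbol (or the empty sequence) derives epsilon


def compute_first(rules, nts):
    first = {n: set() for n in nts}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            new = first_of_seq(rhs, first, nts)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(rules, nts, start, first):
    follow = {n: set() for n in nts}
    follow[start].add(EOF)  # the start symbol is followed by end of file
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for i, sym in enumerate(rhs):
                if sym not in nts:
                    continue
                rest = first_of_seq(rhs[i + 1:], first, nts)
                new = rest - {EPS}  # never put epsilon in a follow set
                if EPS in rest:     # sym can end the rule, so it inherits FOLLOW(lhs)
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


# S -> S a | epsilon
rules = [("S", ["S", "a"]), ("S", [])]
first = compute_first(rules, {"S"})
follow = compute_follow(rules, {"S"}, "S", first)
print(sorted(first["S"]), sorted(follow["S"]))  # ['a', 'ε'] ['$', 'a']
print(first["S"] & follow["S"])  # {'a'}: overlap, so one token of lookahead is not enough
```

For comparison, the grammar S → A a, A → b A a | ε yields FOLLOW(A) = {a} and FIRST(A) = {b, ε}, whose intersection is empty.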