While I change the battery in the microphone, hopefully this works. Does anybody have any questions on Project 2 that we can talk about now, before we get started? Yeah. So the get_token function in the lexer.c code, that's how we're getting input in? Yes. Or do we have to use cin? No, you shouldn't be reading anything yourself, right? The lexer is reading all the tokens in. The whole point of the assignment is that you're learning how to use the API of the lexer, so that's what you'd be using. Okay. Yeah. Would you be willing to record something on how to debug with the lexer, maybe? I was having problems and I couldn't figure it out. I just ended up copying it into Visual Studio and debugging there, but I know I won't always be able to do that. Correct. That's a good question. This battery also did not work. Does anybody know AV tech? Let me try the third one. Hopefully I'm not putting it in upside down. I think it's a 9-volt battery; I think it only goes in one way. I mean, it did turn on, so it's not that. There we go. Hello? Hello? Yeah. So, maybe send me an email. It takes a long time, and it's hard to come up with a good example, or even just a link to a good resource that you can use. Yeah, email me. I'll be happy to do that. GDB takes a little bit of time to get used to, but it's time well spent, because I use it all the time to debug C and C++ code. I've actually been meaning to do that; it's just a matter of time. Okay. Cool. Any other questions on Project 2? Okay. So where we left off, we were talking about parsing. What is parsing, and why do we care about it? It turns a grammar into potential strings? No. What is that? How do we take a grammar and turn it into a string? Derivations. Yeah, derivations, right? We can take a grammar and do derivations to get a string. We can also create a parse tree from a grammar, right?
That will also give us the strings. Parsing is essentially the opposite of that. What would be the opposite there? A string? No, that's lexing. Very close, though. After we've lexed, we have a series of tokens, but that's not all we want to do. Yeah? Match it to a grammar? Right. And how do we know that it matches the grammar? What were you saying? The rules of the grammar. The rules of the grammar, correct. So we want to build a parse tree from it; that's parsing. Parsing means taking in either a string or a series of tokens (the series of tokens from the lexer) and asking: did this string come from this grammar? And if so, what's the parse tree for this string in this grammar? That's what we're studying now: the reverse. We've seen that if you're given a grammar, like this expression grammar, it's pretty easy to generate strings from it. So parsing is taking a string and making sure it follows the rules of the grammar? Yes, at a high level, that's what it's doing. Our goal in parsing is to take in a string and output the parse tree that the string corresponds to in the grammar. If we have these rules, we can very easily apply them over and over until we reach a string that has no non-terminals, only terminals, and we know that string is definitely in the language of the grammar. But now we want to do the opposite: we start with the string and ask, can this string come from applying any of these rules? Is that easier or more difficult? More difficult. More difficult? Why? There could potentially be multiple ways to parse a single string. Yeah, there could potentially be multiple ways; we'll get into that. It's what we talked about with ambiguous grammars, right? There could be multiple parse trees.
Actually, if you think about it, that could even make our job easier, right? Because then we'd have two, or many, targets to reach. We're starting at the string and we want to find a parse tree. Now think about generating all these strings. If I asked you to write an algorithm to generate all possible strings that this grammar could create, would you be able to do that? No? You can't write an algorithm to do that? You may never finish, but it's pretty easy to write: apply this rule, apply that rule, and next time try the other rule. You're just trying all possible combinations of rules to spit out strings. But here we're going in reverse. We're starting with this string, 1 + 2 * 3, and asking: is there any possible combination of applications of these rules that creates this string? So it's actually a tricky problem. Let's think about how we're going to do this. What do I need to know to be able to do parsing? What are my inputs to the parser? You need your string of tokens, and then you need the rules of the grammar that you're trying to match against. Exactly. Parsing is specific to a given grammar; without the grammar, you can't answer the question of whether the string matches or doesn't match. So we have our string (this will be our string for right now: four a's, a a a a), and then we have the grammar: S goes to big A or big B (S → A | B), big A goes to little a big A or little a (A → a A | a), and big B goes to little b big B or little b (B → b B | b). So can somebody describe what the strings in this grammar look like? Somebody over here in the back who hasn't spoken in a while? Morning. What are you asking? What do the strings that this grammar produces, or generates, look like?
Yeah? It contains only a's or only b's. It contains only a's or only b's, right. S is the starting non-terminal, so we start from there: all strings must be derived from S. Then it's going to be either A or B; A produces one or more a's, and B produces one or more b's. So when we're parsing, we want to build a parse tree; that's our end goal. S is always going to be the root node of our parse tree, because whatever the starting non-terminal is, every string the grammar generates has to be derived from it. So this is all we have: the rules and the string. How do I decide what the child of S is here? Pick one and check the token? Pick one and check the token. Which token? The first one? Yeah, the first token. So check little a. But little a isn't big A or big B, so how do I decide which one to choose? Which of S's possible children could yield little a? What do you mean by yield? Like, which one would generate little a? And how does that help us? It's possible for it to be big A. Possible for it to be big A? Is it only possible for it to be big A? If you check B, it only produces little b's. Well, out of the children, the only one that can generate little a's is big A. In this case it's simple, but in a more complicated grammar it would be... We don't have to worry about other grammars yet; we're building up to that. For now we want to think about this example and see how we actually do this. Remember, our goal is to build the tree from the top down. We're doing top-down parsing. So we're going to start with S, right?
So now we need to decide which of these rules was applied to generate this string. Was it S goes to A, or was it S goes to B? We can look at our input string, left to right. We look at the first token, little a, and ask: can big A generate a little a as the first token? Yes. Can big B generate a little a as the first token? No. So we know it definitely can't be S goes to B, and we can say S goes to A is the first rule that's applied. But did I actually match this little a when I applied this rule? No, it's just possible. Right, it's just possible. We know the string has to start with a little a, but applying this rule didn't generate any terminals. So we can't just move on and say we've matched that and now try to match the next one. So let me go to this rule: A goes to little a big A, or A goes to little a. Which one of these rules applies? The first, because we need to generate more tokens than that. Yeah, but how do you know that just by looking at the string? Well, if we choose just little a, that's a terminal; we can't generate any more tokens from it. And since the string we're trying to parse is not a single little a, we can't choose that rule. Cool. So, have we parsed this little a yet? Have we matched it up to a terminal in the grammar? No, not yet. So we still have to consider this a. Does just looking at this a help us distinguish between these two rules? No. No, right? Why? Because we can't look far enough ahead. Right: if we look at only this one a, it doesn't tell us. Remember, before, we could decide between big A and big B just by looking at that first token: if it's an a, we know it has to be S goes to A.
If it's a b, we know it has to be S goes to B. But here, just by looking at this one a, either of these two rules could match: it could be little a big A, or it could be just little a. But if we look one more token ahead, at these two tokens, can we distinguish between them? Yes. So why is it not the little a rule? Because there's more than one a. Yeah: if we choose that rule, we have to stop, and there's no possible way to match the string a a. So let's choose the other rule. Isn't this kind of the same concept as in lexical analysis, where we look for the longest possible match? The current match would be the little a, with a longer one still possible. That's a good question. I think there are some similarities there, but I'm not sure whether thinking about it like that will cause problems. That's exactly what I was saying: you pick the little a, but you know you still have the possibility of little a big A, so you can't finalize it. Say we had picked this rule here. Is this a valid parse tree? It's a parse tree that this grammar can generate, right, so it's a valid parse tree. Is it a parse tree that matches the input string? No. Exactly, because this represents the string consisting of a single a, which is not the same thing as our four a's. And we can tell that by looking at two tokens this time: we look at two and say, okay, this has to be the rule that's applied. Now, how much of the string have I parsed so far, that is, matched up in the parse tree with the input string? Just one, right? But I looked at two. So does that mean I consumed both of them and start parsing from here? No. Exactly.
So for this partial parse tree, even though I had to look two tokens ahead to decide between the rules, I know that only this first token is actually in the tree. So I know I've parsed this token, but I still need to match three little a's. Applying that same logic to the next a, which rule do I choose? We don't want to terminate yet. Yeah, we do the same thing: we look at the next two a's. So which rule do we pick? Little a big A. So how much of the string have we parsed? Two characters. Two characters. Are we done? No, because we've still got a big A. So we do this again: which rule applies? By the same logic, little a big A. Now I've parsed three characters of this string. So I get to this last one: which of these two rules applies? Just little a. Because when you try to look ahead, there's nothing more; we've reached the end of the string. So it has to be this one, little a. Now the self-check: ask yourself, is this a valid parse tree in this grammar? How do you know? How do you tell it's a valid parse tree? It ends in all terminals. It ends in all terminals, yeah: all the leaves are terminals. That's one way. Each subtree corresponds to one of the rules. Yes, each subtree corresponds to one of the rules: every parent-child relationship, at each level of the tree, corresponds to a rule. Here we have S produces big A, and we have the rule S goes to A. A produces little a big A, and we have the rule A goes to little a big A, and we have that at every level. So we know it's a valid parse tree. Does this parse tree give our input string? Our input string was four a's.
This is the string we get by reading off all the leaves: four a's. So we successfully parsed this string, and we can say yes, this string is definitely in our language; it's one of the strings generated by this grammar. And how do I prove that? Yeah, the parse tree. I showed you the parse tree, and you can validate it: all the leaves are terminals, and every parent-child relationship is an application of a rule. So yes, this is definitely a valid parse tree. Another example. Here we have a grammar (S → A | B, A → a a A | a a, B → b b B | b b). What does this grammar represent? Can one of you describe it for us in plain English? Is it an even number of a's or an even number of b's? Either an even number of all b's or an even number of all a's. I think there's one caveat, maybe: it has to be at least two. At least two, yeah. Right: every big A generates two little a's followed by a big A, or just two little a's, so the counts will all be even. Let's try one. What do we start the tree that we're trying to parse with? [inaudible] Okay, yeah, it still does. All right, here's my input string: just two a's. Let's look at the first character. By looking at the first character, can I tell which rule to apply for S? Yes. Yes, why? Because S goes to big A is the only choice that gives you any chance of getting a little a; then you have to figure out what to do with big A. Right. So it has to be A, because A generates only little a's. But what if the next character is a b or a c or something? Then the whole string is invalid. Yeah, okay. So we can look at this one a, and it tells us exactly which rule to choose. So why is it that we can look at just this one a, and we don't have to look at any of the others? You'll still have to look at them later.
Yes, okay, that's definitely true. So you're already thinking ahead, about how we're going to parse this when we get down to here. But I want to think about just this level, at S: how do I decide between A and B, and how can I decide just by looking at that one character? What do we know about every single string that B could possibly derive? It can't contain an a. But specifically, what about its first character? It has to be a b. Yeah, exactly. So suppose I had a function (oops, that's not how you spell FIRST) that took in a non-terminal of this grammar and calculated the first character of all possible strings derived from it. For big B, what is that first character? Just b. So every possible string derived from big B must start with a little b. Any other options in here? How do you know? Yeah, you can look at every rule of big B: little b little b big B; what's the first character that this sequence of terminals and non-terminals is going to generate? It has to be a little b, yeah, exactly. It seems trivial, right? Then you look at the other rule, little b little b. So over all the strings that B generates, what's the first terminal they start with? Little b, yeah. And if it wasn't, would you just put an or sign and put the other character there? Say we added "or c". This is a set, so we'd write the set containing b and c: all the possible characters that strings generated by B can start with. What about FIRST of A? What are we going to say? Anything else? What about FIRST of S?
We'll get rid of c for now. We know S can go to either A or B. If we choose S goes to A, then since we replace S by A, whatever A starts with, S can also start with. The same goes for the other rule: whatever character the strings B generates start with, S can also start with. It's kind of crazy, so I want to reiterate it. Right now we're doing this by intuition, but we'll see there's actually an algorithm to compute it. If I call this whole set of rules G, what do we know about the language described by G? Is it a finite set? No, it's an infinite set: it contains every string of an even number (at least two) of a's and every string of an even number (at least two) of b's, so it goes on forever. But with these functions, we're able to say something definitive about the strings in this set. We can say every string derived from S starts with either an a or a b; any string derived from the non-terminal B has to start with a b; and any string derived from A has to start with an a. So if I calculate all these, and I have the choice of S goes to A or S goes to B, how can I tell which rule to apply? Look at the first character of the string, and then what do I do with the FIRST sets? Yeah, check which rule's set it's in. Is the first character of the string in FIRST of A? Then the string must be derived from A. Is it in FIRST of B? Then it must be derived from B. So this actually helps me choose which rule to parse with. Next grammar (S → A | B, A → a A | a c, B → a B | a b): what language does this grammar describe?
Strings that start with an a, then have zero or more a's, and then either a c or a b. Yeah. I think it's one or more a's. It starts with an a, which means it has that one, and then zero or more a's. So: all strings that have one or more a's followed by either a c or a b. Let's try to parse this string. I start with S. Which rule applies? What did we do before to tell which rule to apply? Look at the first character, but that wouldn't help here. Exactly, we try to look at the first character. So why doesn't that help? Because both A and B start with a. Yeah. So I can't look at only the first character, because it doesn't help me choose between these rules. Do I have to look for the first non-a character? I have to look for the first non-a character, and even then, am I done? Well, the problem with this language is that strings can be arbitrarily long, or at the very least your worst case could be very, very long, so you really wouldn't want to walk through all the a's from the front. You just want to grab the final character. Right, so part of the problem is that I can't decide. So, just for argument's sake, let's try S goes to B and try to build the string from that. That step doesn't parse anything here. Now, can I choose which one of B's rules applies? No? How many characters do I have to look at? Two. Two, right. If I look at two characters, I can see it's got to be this one, little a big B. So I've matched a little a, and I can do that again. Now I've parsed the first one, so for the second one I say little a big B, and I've parsed the second one. Now I try these rules again: does either of B's rules, little a big B or little a little b, match what's left?
No. So, oh crap, I went all the way trying to parse this string. What was the one assumption that I made? That it was going to end with a b; that was this first choice. I couldn't choose just by looking at the first character, so I just picked one. But now I need to backtrack and revisit that choice, because I can't parse the string this way. So maybe I have to choose A. Now I have to go back and start all over again: start with A, and then I can decide by looking at two characters that it's got to be little a big A. I choose that again, and finally I can go with little a little c, which matches the a c at the end of the string. So, what were some of the goals we wanted from a parser? Is this efficient? Why not? Well, it works out okay if we have two possible rules to choose from, but if we have 30, it's going to take a long time to guess right. And what if I had two choices here, two choices here, and two choices here? My number of possible choices explodes: it's two to the number of choices I have to make. I have to try all of those combinations until I either find a tree that matches, or run out of options and say no tree matches: I tried all possible combinations of the rules and nothing worked. So if we were to write out FIRST, what's the first set of A? Every string that A generates has to start with a little a. Even though those strings end with a little c, the fact that a little a comes first means FIRST of A is just {a}. What about FIRST of B? Also {a}. So what was the problem here? Why couldn't we decide by looking at that first character? Because what?
The endings were different. The endings were different, yes, but specifically in terms of the FIRST sets: we couldn't tell just by looking at the first character because FIRST of A and FIRST of B intersect; they overlap. So if we see an a, it could come either from S goes to A or from S goes to B. Now I want to go over how we actually build a parser in code. Let's go through an example and walk through it. The idea in top-down parsing is that we want to build these parse trees, because the parse trees represent the way the rules are applied. When we actually write a parser, the parser is going to mimic the tree itself in the structure of its calls. We'll do this quickly; well, not very quickly, but we'll definitely look at it. We have a grammar here: S goes to big A, big B, or big C; A goes to little a; B goes to big B little b, or little b; and C goes to big C little c, or epsilon (S → A | B | C, A → a, B → B b | b, C → C c | ε). What's the language described by this grammar? The empty string, a single a, any number of little b's, or any number of little c's. Yes, is that right? Can you have just one c? Yeah. Yeah? Why?
Yeah: first we say C goes to big C little c, and the next time we say big C goes to epsilon, and epsilon concatenated with c is little c. So what we're going to do is write a function for each non-terminal, covering its production rules. What's the FIRST set of big A? Just little a: there's only one possible string A can generate, and it's little a. What's the FIRST set of B? Little b. What's the FIRST set of C? Epsilon or c, because C can generate nothing, so we have to keep that in mind. Based on what we looked at before, can I efficiently decide which rule to parse with for S? Yes: the FIRST sets are all different; there's no intersection. So writing this in code follows pretty much exactly from our intuitions. First: what does the get_token function do? It returns the first token of the string. Yeah, it returns the first token. These are exactly the same semantics as in the lexer.c code you've been coding against and using in Project 2. That's why you're doing that, not just because we like it. So how do I check whether the rule S goes to big A applies? I look at this t_type, the token type. What am I checking? Whether it's an a. If it's an a, then what? What do we know about get_token from the lexer? What are its semantics? It returns the token type. What else does it do? It also stores the token value. It stores the token value, and it stores the line number. What does it do to the input? Yeah: it eats the first token from the input and moves the lexer forward, because it has decided to parse that token, so the next time you call get_token it starts after that token. But if we think about the parse tree we want to create, have we actually parsed little a here? Does S generate little a? You're shaking your head. It can only lead to little a through big A. Yeah, so it doesn't, right? Right now
we're just concerned with parsing S, so we need to decide which rule to apply: big A, big B, or big C. But none of these rules generates a terminal; none of them generates a little a here. We have S goes to big A, but we haven't actually created the leaf node for little a, because this rule doesn't parse it. So if we don't want our token consumed yet, because we haven't actually matched and parsed it, what do we have to do with the lexer? Yeah, we want to put that token back (unget it) and say we're not ready to actually parse this, because S goes to A doesn't generate any terminals. I just had to look ahead and peek at the next token to tell which rule to apply. Now I've decided: I know it's a little a, so it has to be S goes to big A, but I'm going to let parse_A figure out how it wants to handle that. So then we call parse_A, and parse_A is going to be very similar; it's going to make sure it parses everything else. After we've parsed A, what do we know about the input? We're done. We'd better be done, right? What if there's more after we parse A? Then that string is not in our language. So if we set t_type by calling get_token again and it's not end of file, we know there must be a syntax error: this string did not come from this language. We're going to abstract this into a check_eof function that reads the next token and checks whether it's end of file; if it's not, it throws an exception or reports a syntax error. So what's the next t_type we want to check from get_token? If it's b, what's the first thing it's going to do? Unget the token. And then what? Parse B. And then what? Check for end of file. So what about the next one? What if it's a c? If it's c or epsilon... Can you
read in an epsilon token? What is your lexer going to return when you call get_token? It returns a token type, like the ID and NUM tokens. Yeah, it's going to return one of the token types you've defined. What else could it return? [inaudible] Well, the problem is that you generally don't have epsilons in your string; they get abstracted away. Exactly: you don't have epsilons in your strings. You've read the lexer.c code: it returns either one of your tokens or, and this is what check_eof is looking for, end of file. That's a valid thing for your lexer to return. End of file is a kind of meta-token, a token that represents that you've reached the end of the input. What's the third one? ERROR, right, if the lexer has read an erroneous token. None of those three is epsilon. So, if this token type is a c, how do we parse it? We unget the token, we call parse_C, and then we check for end of file. But there's one other string we need to make sure we match. What strings does each of these cases match? Strings that start with a, strings that start with b, strings that start with c. Is that all of the strings this language can generate? No. Which string is it not matching? The empty string. So, from what the lexer returns, how do we know it's the empty string? If it's end of file. If it's end of file, why? It's saying I didn't receive anything, which is equivalent to the empty string. Yes, so let me check whether it's end of file. I actually need to update this; I should put that in here, but we'll see other
examples as we go forward. Can S itself generate epsilon? No. Just as S itself didn't generate any little a's, S itself doesn't generate any little b's or little c's. So this is actually incorrect: S shouldn't be handling the end-of-file case on its own. Who should handle it? C, because C knows what to do at end of file; C knows what happens when C goes to epsilon. S doesn't have to worry about that. The only things S has to worry about are S goes to big A, S goes to big B, and S goes to big C. S is only deciding between those three cases; the other functions decide everything else. So what if the token doesn't match any of these three or four cases? Throw an error: report a syntax error. So this part should be exactly like this part, and you can actually combine them: if t_type is c or t_type is end of file. Note that the only reason we're able to look one token ahead and decide which rule to apply is that FIRST of A, FIRST of B, and FIRST of C are all disjoint: no intersection, no overlap. We could still write this in a similar style: try to parse A, and if A says that's not a valid A, try to parse B, and if B says that's not a valid B, try to parse C. So you can implement a backtracking parser very easily. But if you have a grammar like this one, where you can tell right off the bat, you can make it a lot more efficient. That's the idea of what we're going to call predictive recursive descent parsers. Recursive descent means we're going from the top down, and it's recursive because we keep calling these parse_S, parse_A, parse_B, and parse_C functions. What does predictive mean? Predicting which branch S is going to take based on some parameter. Right, we're predicting. And what we really want to do, so
it's predictive because we want to be efficient, and the key point is that we're efficient because we look only one token ahead, and that tells us exactly which way to go. So can all grammars support a predictive recursive descent parser? Can you think of a grammar that doesn't? One where you need to look multiple tokens ahead to tell the difference. Can we find one among the examples we looked at today? Which one? The one where the difference was at the end, whether it ended in b or c. Yeah, the one where strings ended with either a c or a b: just by looking one token ahead, we can't tell which rule to take. So what helped us determine whether we could support this kind of parser? Exactly: looking at the first characters that each rule can produce. If they each produce distinct first characters, then clearly we can tell just by looking at that one. It's kind of a circular definition: we want to be efficient, so we only want to look one character ahead. So how do we know whether, looking one character ahead, the rules' possibilities are all distinct and disjoint? That's why we're going to use this FIRST function, and here it's defined a bit more precisely. We're using alpha to represent a sequence of terminals and non-terminals. We ask: what's FIRST of little b little b? Clearly little b. What's FIRST of little a big A? Clearly just a. So the input can be any sequence of grammar symbols, and the output is a set of terminals and epsilon. Keep this in mind, because it helps you think about the type of the FIRST function when you're doing this by hand: the input is a sequence of terminals, non-terminals, and epsilon, and the output contains only terminals and epsilon. So if you're trying to compute FIRST and you get a big A
or a big B, some non-terminal, in your answer, this helps you self-check that something has gone wrong. Then on Wednesday we'll look at why this isn't the only thing we need, and what other functions we need.